LKML Archive on lore.kernel.org
* e1000e NVM corruption issue status
       [not found] <987CEB09A2567F4A963E1E226364E2D33A685B4B@orsmsx418.amr.corp.intel.com>
@ 2008-09-26  1:50 ` Brandeburg, Jesse
  2008-09-26  1:58   ` Chris Snook
                     ` (4 more replies)
  0 siblings, 5 replies; 39+ messages in thread
From: Brandeburg, Jesse @ 2008-09-26  1:50 UTC (permalink / raw)
  To: LKML
  Cc: Jiri Kosina, agospoda, Ronciak, John, Allan, Bruce W, Graham,
	David, kkiel, jesse.brandeburg, tglx, chris.jones, arjan

A quick summary of the issue follows.  If you think you have more data,
please reply.  If you have had this issue, please reply with the output of
"cat /proc/iomem" and "lspci".  It will help us correlate data.

Problem: some users report that with many of the latest beta distros,
when e1000e loads during a reboot it says "NVM checksum is not valid" and
the driver fails to load.

Result: At this point it appears that most users can load the e1000e
driver if they skip the NVM validation error exit.  LAN traffic may or may
not work at this point.  Some users report they can dump their eeprom
using ethtool -e and see some varying data; most report that the eeprom
read returns all ff ff ff.
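
For anyone inspecting a saved dump by hand, the check that is failing can
be sketched in plain userspace C.  This is only an illustration, not driver
code: it assumes the layout the driver uses (little-endian 16-bit words,
checksum stored in word 0x3F, words 0x00 through 0x3F summing to 0xBABA),
and the demo_* names are made up for this example.  Verify the constants
against NVM_CHECKSUM_REG and NVM_SUM in the tree you are testing.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror NVM_CHECKSUM_REG and NVM_SUM; check your tree. */
#define DEMO_CHECKSUM_WORD 0x3F
#define DEMO_SUM           0xBABA

/* Read 16-bit word number idx from a little-endian byte image. */
static uint16_t demo_word(const uint8_t *bytes, size_t idx)
{
	return (uint16_t)(bytes[2 * idx] | (bytes[2 * idx + 1] << 8));
}

/* Value the checksum word must hold, given words 0x00..0x3E. */
uint16_t demo_expected_checksum(const uint8_t *bytes)
{
	uint16_t sum = 0;
	size_t i;

	for (i = 0; i < DEMO_CHECKSUM_WORD; i++)
		sum = (uint16_t)(sum + demo_word(bytes, i));
	return (uint16_t)(DEMO_SUM - sum);
}

/* Nonzero if the stored checksum word matches the expected value. */
int demo_checksum_valid(const uint8_t *bytes)
{
	return demo_word(bytes, DEMO_CHECKSUM_WORD) ==
	       demo_expected_checksum(bytes);
}
```

Both functions expect at least the first 128 bytes of the image.  With an
image that reads back as all ff, the 64 words sum to 0xFFC0 rather than
0xBABA, which is why such a read fails validation.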

NOTE: if you have not had this problem but wish to continue using e1000e,
I strongly suggest you run "ethtool -e eth0 > savemyeep.txt" now.

Many of the reports seem to be related in time to a graphics crash, but no
one has been able to give us more detail about how to reproduce it.  We
NEED HELP reproducing this.  Steps, hints, anything.  We are trying
rebooting and suspending with opensuse, fedora, and ubuntu on several
hardware platforms.

This seems to affect both 32- and 64-bit kernels, but we haven't heard much
either way.

Hardware affected:
laptops and desktops with 82566- or 82567-based LAN parts, which are
machines with the ICH8 and ICH9 chipsets and a variety of processors.
The machines I know of that have reported the issue include:
Lenovo X300
HP 2510p
Intel DP35JO
Lenovo T61 (possibly)
Lenovo X61 (possibly)

Next steps:
We are still trying to reproduce the issue locally.  We should have a
machine here tomorrow that reportedly had the issue with ubuntu.

We have a series of kernel patches that I will reply to this mail with 
that may help users willing to test.

We should have ready (hopefully tomorrow) an app that should be able to 
restore eeproms as long as the driver can still load.

We also have a band-aid patch that should allow "locking" of the NVM area
to prevent an errant write; we are looking to post that tomorrow.  This
should prevent the damage but will not really find the culprit.

Jesse


* Re: e1000e NVM corruption issue status
  2008-09-26  1:50 ` e1000e NVM corruption issue status Brandeburg, Jesse
@ 2008-09-26  1:58   ` Chris Snook
  2008-09-26  2:04     ` Brandeburg, Jesse
  2008-09-26  2:01   ` Brandeburg, Jesse
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 39+ messages in thread
From: Chris Snook @ 2008-09-26  1:58 UTC (permalink / raw)
  To: Brandeburg, Jesse
  Cc: LKML, Jiri Kosina, agospoda, Ronciak, John, Allan, Bruce W,
	Graham, David, kkiel, tglx, chris.jones, arjan

Brandeburg, Jesse wrote:
> hardware affected:
> laptops and desktops with 82566 or 82567 based LAN parts, which are 
> machines with the ICH8 and ICH9 chipsets and a variety of processors.
> The machines I know of that have reported the issue include
> Lenovo X300
> HP 2510p
> Intel DP35JO
> Lenovo T61 (possibly)
> Lenovo X61 (possibly)

My Intel DG45ID board has an ICH10R chipset, and it also has an 82567LM, just
like some of the affected systems.  Is there some reason why ICH10 is not
susceptible, or have we simply not seen it?

-- Chris


* Re: e1000e NVM corruption issue status
  2008-09-26  1:50 ` e1000e NVM corruption issue status Brandeburg, Jesse
  2008-09-26  1:58   ` Chris Snook
@ 2008-09-26  2:01   ` Brandeburg, Jesse
  2008-09-26  2:09     ` Brandeburg, Jesse
                       ` (12 more replies)
  2008-09-26  5:44   ` Jesse Brandeburg
                     ` (2 subsequent siblings)
  4 siblings, 13 replies; 39+ messages in thread
From: Brandeburg, Jesse @ 2008-09-26  2:01 UTC (permalink / raw)
  To: Brandeburg, Jesse
  Cc: LKML, Jiri Kosina, agospoda, Ronciak, John, Allan, Bruce W,
	Graham, David, kkiel, tglx, chris.jones, arjan

> We have a series of kernel patches that I will reply to this mail with 
> that may help users willing to test.

This is the current set of patches that I have to help us debug
and/or fix e1000e issues found during this debug effort for
the corrupt NVM.  The "drop stats lock" through "reset swflag" patches
allow Thomas' patch for a mutex in the SWFLAG acquire function to run
without any errors.

The patches are probably not production quality, but seem to work
for myself and Thomas on at least a couple of machines.

The non-debug aspects of the patches will likely be pushed later.

At this point I do not believe any of these patches will fix the
NVM corruption issue, but they will add to the ability of any tester
to help find the issue, and reduce the chance that it is any issue we
(now) know about.

---

Bruce Allan (2):
      e1000e: Use set_memory_ro()/set_memory_rw() to protect flash memory
      Export set_memory_ro() and set_memory_rw() calls. Soon to be used

Jesse Brandeburg (7):
      e1000e: dump eeprom to dmesg for ich8/9
      e1000e: allow bad checksum
      update version
      e1000e: drop stats lock
      e1000e: fix lockdep issues
      e1000e: do not ever sleep in interrupt context
      e1000e: reset swflag after resetting hardware

Thomas Gleixner (1):
      e1000e: debug contention on NVM SWFLAG


 arch/x86/mm/pageattr.c       |    2 +
 drivers/net/e1000e/e1000.h   |    4 +
 drivers/net/e1000e/ethtool.c |    6 +-
 drivers/net/e1000e/hw.h      |    1 
 drivers/net/e1000e/ich8lan.c |   36 ++++++++++
 drivers/net/e1000e/netdev.c  |  158 ++++++++++++++++++++++++++++--------------
 6 files changed, 153 insertions(+), 54 deletions(-)



* Re: e1000e NVM corruption issue status
  2008-09-26  1:58   ` Chris Snook
@ 2008-09-26  2:04     ` Brandeburg, Jesse
  0 siblings, 0 replies; 39+ messages in thread
From: Brandeburg, Jesse @ 2008-09-26  2:04 UTC (permalink / raw)
  To: Chris Snook
  Cc: Brandeburg, Jesse, LKML, Jiri Kosina, agospoda, Ronciak, John,
	Allan, Bruce W, Graham, David, kkiel, tglx, chris.jones, arjan

On Thu, 25 Sep 2008, Chris Snook wrote:

> Brandeburg, Jesse wrote:
> > hardware affected:
> > laptops and desktops with 82566 or 82567 based LAN parts, which are machines
> > with the ICH8 and ICH9 chipsets and a variety of processors.
> > The machines I know of that have reported the issue include
> > Lenovo X300
> > HP 2510p
> > Intel DP35JO
> > Lenovo T61 (possibly)
> > Lenovo X61 (possibly)
> 
> My Intel DG45ID board has an ICH10R chipset, and it also has an 82567LM, just
> as some of the affected systems.  Is there some reason why ICH10 is not
> susceptible, or have we simply not seen it?

ICH10R with 82567 is also susceptible, as far as I know at this point.


* Re: e1000e NVM corruption issue status
  2008-09-26  2:01   ` Brandeburg, Jesse
@ 2008-09-26  2:09     ` Brandeburg, Jesse
  2008-09-26  7:12       ` Ingo Molnar
  2008-09-26  2:09     ` Brandeburg, Jesse
                       ` (11 subsequent siblings)
  12 siblings, 1 reply; 39+ messages in thread
From: Brandeburg, Jesse @ 2008-09-26  2:09 UTC (permalink / raw)
  To: Brandeburg, Jesse
  Cc: LKML, Jiri Kosina, agospoda, Ronciak, John, Allan, Bruce W,
	Graham, David, kkiel, tglx, chris.jones, arjan

From: Bruce Allan <bruce.w.allan@intel.com>

Export set_memory_ro() and set_memory_rw() calls. Soon to be used
by e1000e.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
---

 arch/x86/mm/pageattr.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 43e2f84..0991e15 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -906,11 +906,13 @@ int set_memory_ro(unsigned long addr, int numpages)
 {
 	return change_page_attr_clear(addr, numpages, __pgprot(_PAGE_RW));
 }
+EXPORT_SYMBOL(set_memory_ro);
 
 int set_memory_rw(unsigned long addr, int numpages)
 {
 	return change_page_attr_set(addr, numpages, __pgprot(_PAGE_RW));
 }
+EXPORT_SYMBOL(set_memory_rw);
 
 int set_memory_np(unsigned long addr, int numpages)
 {


* Re: e1000e NVM corruption issue status
  2008-09-26  2:01   ` Brandeburg, Jesse
  2008-09-26  2:09     ` Brandeburg, Jesse
@ 2008-09-26  2:09     ` Brandeburg, Jesse
  2008-09-26  2:10     ` Brandeburg, Jesse
                       ` (10 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Brandeburg, Jesse @ 2008-09-26  2:09 UTC (permalink / raw)
  To: Brandeburg, Jesse
  Cc: LKML, Jiri Kosina, agospoda, Ronciak, John, Allan, Bruce W,
	Graham, David, kkiel, tglx, chris.jones, arjan

e1000e: Use set_memory_ro()/set_memory_rw() to protect flash memory

From: Bruce Allan <bruce.w.allan@intel.com>

A number of users have reported NVM corruption on various ICHx platform
LOMs.  One possible reason for this could be unexpected and/or malicious
writes to the flash memory area mapped into kernel memory.  Once the
interface is up, there should be very few reads/writes of the mapped flash
memory.  This patch makes use of the x86 set_memory_*() functions to set
the mapped memory read-only and temporarily set it writable only when the
driver needs to write to it.  With the memory set read-only, any unexpected
write will be logged with a stack dump indicating the offending code.

Since these LOMs are only on x86 ICHx platforms, it does not matter that
this API is not yet available on other architectures; however, it is
dependent on a previous patch that exports these function name symbols.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
---

 drivers/net/e1000e/e1000.h   |    1 +
 drivers/net/e1000e/hw.h      |    1 +
 drivers/net/e1000e/ich8lan.c |   16 ++++++++++++++++
 drivers/net/e1000e/netdev.c  |   11 +++++++----
 4 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/drivers/net/e1000e/e1000.h b/drivers/net/e1000e/e1000.h
index ac4e506..2786754 100644
--- a/drivers/net/e1000e/e1000.h
+++ b/drivers/net/e1000e/e1000.h
@@ -36,6 +36,7 @@
 #include <linux/workqueue.h>
 #include <linux/io.h>
 #include <linux/netdevice.h>
+#include <asm/cacheflush.h>
 
 #include "hw.h"
 
diff --git a/drivers/net/e1000e/hw.h b/drivers/net/e1000e/hw.h
index 74f263a..dd25009 100644
--- a/drivers/net/e1000e/hw.h
+++ b/drivers/net/e1000e/hw.h
@@ -863,6 +863,7 @@ struct e1000_hw {
 
 	u8 __iomem *hw_addr;
 	u8 __iomem *flash_address;
+	resource_size_t flash_len;
 
 	struct e1000_mac_info  mac;
 	struct e1000_fc_info   fc;
diff --git a/drivers/net/e1000e/ich8lan.c b/drivers/net/e1000e/ich8lan.c
index 9e38452..f47c60e 100644
--- a/drivers/net/e1000e/ich8lan.c
+++ b/drivers/net/e1000e/ich8lan.c
@@ -176,12 +176,28 @@ static inline u32 __er32flash(struct e1000_hw *hw, unsigned long reg)
 
 static inline void __ew16flash(struct e1000_hw *hw, unsigned long reg, u16 val)
 {
+#ifdef _ASM_X86_CACHEFLUSH_H
+	set_memory_rw((unsigned long)hw->flash_address,
+	              hw->flash_len >> PAGE_SHIFT);
+#endif
 	writew(val, hw->flash_address + reg);
+#ifdef _ASM_X86_CACHEFLUSH_H
+	set_memory_ro((unsigned long)hw->flash_address,
+	              hw->flash_len >> PAGE_SHIFT);
+#endif
 }
 
 static inline void __ew32flash(struct e1000_hw *hw, unsigned long reg, u32 val)
 {
+#ifdef _ASM_X86_CACHEFLUSH_H
+	set_memory_rw((unsigned long)hw->flash_address,
+	              hw->flash_len >> PAGE_SHIFT);
+#endif
 	writel(val, hw->flash_address + reg);
+#ifdef _ASM_X86_CACHEFLUSH_H
+	set_memory_ro((unsigned long)hw->flash_address,
+	              hw->flash_len >> PAGE_SHIFT);
+#endif
 }
 
 #define er16flash(reg)		__er16flash(hw, (reg))
diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index d266510..0e51841 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -4364,7 +4364,6 @@ static int __devinit e1000_probe(struct pci_dev *pdev,
 	struct e1000_hw *hw;
 	const struct e1000_info *ei = e1000_info_tbl[ent->driver_data];
 	resource_size_t mmio_start, mmio_len;
-	resource_size_t flash_start, flash_len;
 
 	static int cards_found;
 	int i, err, pci_using_dac;
@@ -4434,11 +4433,15 @@ static int __devinit e1000_probe(struct pci_dev *pdev,
 
 	if ((adapter->flags & FLAG_HAS_FLASH) &&
 	    (pci_resource_flags(pdev, 1) & IORESOURCE_MEM)) {
-		flash_start = pci_resource_start(pdev, 1);
-		flash_len = pci_resource_len(pdev, 1);
-		adapter->hw.flash_address = ioremap(flash_start, flash_len);
+		adapter->hw.flash_len = pci_resource_len(pdev, 1);
+		adapter->hw.flash_address = ioremap(pci_resource_start(pdev, 1),
+		                                    adapter->hw.flash_len);
 		if (!adapter->hw.flash_address)
 			goto err_flashmap;
+#ifdef _ASM_X86_CACHEFLUSH_H
+		set_memory_ro((unsigned long)adapter->hw.flash_address,
+		              adapter->hw.flash_len >> PAGE_SHIFT);
+#endif
 	}
 
 	/* construct the net_device struct */
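
For readers who want to experiment with the idea outside the kernel, the
same pattern (keep the mapping read-only, open a short writable window
around each intended write) can be approximated in userspace with mmap()
and mprotect().  This is an analogy only, under stated assumptions: the
demo_* names are hypothetical, the region is anonymous memory rather than
flash, and a stray write here raises SIGSEGV instead of producing a kernel
stack dump.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical stand-in for the mapped flash region. */
struct demo_flash {
	uint8_t *base;
	size_t len;
};

/* Map one page and leave it read-only, as the probe path does. */
int demo_flash_map(struct demo_flash *f)
{
	long page = sysconf(_SC_PAGESIZE);
	void *p;

	if (page <= 0)
		return -1;
	p = mmap(NULL, (size_t)page, PROT_READ,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;
	f->base = p;
	f->len = (size_t)page;
	return 0;
}

/* Make the region writable only for the duration of one store. */
int demo_flash_write16(struct demo_flash *f, size_t off, uint16_t val)
{
	if (off > f->len || f->len - off < sizeof(val))
		return -1;
	if (mprotect(f->base, f->len, PROT_READ | PROT_WRITE))
		return -1;
	f->base[off] = (uint8_t)(val & 0xff);
	f->base[off + 1] = (uint8_t)(val >> 8);
	return mprotect(f->base, f->len, PROT_READ);
}

/* Reads need no window; the mapping is always readable. */
uint16_t demo_flash_read16(const struct demo_flash *f, size_t off)
{
	return (uint16_t)(f->base[off] | (f->base[off + 1] << 8));
}
```

Any store that does not go through demo_flash_write16() faults
immediately, which is the property the patch relies on to identify the
offending code.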


* Re: e1000e NVM corruption issue status
  2008-09-26  2:01   ` Brandeburg, Jesse
  2008-09-26  2:09     ` Brandeburg, Jesse
  2008-09-26  2:09     ` Brandeburg, Jesse
@ 2008-09-26  2:10     ` Brandeburg, Jesse
  2008-09-26  2:10     ` Brandeburg, Jesse
                       ` (9 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Brandeburg, Jesse @ 2008-09-26  2:10 UTC (permalink / raw)
  To: Brandeburg, Jesse
  Cc: LKML, Jiri Kosina, agospoda, Ronciak, John, Allan, Bruce W,
	Graham, David, kkiel, tglx, chris.jones, arjan

e1000e: reset swflag after resetting hardware

From: Jesse Brandeburg <jesse.brandeburg@intel.com>

In the process of debugging things, I noticed that the swflag is not reset
by the driver after reset, and the swflag is probably not reset unless
management firmware clears it after 100ms.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
---

 drivers/net/e1000e/ich8lan.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/net/e1000e/ich8lan.c b/drivers/net/e1000e/ich8lan.c
index f47c60e..f1a6e55 100644
--- a/drivers/net/e1000e/ich8lan.c
+++ b/drivers/net/e1000e/ich8lan.c
@@ -1736,6 +1736,9 @@ static s32 e1000_reset_hw_ich8lan(struct e1000_hw *hw)
 	ew32(CTRL, (ctrl | E1000_CTRL_RST));
 	msleep(20);
 
+	/* release the swflag because it is not reset by hardware reset */
+	e1000_release_swflag_ich8lan(hw);
+
 	ret_val = e1000e_get_auto_rd_done(hw);
 	if (ret_val) {
 		/*


* Re: e1000e NVM corruption issue status
  2008-09-26  2:01   ` Brandeburg, Jesse
                       ` (2 preceding siblings ...)
  2008-09-26  2:10     ` Brandeburg, Jesse
@ 2008-09-26  2:10     ` Brandeburg, Jesse
  2008-09-26  2:10     ` Brandeburg, Jesse
                       ` (8 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Brandeburg, Jesse @ 2008-09-26  2:10 UTC (permalink / raw)
  To: Brandeburg, Jesse
  Cc: LKML, Jiri Kosina, agospoda, Ronciak, John, Allan, Bruce W,
	Graham, David, kkiel, tglx, chris.jones, arjan

e1000e: do not ever sleep in interrupt context

From: Jesse Brandeburg <jesse.brandeburg@intel.com>

e1000e was apparently calling two functions that attempted to reserve
the SWFLAG bit for exclusive (to hardware and firmware) access to
the PHY and NVM (aka eeprom).  These accesses could possibly call
msleep to wait for the resource, which is not allowed from interrupt
context.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
CC: Thomas Gleixner <tglx@linutronix.de>
---

 drivers/net/e1000e/e1000.h  |    2 ++
 drivers/net/e1000e/netdev.c |   31 ++++++++++++++++++++++++++++---
 2 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/drivers/net/e1000e/e1000.h b/drivers/net/e1000e/e1000.h
index 2786754..951080f 100644
--- a/drivers/net/e1000e/e1000.h
+++ b/drivers/net/e1000e/e1000.h
@@ -285,6 +285,8 @@ struct e1000_adapter {
 	unsigned long led_status;
 
 	unsigned int flags;
+	struct work_struct downshift_task;
+	struct work_struct update_phy_task;
 };
 
 struct e1000_info {
diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index 0e51841..1756be4 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -1115,6 +1115,14 @@ static void e1000_clean_rx_ring(struct e1000_adapter *adapter)
 	writel(0, adapter->hw.hw_addr + rx_ring->tail);
 }
 
+static void e1000e_downshift_workaround(struct work_struct *work)
+{
+	struct e1000_adapter *adapter = container_of(work,
+					struct e1000_adapter, downshift_task);
+
+	e1000e_gig_downshift_workaround_ich8lan(&adapter->hw);
+}
+
 /**
  * e1000_intr_msi - Interrupt Handler
  * @irq: interrupt number
@@ -1139,7 +1147,7 @@ static irqreturn_t e1000_intr_msi(int irq, void *data)
 		 */
 		if ((adapter->flags & FLAG_LSC_GIG_SPEED_DROP) &&
 		    (!(er32(STATUS) & E1000_STATUS_LU)))
-			e1000e_gig_downshift_workaround_ich8lan(hw);
+			schedule_work(&adapter->downshift_task);
 
 		/*
 		 * 80003ES2LAN workaround-- For packet buffer work-around on
@@ -1205,7 +1213,7 @@ static irqreturn_t e1000_intr(int irq, void *data)
 		 */
 		if ((adapter->flags & FLAG_LSC_GIG_SPEED_DROP) &&
 		    (!(er32(STATUS) & E1000_STATUS_LU)))
-			e1000e_gig_downshift_workaround_ich8lan(hw);
+			schedule_work(&adapter->downshift_task);
 
 		/*
 		 * 80003ES2LAN workaround--
@@ -2912,6 +2920,21 @@ static int e1000_set_mac(struct net_device *netdev, void *p)
 	return 0;
 }
 
+/**
+ * e1000e_update_phy_task - work thread to update phy
+ * @work: pointer to our work struct
+ *
+ * this worker thread exists because we must acquire a
+ * semaphore to read the phy, which we could msleep while
+ * waiting for it, and we can't msleep in a timer.
+ **/
+static void e1000e_update_phy_task(struct work_struct *work)
+{
+	struct e1000_adapter *adapter = container_of(work,
+					struct e1000_adapter, update_phy_task);
+	e1000_get_phy_info(&adapter->hw);
+}
+
 /*
  * Need to wait a few seconds after link up to get diagnostic information from
  * the phy
@@ -2919,7 +2942,7 @@ static int e1000_set_mac(struct net_device *netdev, void *p)
 static void e1000_update_phy_info(unsigned long data)
 {
 	struct e1000_adapter *adapter = (struct e1000_adapter *) data;
-	e1000_get_phy_info(&adapter->hw);
+	schedule_work(&adapter->update_phy_task);
 }
 
 /**
@@ -4575,6 +4598,8 @@ static int __devinit e1000_probe(struct pci_dev *pdev,
 
 	INIT_WORK(&adapter->reset_task, e1000_reset_task);
 	INIT_WORK(&adapter->watchdog_task, e1000_watchdog_task);
+	INIT_WORK(&adapter->downshift_task, e1000e_downshift_workaround);
+	INIT_WORK(&adapter->update_phy_task, e1000e_update_phy_task);
 
 	e1000e_check_options(adapter);
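
The deferral pattern used above can be illustrated with a single-threaded
userspace sketch.  It is not the kernel workqueue API: the demo_* names are
invented for this example, and "later" is simply a function called from
ordinary context.  The point it shows is that the interrupt-side call only
marks the work pending, while the slow routine runs afterwards where
blocking would be allowed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical one-shot work item. */
struct demo_work {
	void (*fn)(struct demo_work *);
	bool pending;
};

struct demo_adapter {
	int slow_calls;                  /* times the slow path ran */
	struct demo_work downshift_task;
};

/* Same idea as container_of(): recover the enclosing struct. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in for the workaround that may need to sleep. */
void demo_downshift(struct demo_work *w)
{
	struct demo_adapter *a =
		demo_container_of(w, struct demo_adapter, downshift_task);

	a->slow_calls++;
}

/* Interrupt-side call: never runs fn, only records the request. */
bool demo_schedule_work(struct demo_work *w)
{
	if (w->pending)
		return false;   /* already queued, nothing to do */
	w->pending = true;
	return true;
}

/* Process-context side: run and clear everything that is pending. */
void demo_run_pending(struct demo_work *works[], size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (!works[i]->pending)
			continue;
		works[i]->pending = false;
		works[i]->fn(works[i]);
	}
}
```

Scheduling the same item twice before it runs results in a single call of
the slow routine in this sketch.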
 


* Re: e1000e NVM corruption issue status
  2008-09-26  2:01   ` Brandeburg, Jesse
                       ` (3 preceding siblings ...)
  2008-09-26  2:10     ` Brandeburg, Jesse
@ 2008-09-26  2:10     ` Brandeburg, Jesse
  2008-09-26  2:11     ` Brandeburg, Jesse
                       ` (7 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Brandeburg, Jesse @ 2008-09-26  2:10 UTC (permalink / raw)
  To: Brandeburg, Jesse
  Cc: LKML, Jiri Kosina, agospoda, Ronciak, John, Allan, Bruce W,
	Graham, David, kkiel, tglx, chris.jones, arjan

e1000e: fix lockdep issues

From: Jesse Brandeburg <jesse.brandeburg@intel.com>

Thanks to tglx, we're finding some interesting lockdep issues.
The good news is that this patch fixes all the ones I
could find, without damaging any functionality.

CC: Thomas Gleixner <tglx@linutronix.de>

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
---

 drivers/net/e1000e/ethtool.c |    6 +++++-
 drivers/net/e1000e/netdev.c  |   13 -------------
 2 files changed, 5 insertions(+), 14 deletions(-)

diff --git a/drivers/net/e1000e/ethtool.c b/drivers/net/e1000e/ethtool.c
index e21c9e0..f3b49f6 100644
--- a/drivers/net/e1000e/ethtool.c
+++ b/drivers/net/e1000e/ethtool.c
@@ -432,6 +432,10 @@ static void e1000_get_regs(struct net_device *netdev,
 	regs_buff[11] = er32(TIDV);
 
 	regs_buff[12] = adapter->hw.phy.type;  /* PHY type (IGP=1, M88=0) */
+
+	/* ethtool doesn't use anything past this point, so all this
+	 * code is likely legacy junk for apps that may or may not
+	 * exist */
 	if (hw->phy.type == e1000_phy_m88) {
 		e1e_rphy(hw, M88E1000_PHY_SPEC_STATUS, &phy_data);
 		regs_buff[13] = (u32)phy_data; /* cable length */
@@ -447,7 +451,7 @@ static void e1000_get_regs(struct net_device *netdev,
 		regs_buff[22] = adapter->phy_stats.receive_errors;
 		regs_buff[23] = regs_buff[13]; /* mdix mode */
 	}
-	regs_buff[21] = adapter->phy_stats.idle_errors;  /* phy idle errors */
+	regs_buff[21] = 0; /* was idle_errors */
 	e1e_rphy(hw, PHY_1000T_STATUS, &phy_data);
 	regs_buff[24] = (u32)phy_data;  /* phy local receiver status */
 	regs_buff[25] = regs_buff[24];  /* phy remote receiver status */
diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index 1756be4..235c014 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -2954,9 +2954,6 @@ void e1000e_update_stats(struct e1000_adapter *adapter)
 	struct e1000_hw *hw = &adapter->hw;
 	struct pci_dev *pdev = adapter->pdev;
 	unsigned long irq_flags;
-	u16 phy_tmp;
-
-#define PHY_IDLE_ERROR_COUNT_MASK 0x00FF
 
 	/*
 	 * Prevent stats update while adapter is being reset, or if the pci
@@ -3045,15 +3042,6 @@ void e1000e_update_stats(struct e1000_adapter *adapter)
 
 	/* Tx Dropped needs to be maintained elsewhere */
 
-	/* Phy Stats */
-	if (hw->phy.media_type == e1000_media_type_copper) {
-		if ((adapter->link_speed == SPEED_1000) &&
-		   (!e1e_rphy(hw, PHY_1000T_STATUS, &phy_tmp))) {
-			phy_tmp &= PHY_IDLE_ERROR_COUNT_MASK;
-			adapter->phy_stats.idle_errors += phy_tmp;
-		}
-	}
-
 	/* Management Stats */
 	adapter->stats.mgptc += er32(MGTPTC);
 	adapter->stats.mgprc += er32(MGTPRC);
@@ -3073,7 +3061,6 @@ static void e1000_phy_read_status(struct e1000_adapter *adapter)
 	int ret_val;
 	unsigned long irq_flags;
 
-
 	spin_lock_irqsave(&adapter->stats_lock, irq_flags);
 
 	if ((er32(STATUS) & E1000_STATUS_LU) &&


* Re: e1000e NVM corruption issue status
  2008-09-26  2:01   ` Brandeburg, Jesse
                       ` (4 preceding siblings ...)
  2008-09-26  2:10     ` Brandeburg, Jesse
@ 2008-09-26  2:11     ` Brandeburg, Jesse
  2008-09-26  2:11     ` Brandeburg, Jesse
                       ` (6 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Brandeburg, Jesse @ 2008-09-26  2:11 UTC (permalink / raw)
  To: Brandeburg, Jesse
  Cc: LKML, Jiri Kosina, agospoda, Ronciak, John, Allan, Bruce W,
	Graham, David, kkiel, tglx, chris.jones, arjan

e1000e: drop stats lock

From: Jesse Brandeburg <jesse.brandeburg@intel.com>

The stats lock is left over from e1000; e1000e no longer
has the adjust tbi stats function that required the addition
of the stats lock to begin with.

Adding a mutex to acquire_swflag helped catch this one too.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
CC: Thomas Gleixner <tglx@linutronix.de>
---

 drivers/net/e1000e/e1000.h  |    1 -
 drivers/net/e1000e/netdev.c |   18 ------------------
 2 files changed, 0 insertions(+), 19 deletions(-)

diff --git a/drivers/net/e1000e/e1000.h b/drivers/net/e1000e/e1000.h
index 951080f..2a3a311 100644
--- a/drivers/net/e1000e/e1000.h
+++ b/drivers/net/e1000e/e1000.h
@@ -258,7 +258,6 @@ struct e1000_adapter {
 	struct net_device *netdev;
 	struct pci_dev *pdev;
 	struct net_device_stats net_stats;
-	spinlock_t stats_lock;      /* prevent concurrent stats updates */
 
 	/* structs defined in e1000_hw.h */
 	struct e1000_hw hw;
diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index 235c014..bd7fa13 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -2600,8 +2600,6 @@ static int __devinit e1000_sw_init(struct e1000_adapter *adapter)
 	/* Explicitly disable IRQ since the NIC can be in any state. */
 	e1000_irq_disable(adapter);
 
-	spin_lock_init(&adapter->stats_lock);
-
 	set_bit(__E1000_DOWN, &adapter->state);
 	return 0;
 
@@ -2953,7 +2951,6 @@ void e1000e_update_stats(struct e1000_adapter *adapter)
 {
 	struct e1000_hw *hw = &adapter->hw;
 	struct pci_dev *pdev = adapter->pdev;
-	unsigned long irq_flags;
 
 	/*
 	 * Prevent stats update while adapter is being reset, or if the pci
@@ -2964,14 +2961,6 @@ void e1000e_update_stats(struct e1000_adapter *adapter)
 	if (pci_channel_offline(pdev))
 		return;
 
-	spin_lock_irqsave(&adapter->stats_lock, irq_flags);
-
-	/*
-	 * these counters are modified from e1000_adjust_tbi_stats,
-	 * called from the interrupt context, so they must only
-	 * be written while holding adapter->stats_lock
-	 */
-
 	adapter->stats.crcerrs += er32(CRCERRS);
 	adapter->stats.gprc += er32(GPRC);
 	adapter->stats.gorc += er32(GORCL);
@@ -3046,8 +3035,6 @@ void e1000e_update_stats(struct e1000_adapter *adapter)
 	adapter->stats.mgptc += er32(MGTPTC);
 	adapter->stats.mgprc += er32(MGTPRC);
 	adapter->stats.mgpdc += er32(MGTPDC);
-
-	spin_unlock_irqrestore(&adapter->stats_lock, irq_flags);
 }
 
 /**
@@ -3059,9 +3046,6 @@ static void e1000_phy_read_status(struct e1000_adapter *adapter)
 	struct e1000_hw *hw = &adapter->hw;
 	struct e1000_phy_regs *phy = &adapter->phy_regs;
 	int ret_val;
-	unsigned long irq_flags;
-
-	spin_lock_irqsave(&adapter->stats_lock, irq_flags);
 
 	if ((er32(STATUS) & E1000_STATUS_LU) &&
 	    (adapter->hw.phy.media_type == e1000_media_type_copper)) {
@@ -3092,8 +3076,6 @@ static void e1000_phy_read_status(struct e1000_adapter *adapter)
 		phy->stat1000 = 0;
 		phy->estatus = (ESTATUS_1000_TFULL | ESTATUS_1000_THALF);
 	}
-
-	spin_unlock_irqrestore(&adapter->stats_lock, irq_flags);
 }
 
 static void e1000_print_link_info(struct e1000_adapter *adapter)


* Re: e1000e NVM corruption issue status
  2008-09-26  2:01   ` Brandeburg, Jesse
                       ` (5 preceding siblings ...)
  2008-09-26  2:11     ` Brandeburg, Jesse
@ 2008-09-26  2:11     ` Brandeburg, Jesse
  2008-09-26  2:12     ` Brandeburg, Jesse
                       ` (5 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Brandeburg, Jesse @ 2008-09-26  2:11 UTC (permalink / raw)
  To: Brandeburg, Jesse
  Cc: LKML, Jiri Kosina, agospoda, Ronciak, John, Allan, Bruce W,
	Graham, David, kkiel, tglx, chris.jones, arjan

e1000e: debug contention on NVM SWFLAG

From: Thomas Gleixner <tglx@linutronix.de>

This patch adds a mutex to the e1000e driver that would help
catch any collisions of two e1000e threads accessing hardware
at the same time.

description and patch updated by Jesse

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
---

 drivers/net/e1000e/ich8lan.c |   17 +++++++++++++++++
 1 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/drivers/net/e1000e/ich8lan.c b/drivers/net/e1000e/ich8lan.c
index f1a6e55..2b1aa2a 100644
--- a/drivers/net/e1000e/ich8lan.c
+++ b/drivers/net/e1000e/ich8lan.c
@@ -382,6 +382,9 @@ static s32 e1000_get_variants_ich8lan(struct e1000_adapter *adapter)
 	return 0;
 }
 
+static DEFINE_MUTEX(nvm_mutex);
+static pid_t nvm_owner = -1;
+
 /**
  *  e1000_acquire_swflag_ich8lan - Acquire software control flag
  *  @hw: pointer to the HW structure
@@ -395,6 +398,15 @@ static s32 e1000_acquire_swflag_ich8lan(struct e1000_hw *hw)
 	u32 extcnf_ctrl;
 	u32 timeout = PHY_CFG_TIMEOUT;
 
+	WARN_ON(preempt_count());
+
+	if (!mutex_trylock(&nvm_mutex)) {
+		WARN(1, KERN_ERR "e1000e mutex contention. Owned by pid %d\n",
+		     nvm_owner);
+		mutex_lock(&nvm_mutex);
+	}
+	nvm_owner = current->pid;
+
 	while (timeout) {
 		extcnf_ctrl = er32(EXTCNF_CTRL);
 		extcnf_ctrl |= E1000_EXTCNF_CTRL_SWFLAG;
@@ -409,6 +421,8 @@ static s32 e1000_acquire_swflag_ich8lan(struct e1000_hw *hw)
 
 	if (!timeout) {
 		hw_dbg(hw, "FW or HW has locked the resource for too long.\n");
+		nvm_owner = -1;
+		mutex_unlock(&nvm_mutex);
 		return -E1000_ERR_CONFIG;
 	}
 
@@ -430,6 +444,9 @@ static void e1000_release_swflag_ich8lan(struct e1000_hw *hw)
 	extcnf_ctrl = er32(EXTCNF_CTRL);
 	extcnf_ctrl &= ~E1000_EXTCNF_CTRL_SWFLAG;
 	ew32(EXTCNF_CTRL, extcnf_ctrl);
+
+	nvm_owner = -1;
+	mutex_unlock(&nvm_mutex);
 }
 
 /**
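
The diagnostic idea in this patch (try to take the lock, and if that
fails, report who holds it) can be tried in userspace.  The sketch below
swaps the kernel mutex for a C11 atomic owner word, so it only detects and
reports contention and does not block afterwards the way the patch does.
The demo_* names and the caller-supplied ids are assumptions for
illustration.

```c
#include <stdatomic.h>

#define DEMO_NO_OWNER (-1L)

/* Id of the current holder, or DEMO_NO_OWNER when free. */
static atomic_long demo_owner = DEMO_NO_OWNER;

/*
 * Try to become the owner.  Returns 1 on success.  On contention
 * returns 0 and stores the current holder's id in *holder so the
 * caller can log it.
 */
int demo_try_acquire(long who, long *holder)
{
	long expected = DEMO_NO_OWNER;

	if (atomic_compare_exchange_strong(&demo_owner, &expected, who))
		return 1;
	*holder = expected;   /* the failed exchange loaded the holder */
	return 0;
}

/* Give up ownership. */
void demo_release(void)
{
	atomic_store(&demo_owner, DEMO_NO_OWNER);
}
```

A caller would log the returned holder id on failure, which corresponds to
the "Owned by pid" warning in the patch.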


* Re: e1000e NVM corruption issue status
  2008-09-26  2:01   ` Brandeburg, Jesse
                       ` (6 preceding siblings ...)
  2008-09-26  2:11     ` Brandeburg, Jesse
@ 2008-09-26  2:12     ` Brandeburg, Jesse
  2008-09-26  2:12     ` Brandeburg, Jesse
                       ` (4 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Brandeburg, Jesse @ 2008-09-26  2:12 UTC (permalink / raw)
  To: Brandeburg, Jesse
  Cc: LKML, Jiri Kosina, agospoda, Ronciak, John, Allan, Bruce W,
	Graham, David, kkiel, tglx, chris.jones, arjan

update version

From: Jesse Brandeburg <jesse.brandeburg@intel.com>


---

 drivers/net/e1000e/netdev.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index bd7fa13..89ca272 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -47,7 +47,7 @@
 
 #include "e1000.h"
 
-#define DRV_VERSION "0.3.3.3-k2"
+#define DRV_VERSION "0.3.3.3-kt"
 char e1000e_driver_name[] = "e1000e";
 const char e1000e_driver_version[] = DRV_VERSION;
 


* Re: e1000e NVM corruption issue status
  2008-09-26  2:01   ` Brandeburg, Jesse
                       ` (7 preceding siblings ...)
  2008-09-26  2:12     ` Brandeburg, Jesse
@ 2008-09-26  2:12     ` Brandeburg, Jesse
  2008-09-26  2:13     ` Brandeburg, Jesse
                       ` (3 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Brandeburg, Jesse @ 2008-09-26  2:12 UTC (permalink / raw)
  To: Brandeburg, Jesse
  Cc: LKML, Jiri Kosina, agospoda, Ronciak, John, Allan, Bruce W,
	Graham, David, kkiel, tglx, chris.jones, arjan

e1000e: allow bad checksum

From: Jesse Brandeburg <jesse.brandeburg@intel.com>

Currently, if the driver notices a bad checksum it will fail to
load.  This patch allows the driver load process to continue with
an invalid MAC address and could allow the user to use ethtool or
another app to fix the eeprom.

Copied from the implementation in e1000.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
---

 drivers/net/e1000e/netdev.c |   80 +++++++++++++++++++++++++++++++++++--------
 1 files changed, 66 insertions(+), 14 deletions(-)

diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index 89ca272..ad026d0 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -4338,6 +4338,52 @@ static void e1000_eeprom_checks(struct e1000_adapter *adapter)
 }
 
 /**
+ * e1000e_dump_eeprom - write the eeprom to kernel log
+ * @adapter: our adapter struct
+ *
+ * Dump the eeprom for users having checksum issues
+ **/
+static void e1000e_dump_eeprom(struct e1000_adapter *adapter)
+{
+	struct net_device *netdev = adapter->netdev;
+	struct ethtool_eeprom eeprom;
+	const struct ethtool_ops *ops = netdev->ethtool_ops;
+	u8 *data;
+	int i;
+	u16 csum_old, csum_new = 0;
+
+	eeprom.len = ops->get_eeprom_len(netdev);
+	eeprom.offset = 0;
+
+	data = kzalloc(eeprom.len, GFP_KERNEL);
+	if (!data) {
+		printk(KERN_ERR "Unable to allocate memory to dump EEPROM"
+		       " data\n");
+		return;
+	}
+
+	ops->get_eeprom(netdev, &eeprom, data);
+
+	csum_old = (data[NVM_CHECKSUM_REG * 2]) +
+		   (data[NVM_CHECKSUM_REG * 2 + 1] << 8);
+	for (i = 0; i < NVM_CHECKSUM_REG * 2; i += 2)
+		csum_new += data[i] + (data[i + 1] << 8);
+	csum_new = NVM_SUM - csum_new;
+
+	printk(KERN_ERR "/*********************/\n");
+	printk(KERN_ERR "Current EEPROM Checksum : 0x%04x\n", csum_old);
+	printk(KERN_ERR "Calculated              : 0x%04x\n", csum_new);
+
+	printk(KERN_ERR "Offset    Values\n");
+	printk(KERN_ERR "========  ======\n");
+	print_hex_dump(KERN_ERR, "", DUMP_PREFIX_OFFSET, 16, 1, data, 128, 0);
+
+	printk(KERN_ERR "/*********************/\n");
+
+	kfree(data);
+}
+
+/**
  * e1000_probe - Device Initialization Routine
  * @pdev: PCI device information struct
  * @ent: entry in e1000_pci_tbl
@@ -4530,31 +4576,38 @@ static int __devinit e1000_probe(struct pci_dev *pdev,
 	 * attempt. Let's give it a few tries
 	 */
 	for (i = 0;; i++) {
-		if (e1000_validate_nvm_checksum(&adapter->hw) >= 0)
+		if (e1000_validate_nvm_checksum(hw) >= 0) {
+			/* copy the MAC address out of the NVM */
+			if (e1000e_read_mac_addr(&adapter->hw))
+				e_err("NVM Read Error reading MAC address\n");
 			break;
+		}
 		if (i == 2) {
 			e_err("The NVM Checksum Is Not Valid\n");
-			err = -EIO;
-			goto err_eeprom;
+			e1000e_dump_eeprom(adapter);
+			/*
+			 * set MAC address to all zeroes to invalidate and
+			 * temporarily disable this device for the user. This
+			 * blocks regular traffic while still permitting
+			 * ethtool ioctls from reaching the hardware as well as
+			 * allowing the user to run the interface after
+			 * manually setting a hw addr using
+			 * `ip link set address`
+			 */
+			memset(hw->mac.addr, 0, netdev->addr_len);
 		}
 	}
 
 	e1000_eeprom_checks(adapter);
 
-	/* copy the MAC address out of the NVM */
-	if (e1000e_read_mac_addr(&adapter->hw))
-		e_err("NVM Read Error while reading MAC address\n");
-
+	/* don't block initalization here due to bad MAC address */
 	memcpy(netdev->dev_addr, adapter->hw.mac.addr, netdev->addr_len);
 	memcpy(netdev->perm_addr, adapter->hw.mac.addr, netdev->addr_len);
 
 	if (!is_valid_ether_addr(netdev->perm_addr)) {
-		e_err("Invalid MAC Address: %02x:%02x:%02x:%02x:%02x:%02x\n",
-		      netdev->perm_addr[0], netdev->perm_addr[1],
-		      netdev->perm_addr[2], netdev->perm_addr[3],
-		      netdev->perm_addr[4], netdev->perm_addr[5]);
-		err = -EIO;
-		goto err_eeprom;
+		DECLARE_MAC_BUF(mac);
+		e_err("Invalid MAC Address: %s\n",
+		      print_mac(mac, netdev->perm_addr));
 	}
 
 	init_timer(&adapter->watchdog_timer);
@@ -4643,7 +4696,6 @@ static int __devinit e1000_probe(struct pci_dev *pdev,
 err_register:
 	if (!(adapter->flags & FLAG_HAS_AMT))
 		e1000_release_hw_control(adapter);
-err_eeprom:
 	if (!e1000_check_reset_block(&adapter->hw))
 		e1000_phy_hw_reset(&adapter->hw);
 err_hw_init:
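
The checksum arithmetic in e1000e_dump_eeprom() above can be tried in
user space on a saved "ethtool -e" dump. What follows is only a minimal
sketch, not driver code: the function names are hypothetical, and the
values of NVM_CHECKSUM_REG (0x3F) and NVM_SUM (0xBABA) are assumed from
memory of the driver headers rather than taken from this patch. The
byte buffer must hold at least the first 128 bytes of the image.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values; the patch only uses the symbolic names. */
#define NVM_CHECKSUM_REG 0x003F	/* word offset of the stored checksum */
#define NVM_SUM          0xBABA	/* words 0x00..0x3F should sum to this */

/* Read little-endian 16-bit word number `word` from a byte dump. */
uint16_t nvm_word(const uint8_t *data, size_t word)
{
	return (uint16_t)(data[word * 2] | (data[word * 2 + 1] << 8));
}

/* Checksum currently stored in the image (csum_old in the patch). */
uint16_t nvm_stored_checksum(const uint8_t *data)
{
	return nvm_word(data, NVM_CHECKSUM_REG);
}

/* Checksum that makes words 0x00..0x3F sum to NVM_SUM (csum_new). */
uint16_t nvm_calc_checksum(const uint8_t *data)
{
	uint16_t sum = 0;
	size_t i;

	for (i = 0; i < NVM_CHECKSUM_REG; i++)
		sum = (uint16_t)(sum + nvm_word(data, i));
	return (uint16_t)(NVM_SUM - sum);
}

/* Nonzero when the stored and calculated checksums agree. */
int nvm_checksum_ok(const uint8_t *data)
{
	return nvm_stored_checksum(data) == nvm_calc_checksum(data);
}
```

Under those assumptions, an image that reads back as all ff has a
stored checksum of 0xffff and a calculated one of 0xbaf9, so it fails
the check.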

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: e1000e NVM corruption issue status
  2008-09-26  2:01   ` Brandeburg, Jesse
                       ` (8 preceding siblings ...)
  2008-09-26  2:12     ` Brandeburg, Jesse
@ 2008-09-26  2:13     ` Brandeburg, Jesse
  2008-09-26  2:13     ` Brandeburg, Jesse
                       ` (2 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Brandeburg, Jesse @ 2008-09-26  2:13 UTC (permalink / raw)
  To: Brandeburg, Jesse
  Cc: LKML, Jiri Kosina, agospoda, Ronciak, John, Allan, Bruce W,
	Graham, David, kkiel, tglx, chris.jones, arjan

e1000e: dump eeprom to dmesg for ich8/9

From: Jesse Brandeburg <jesse.brandeburg@intel.com>

dumping the eeprom for now seems like a bit of a verbose
hack, but might be useful when we want to restore it.

if syslogd (or something like it) isn't running, it won't be kept,
however.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
---

 drivers/net/e1000e/netdev.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index ad026d0..c5a99ed 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -4600,6 +4600,11 @@ static int __devinit e1000_probe(struct pci_dev *pdev,
 
 	e1000_eeprom_checks(adapter);
 
+	/* debug code ... dump the first bytes of the eeprom for
+	 * ich parts that might get a corruption */
+	if (adapter->flags & FLAG_IS_ICH)
+		e1000e_dump_eeprom(adapter);
+
 	/* don't block initalization here due to bad MAC address */
 	memcpy(netdev->dev_addr, adapter->hw.mac.addr, netdev->addr_len);
 	memcpy(netdev->perm_addr, adapter->hw.mac.addr, netdev->addr_len);

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: e1000e NVM corruption issue status
  2008-09-26  2:01   ` Brandeburg, Jesse
                       ` (9 preceding siblings ...)
  2008-09-26  2:13     ` Brandeburg, Jesse
@ 2008-09-26  2:13     ` Brandeburg, Jesse
  2008-09-29 15:52       ` Jiri Kosina
  2008-09-26  6:13     ` Jiri Kosina
  2008-09-26 14:23     ` Karsten Keil
  12 siblings, 1 reply; 39+ messages in thread
From: Brandeburg, Jesse @ 2008-09-26  2:13 UTC (permalink / raw)
  To: Brandeburg, Jesse
  Cc: LKML, Jiri Kosina, agospoda, Ronciak, John, Allan, Bruce W,
	Graham, David, kkiel, tglx, chris.jones, arjan

in case your mailer hoses something apply in this order:
# This series applies on GIT commit 011fcfcb75311c7368f13170b9e68adcf146a557
01-e-mem.patch
02-e_flash.patch
03-e1000e-release-lock-in-reset.patch
04-e1000e-dont-sleep.patch
05-e1000e-no-deeplocks.patch
06-e1000e-drop-stats-lock.patch
07-subject-e1000e-debug-patch.patch
08-e1000e-version.patch
09-e1000e-allow-bad-checksum.patch
10-e1000e-dump-eeprom-to-dmesg.txt

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: e1000e NVM corruption issue status
  2008-09-26  1:50 ` e1000e NVM corruption issue status Brandeburg, Jesse
  2008-09-26  1:58   ` Chris Snook
  2008-09-26  2:01   ` Brandeburg, Jesse
@ 2008-09-26  5:44   ` Jesse Brandeburg
  2008-09-26  7:19   ` Karsten Keil
  2008-10-18 19:13   ` James Courtier-Dutton
  4 siblings, 0 replies; 39+ messages in thread
From: Jesse Brandeburg @ 2008-09-26  5:44 UTC (permalink / raw)
  To: Brandeburg, Jesse
  Cc: LKML, Jiri Kosina, agospoda, Ronciak, John, Allan, Bruce W,
	Graham, David, kkiel, tglx, chris.jones, arjan, NetDEV list

I'm apparently in too much of a hurry, including netdev...

On Thu, Sep 25, 2008 at 6:50 PM, Brandeburg, Jesse
<jesse.brandeburg@intel.com> wrote:
> A quick summary of the issue, if you think you have more data, please
> reply.  If you have had this issue, please reply with results of "cat
> /proc/iomem" and "lspci".  It will help us correlate data.
>
> Problem: some users report that with many of the latest beta distros,
> during a reboot when e1000e loads it says "NVM checksum is not valid" and
> the driver fails to load.
>
> Result: At this point it appears that most users can load the e1000e
> driver if they skip the nvm validation error exit.  LAN traffic may or may
> not work at this point.  Some users report they can dump their eeprom
> using ethtool -e and see some varying data, most report the eeprom read
> returns all ff ff ff
>
> NOTE: if you have not had this problem, but wish to continue using e1000e
> I strongly suggest you do a "ethtool -e eth0 > savemyeep.txt"
>
> Many of the reports seem to be related in time to a graphics crash, no one
> has been able to give us more detail about how to reproduce.  We NEED HELP
> reproducing this.  Steps, hints, anything.  We are trying rebooting,
> suspending, opensuse, fedora, ubuntu, and several hardware platforms, etc.
>
> This seems to affect both 32 and 64 bit kernels, but we haven't heard much
> either way.
>
> hardware affected:
> laptops and desktops with 82566 or 82567 based LAN parts, which are
> machines with the ICH8 and ICH9 chipsets and a variety of processors.
> The machines I know of that have reported the issue include
> Lenovo X300
> HP 2510p
> Intel DP35JO
> Lenovo T61 (possibly)
> Lenovo X61 (possibly)
>
> Next steps:
> We are still trying to reproduce the issue locally, we should have a
> machine here tomorrow that reportedly had the issue with ubuntu.
>
> We have a series of kernel patches that I will reply to this mail with
> that may help users willing to test.
>
> We should have ready (hopefully tomorrow) an app that should be able to
> restore eeproms as long as the driver can still load.
>
> We also have a band-aid patch that should allow "locking" of the NVM area
> to prevent an errant write, we are looking to post that tomorrow.  This
> should prevent the damage but not really find the culprit.
>
> Jesse

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: e1000e NVM corruption issue status
  2008-09-26  2:01   ` Brandeburg, Jesse
                       ` (10 preceding siblings ...)
  2008-09-26  2:13     ` Brandeburg, Jesse
@ 2008-09-26  6:13     ` Jiri Kosina
  2008-09-26 11:49       ` Arjan van de Ven
  2008-09-26 14:23     ` Karsten Keil
  12 siblings, 1 reply; 39+ messages in thread
From: Jiri Kosina @ 2008-09-26  6:13 UTC (permalink / raw)
  To: Brandeburg, Jesse
  Cc: LKML, agospoda, Ronciak, John, Allan, Bruce W, Graham, David,
	kkiel, tglx, chris.jones, arjan

On Thu, 25 Sep 2008, Brandeburg, Jesse wrote:

> this is the current set of patches that I have to help us debug
> and/or fix e1000e issues found during this debug effort for
> the corrupt NVM.  the "drop stats lock" - "reset swflag" patches allow 
> Thomas' patch for a mutex in the SWFLAG acquire function to run without 
> any errors.

Thanks. Also Jesse Barnes' patch shouldn't be forgotten, could you please 
add it to that lineup?

	http://marc.info/?l=linux-kernel&m=122237193628087&w=2

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: e1000e NVM corruption issue status
  2008-09-26  2:09     ` Brandeburg, Jesse
@ 2008-09-26  7:12       ` Ingo Molnar
  0 siblings, 0 replies; 39+ messages in thread
From: Ingo Molnar @ 2008-09-26  7:12 UTC (permalink / raw)
  To: Brandeburg, Jesse
  Cc: LKML, Jiri Kosina, agospoda, Ronciak, John, Allan, Bruce W,
	Graham, David, kkiel, tglx, chris.jones, arjan, H. Peter Anvin


* Brandeburg, Jesse <jesse.brandeburg@intel.com> wrote:

> From: Bruce Allan <bruce.w.allan@intel.com>
> 
> Export set_memory_ro() and set_memory_rw() calls. Soon to be used
> by e1000e.
> 
> Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> ---
> 
>  arch/x86/mm/pageattr.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> index 43e2f84..0991e15 100644
> --- a/arch/x86/mm/pageattr.c
> +++ b/arch/x86/mm/pageattr.c
> @@ -906,11 +906,13 @@ int set_memory_ro(unsigned long addr, int numpages)
>  {
>  	return change_page_attr_clear(addr, numpages, __pgprot(_PAGE_RW));
>  }
> +EXPORT_SYMBOL(set_memory_ro);
>  
>  int set_memory_rw(unsigned long addr, int numpages)
>  {
>  	return change_page_attr_set(addr, numpages, __pgprot(_PAGE_RW));
>  }
> +EXPORT_SYMBOL(set_memory_rw);

that's fine, as long as you make it kernel-internal EXPORT_SYMBOL_GPL():

  Acked-by: Ingo Molnar <mingo@elte.hu>

feel free to push that bit via the networking tree(s) whenever you think 
you'd like to push it. We can queue it up in the x86 tree too - it's a 
useful debug facility for critical resources.

one other possible angle beyond these current theories of user-space PCI 
BAR corruption (perhaps) and racy in-kernel corruption (less likely) is 
PAT and conflicting caching attributes.

But that too is in the race category IMO (while this corruption seems to 
trigger straight away on the affected boxes) and the CPUs that saw these 
corruptions should triple fault if the OS creates conflicting cache 
attributes.

	Ingo

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: e1000e NVM corruption issue status
  2008-09-26  1:50 ` e1000e NVM corruption issue status Brandeburg, Jesse
                     ` (2 preceding siblings ...)
  2008-09-26  5:44   ` Jesse Brandeburg
@ 2008-09-26  7:19   ` Karsten Keil
  2008-10-18 19:13   ` James Courtier-Dutton
  4 siblings, 0 replies; 39+ messages in thread
From: Karsten Keil @ 2008-09-26  7:19 UTC (permalink / raw)
  To: LKML

On Thu, Sep 25, 2008 at 06:50:57PM -0700, Brandeburg, Jesse wrote:
> hardware affected:
> laptops and desktops with 82566 or 82567 based LAN parts, which are 
> machines with the ICH8 and ICH9 chipsets and a variety of processors.
> The machines I know of that have reported the issue include
> Lenovo X300
  We also have an R400 showing this issue
> HP 2510p
> Intel DP35JO
> Lenovo T61 (possibly)
> Lenovo X61 (possibly)

Lenovo X61 is verified (ethtool -e shows all FF)

-- 
Karsten Keil
SuSE Labs
SUSE LINUX Products GmbH, Maxfeldstr.5 90409 Nuernberg, GF: Markus Rex, HRB 16746 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: e1000e NVM corruption issue status
  2008-09-26  6:13     ` Jiri Kosina
@ 2008-09-26 11:49       ` Arjan van de Ven
  2008-09-26 17:52         ` Jesse Barnes
  0 siblings, 1 reply; 39+ messages in thread
From: Arjan van de Ven @ 2008-09-26 11:49 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Brandeburg, Jesse, LKML, agospoda, Ronciak, John, Allan, Bruce W,
	Graham, David, kkiel, tglx, chris.jones, arjan

Jiri Kosina wrote:
> On Thu, 25 Sep 2008, Brandeburg, Jesse wrote:
> 
>> this is the current set of patches that I have to help us debug
>> and/or fix e1000e issues found during this debug effort for
>> the corrupt NVM.  the "drop stats lock" - "reset swflag" patches allow 
>> Thomas' patch for a mutex in the SWFLAG acquire function to run without 
>> any errors.
> 
> Thanks. Also Jesse Barnes' patch shouldn't be forgotten, could you please 
> add it to that lineup?
> 
> 	http://marc.info/?l=linux-kernel&m=122237193628087&w=2
> 

can we (for now) also stick a WARN_ON() into that failure path? that way we can at least
catch if/when this happens more visibly..... if it happens consistently in say the new distros
we can be more confident that we're down the right path in diagnosing the issue.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: e1000e NVM corruption issue status
  2008-09-26  2:01   ` Brandeburg, Jesse
                       ` (11 preceding siblings ...)
  2008-09-26  6:13     ` Jiri Kosina
@ 2008-09-26 14:23     ` Karsten Keil
  12 siblings, 0 replies; 39+ messages in thread
From: Karsten Keil @ 2008-09-26 14:23 UTC (permalink / raw)
  To: LKML

On Thu, Sep 25, 2008 at 07:01:21PM -0700, Brandeburg, Jesse wrote:
> > We have a series of kernel patches that I will reply to this mail with 
> > that may help users willing to test.
> 
> this is the current set of patches that I have to help us debug
> and/or fix e1000e issues found during this debug effort for
> the corrupt NVM.  the "drop stats lock" - "reset swflag" patches allow 
> Thomas' patch for a mutex in the SWFLAG acquire function to run without 
> any errors.
> 
> the patches are not probably production quality, but seem to work
> for myself and thomas on at least a couple of machines.
> 
> The non-debug aspects of the patches will likely be pushed later.
> 
> At this point I do not believe any of these patches will fix the
> NVM corruption issue, but will add to the ability of any tester
> to help find the issue, and reduce the chance that it is any issue we 
> (now) know about.
> 

A kernel with these patches loads the e1000e driver fine on
a test machine which has an OK NVM checksum.

But it freezes on one machine which has a wrong checksum.

On this machine the NVM seems to be OK, but the NVM valid bit
is cleared. If I set this bit, the NVM checksum would be OK
again. This is the T61 notebook which showed this error after an
openSUSE 11.1 Beta1 install, following a reboot during X setup.


-- 
Karsten Keil
SuSE Labs
ISDN and VOIP development
SUSE LINUX Products GmbH, Maxfeldstr.5 90409 Nuernberg, GF: Markus Rex, HRB 16746 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: e1000e NVM corruption issue status
  2008-09-26 11:49       ` Arjan van de Ven
@ 2008-09-26 17:52         ` Jesse Barnes
  2008-09-26 18:23           ` Jesse Barnes
  0 siblings, 1 reply; 39+ messages in thread
From: Jesse Barnes @ 2008-09-26 17:52 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Jiri Kosina, Brandeburg, Jesse, LKML, agospoda, Ronciak, John,
	Allan, Bruce W, Graham, David, kkiel, tglx, chris.jones, arjan

On Friday, September 26, 2008 4:49 am Arjan van de Ven wrote:
> Jiri Kosina wrote:
> > On Thu, 25 Sep 2008, Brandeburg, Jesse wrote:
> >> this is the current set of patches that I have to help us debug
> >> and/or fix e1000e issues found during this debug effort for
> >> the corrupt NVM.  the "drop stats lock" - "reset swflag" patches allow
> >> Thomas' patch for a mutex in the SWFLAG acquire function to run without
> >> any errors.
> >
> > Thanks. Also Jesse Barnes' patch shouldn't be forgotten, could you please
> > add it to that lineup?
> >
> > 	http://marc.info/?l=linux-kernel&m=122237193628087&w=2
>
> can we (for now) also stick a WARN_ON() into that failure path? that way we
> can at least catch if/when this happens more visibly..... if it happens
> consistently in say the new distros we can be more confident that we're
> down the right path in diagnosing the issue.

I'm spinning a new one now with some debug output, stay tuned (just gotta boot 
my test box).

-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: e1000e NVM corruption issue status
  2008-09-26 17:52         ` Jesse Barnes
@ 2008-09-26 18:23           ` Jesse Barnes
  2008-09-26 18:39             ` Jesse Barnes
  2008-09-26 18:53             ` Tim Gardner
  0 siblings, 2 replies; 39+ messages in thread
From: Jesse Barnes @ 2008-09-26 18:23 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Jiri Kosina, Brandeburg, Jesse, LKML, agospoda, Ronciak, John,
	Allan, Bruce W, Graham, David, kkiel, tglx, chris.jones, arjan

[-- Attachment #1: Type: text/plain, Size: 1264 bytes --]

On Friday, September 26, 2008 10:52 am Jesse Barnes wrote:
> On Friday, September 26, 2008 4:49 am Arjan van de Ven wrote:
> > Jiri Kosina wrote:
> > > On Thu, 25 Sep 2008, Brandeburg, Jesse wrote:
> > >> this is the current set of patches that I have to help us debug
> > >> and/or fix e1000e issues found during this debug effort for
> > >> the corrupt NVM.  the "drop stats lock" - "reset swflag" patches allow
> > >> Thomas' patch for a mutex in the SWFLAG acquire function to run
> > >> without any errors.
> > >
> > > Thanks. Also Jesse Barnes' patch shouldn't be forgotten, could you
> > > please add it to that lineup?
> > >
> > > 	http://marc.info/?l=linux-kernel&m=122237193628087&w=2
> >
> > can we (for now) also stick a WARN_ON() into that failure path? that way
> > we can at least catch if/when this happens more visibly..... if it
> > happens consistently in say the new distros we can be more confident that
> > we're down the right path in diagnosing the issue.
>
> I'm spinning a new one now with some debug output, stay tuned (just gotta
> boot my test box).

Ok here's an updated one.  Jesse (Br) can you add it to your list?  If the X 
driver really is mapping too much this should catch it, as long as it goes 
through sysfs.

Thanks,
Jesse

[-- Attachment #2: pci-sysfs-mmap-range-check-2.patch --]
[-- Type: text/x-diff, Size: 1427 bytes --]

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 9c71858..11523a3 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -16,6 +16,7 @@
 
 
 #include <linux/kernel.h>
+#include <linux/sched.h>
 #include <linux/pci.h>
 #include <linux/stat.h>
 #include <linux/topology.h>
@@ -502,6 +503,8 @@ pci_mmap_resource(struct kobject *kobj, struct bin_attribute *attr,
 	struct resource *res = (struct resource *)attr->private;
 	enum pci_mmap_state mmap_type;
 	resource_size_t start, end;
+	unsigned long map_len = vma->vm_end - vma->vm_start;
+	unsigned long map_offset = vma->vm_pgoff << PAGE_SHIFT;
 	int i;
 
 	for (i = 0; i < PCI_ROM_RESOURCE; i++)
@@ -510,6 +513,18 @@ pci_mmap_resource(struct kobject *kobj, struct bin_attribute *attr,
 	if (i >= PCI_ROM_RESOURCE)
 		return -ENODEV;
 
+	/*
+	 * Make sure the range the user is trying to map falls within
+	 * the resource
+	 */
+	if (map_offset + map_len > pci_resource_len(pdev, i)) {
+		printk(KERN_ERR "process \"%s\" tried to map 0x%08lx-0x%08lx on BAR %d (size 0x%08lx)\n",
+		       current->comm, map_offset, map_offset + map_len, i,
+		       (unsigned long)pci_resource_len(pdev, i));
+		WARN_ON(1);
+		return -EINVAL;
+	}
+
 	/* pci_mmap_page_range() expects the same kind of entry as coming
 	 * from /proc/bus/pci/ which is a "user visible" value. If this is
 	 * different from the resource itself, arch will do necessary fixup.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: e1000e NVM corruption issue status
  2008-09-26 18:23           ` Jesse Barnes
@ 2008-09-26 18:39             ` Jesse Barnes
  2008-09-26 18:43               ` Jesse Barnes
  2008-09-26 18:53             ` Tim Gardner
  1 sibling, 1 reply; 39+ messages in thread
From: Jesse Barnes @ 2008-09-26 18:39 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Jiri Kosina, Brandeburg, Jesse, LKML, agospoda, Ronciak, John,
	Allan, Bruce W, Graham, David, kkiel, tglx, chris.jones, arjan

[-- Attachment #1: Type: text/plain, Size: 1499 bytes --]

On Friday, September 26, 2008 11:23 am Jesse Barnes wrote:
> On Friday, September 26, 2008 10:52 am Jesse Barnes wrote:
> > On Friday, September 26, 2008 4:49 am Arjan van de Ven wrote:
> > > Jiri Kosina wrote:
> > > > On Thu, 25 Sep 2008, Brandeburg, Jesse wrote:
> > > >> this is the current set of patches that I have to help us debug
> > > >> and/or fix e1000e issues found during this debug effort for
> > > >> the corrupt NVM.  the "drop stats lock" - "reset swflag" patches
> > > >> allow Thomas' patch for a mutex in the SWFLAG acquire function to
> > > >> run without any errors.
> > > >
> > > > Thanks. Also Jesse Barnes' patch shouldn't be forgotten, could you
> > > > please add it to that lineup?
> > > >
> > > > 	http://marc.info/?l=linux-kernel&m=122237193628087&w=2
> > >
> > > can we (for now) also stick a WARN_ON() into that failure path? that
> > > way we can at least catch if/when this happens more visibly..... if it
> > > happens consistently in say the new distros we can be more confident
> > > that we're down the right path in diagnosing the issue.
> >
> > I'm spinning a new one now with some debug output, stay tuned (just gotta
> > boot my test box).
>
> Ok here's an updated one.  Jesse (Br) can you add it to your list?  If the
> X driver really is mapping too much this should catch it, as long as it
> goes through sysfs.

Arjan pointed out I may as well just use WARN() these days.  Updated patch 
attached.

-- 
Jesse Barnes, Intel Open Source Technology Center

[-- Attachment #2: pci-sysfs-mmap-range-check-3.patch --]
[-- Type: text/x-diff, Size: 1397 bytes --]

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 9c71858..070fbe9 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -16,6 +16,7 @@
 
 
 #include <linux/kernel.h>
+#include <linux/sched.h>
 #include <linux/pci.h>
 #include <linux/stat.h>
 #include <linux/topology.h>
@@ -502,6 +503,8 @@ pci_mmap_resource(struct kobject *kobj, struct bin_attribute *attr,
 	struct resource *res = (struct resource *)attr->private;
 	enum pci_mmap_state mmap_type;
 	resource_size_t start, end;
+	unsigned long map_len = vma->vm_end - vma->vm_start;
+	unsigned long map_offset = vma->vm_pgoff << PAGE_SHIFT;
 	int i;
 
 	for (i = 0; i < PCI_ROM_RESOURCE; i++)
@@ -510,6 +513,17 @@ pci_mmap_resource(struct kobject *kobj, struct bin_attribute *attr,
 	if (i >= PCI_ROM_RESOURCE)
 		return -ENODEV;
 
+	/*
+	 * Make sure the range the user is trying to map falls within
+	 * the resource
+	 */
+	if (map_offset + map_len > pci_resource_len(pdev, i)) {
+		WARN("process \"%s\" tried to map 0x%08lx-0x%08lx on BAR %d (size 0x%08lx)\n",
+		     current->comm, map_offset, map_offset + map_len, i,
+		     (unsigned long)pci_resource_len(pdev, i));
+		return -EINVAL;
+	}
+
 	/* pci_mmap_page_range() expects the same kind of entry as coming
 	 * from /proc/bus/pci/ which is a "user visible" value. If this is
 	 * different from the resource itself, arch will do necessary fixup.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: e1000e NVM corruption issue status
  2008-09-26 18:39             ` Jesse Barnes
@ 2008-09-26 18:43               ` Jesse Barnes
  0 siblings, 0 replies; 39+ messages in thread
From: Jesse Barnes @ 2008-09-26 18:43 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Jiri Kosina, Brandeburg, Jesse, LKML, agospoda, Ronciak, John,
	Allan, Bruce W, Graham, David, kkiel, tglx, chris.jones, arjan

[-- Attachment #1: Type: text/plain, Size: 1651 bytes --]

On Friday, September 26, 2008 11:39 am Jesse Barnes wrote:
> On Friday, September 26, 2008 11:23 am Jesse Barnes wrote:
> > On Friday, September 26, 2008 10:52 am Jesse Barnes wrote:
> > > On Friday, September 26, 2008 4:49 am Arjan van de Ven wrote:
> > > > Jiri Kosina wrote:
> > > > > On Thu, 25 Sep 2008, Brandeburg, Jesse wrote:
> > > > >> this is the current set of patches that I have to help us debug
> > > > >> and/or fix e1000e issues found during this debug effort for
> > > > >> the corrupt NVM.  the "drop stats lock" - "reset swflag" patches
> > > > >> allow Thomas' patch for a mutex in the SWFLAG acquire function to
> > > > >> run without any errors.
> > > > >
> > > > > Thanks. Also Jesse Barnes' patch shouldn't be forgotten, could you
> > > > > please add it to that lineup?
> > > > >
> > > > > 	http://marc.info/?l=linux-kernel&m=122237193628087&w=2
> > > >
> > > > can we (for now) also stick a WARN_ON() into that failure path? that
> > > > way we can at least catch if/when this happens more visibly..... if
> > > > it happens consistently in say the new distros we can be more
> > > > confident that we're down the right path in diagnosing the issue.
> > >
> > > I'm spinning a new one now with some debug output, stay tuned (just
> > > gotta boot my test box).
> >
> > Ok here's an updated one.  Jesse (Br) can you add it to your list?  If
> > the X driver really is mapping too much this should catch it, as long as
> > it goes through sysfs.
>
> Arjan pointed out I may as well just use WARN() these days.  Updated patch
> attached.

Even better, use WARN() correctly.

-- 
Jesse Barnes, Intel Open Source Technology Center

[-- Attachment #2: pci-sysfs-mmap-range-check-4.patch --]
[-- Type: text/x-diff, Size: 1400 bytes --]

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 9c71858..4d1aa6e 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -16,6 +16,7 @@
 
 
 #include <linux/kernel.h>
+#include <linux/sched.h>
 #include <linux/pci.h>
 #include <linux/stat.h>
 #include <linux/topology.h>
@@ -502,6 +503,8 @@ pci_mmap_resource(struct kobject *kobj, struct bin_attribute *attr,
 	struct resource *res = (struct resource *)attr->private;
 	enum pci_mmap_state mmap_type;
 	resource_size_t start, end;
+	unsigned long map_len = vma->vm_end - vma->vm_start;
+	unsigned long map_offset = vma->vm_pgoff << PAGE_SHIFT;
 	int i;
 
 	for (i = 0; i < PCI_ROM_RESOURCE; i++)
@@ -510,6 +513,17 @@ pci_mmap_resource(struct kobject *kobj, struct bin_attribute *attr,
 	if (i >= PCI_ROM_RESOURCE)
 		return -ENODEV;
 
+	/*
+	 * Make sure the range the user is trying to map falls within
+	 * the resource
+	 */
+	if (map_offset + map_len > pci_resource_len(pdev, i)) {
+		WARN(1, "process \"%s\" tried to map 0x%08lx-0x%08lx on BAR %d (size 0x%08lx)\n",
+		     current->comm, map_offset, map_offset + map_len, i,
+		     (unsigned long)pci_resource_len(pdev, i));
+		return -EINVAL;
+	}
+
 	/* pci_mmap_page_range() expects the same kind of entry as coming
 	 * from /proc/bus/pci/ which is a "user visible" value. If this is
 	 * different from the resource itself, arch will do necessary fixup.
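
The range check added by this patch can be exercised outside the
kernel. The sketch below is a user-space variation, not the patch
itself: the function name is hypothetical, and the test is written as
two comparisons so that offset + len cannot wrap around, which the
single comparison in the patch does not guard against.

```c
#include <stdint.h>

/*
 * Return nonzero when [offset, offset + len) lies entirely inside a
 * resource that is bar_len bytes long.
 */
int map_range_ok(uint64_t offset, uint64_t len, uint64_t bar_len)
{
	if (offset > bar_len)
		return 0;
	return len <= bar_len - offset;
}
```

For example, mapping 0x40000 bytes at offset 0 of a 0x20000-byte BAR is
rejected, while mapping exactly 0x20000 bytes at offset 0 is accepted.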

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: e1000e NVM corruption issue status
  2008-09-26 18:23           ` Jesse Barnes
  2008-09-26 18:39             ` Jesse Barnes
@ 2008-09-26 18:53             ` Tim Gardner
  2008-09-26 22:04               ` Krzysztof Halasa
  2008-09-27  0:05               ` Brandeburg, Jesse
  1 sibling, 2 replies; 39+ messages in thread
From: Tim Gardner @ 2008-09-26 18:53 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Arjan van de Ven, Jiri Kosina, Brandeburg, Jesse, LKML, agospoda,
	Ronciak, John, Allan, Bruce W, Graham, David, kkiel, tglx,
	chris.jones, arjan

Jesse Barnes wrote:
> On Friday, September 26, 2008 10:52 am Jesse Barnes wrote:
>> On Friday, September 26, 2008 4:49 am Arjan van de Ven wrote:
>>> Jiri Kosina wrote:
>>>> On Thu, 25 Sep 2008, Brandeburg, Jesse wrote:
>>>>> this is the current set of patches that I have to help us debug
>>>>> and/or fix e1000e issues found during this debug effort for
>>>>> the corrupt NVM.  the "drop stats lock" - "reset swflag" patches allow
>>>>> Thomas' patch for a mutex in the SWFLAG acquire function to run
>>>>> without any errors.
>>>> Thanks. Also Jesse Barnes' patch shouldn't be forgotten, could you
>>>> please add it to that lineup?
>>>>
>>>> 	http://marc.info/?l=linux-kernel&m=122237193628087&w=2
>>> can we (for now) also stick a WARN_ON() into that failure path? that way
>>> we can at least catch if/when this happens more visibly..... if it
>>> happens consistently in say the new distros we can be more confident that
>>> we're down the right path in diagnosing the issue.
>> I'm spinning a new one now with some debug output, stay tuned (just gotta
>> boot my test box).
> 
> Ok here's an updated one.  Jesse (Br) can you add it to your list?  If the X 
> driver really is mapping too much this should catch it, as long as it goes 
> through sysfs.
> 
> Thanks,
> Jesse
> 

I've been experimenting with unmapping flash space until it's actually
needed, e.g., in the functions that use the E1000_READ_FLASH and
E1000_WRITE_FLASH macros. Along the way I looked at how flash write
cycles are initiated because I was having a hard time believing that
having flash space mapped was part of the root cause. However, it looks
like it's pretty simple to initiate a write or erase cycle. All of the
required action bits in ICH_FLASH_HSFSTS and ICH_FLASH_HSFCTL must be 1,
and these 2 registers are in the correct order if X was writing 0xff in
ascending order.
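
A rough sketch of why an all-ones write would matter, under an assumed
HSFCTL layout that is not taken from this thread: bit 0 is the "cycle
go" bit and bits 2:1 select the cycle type, with 0 = read, 2 = write
and 3 = erase. The macro and function names are hypothetical.

```c
#include <stdint.h>

/* Assumed HSFCTL bit layout; check it against the chipset datasheet. */
#define HSFCTL_FLCGO          0x0001u	/* bit 0: start the flash cycle */
#define HSFCTL_FLCYCLE_SHIFT  1		/* bits 2:1: cycle type */
#define HSFCTL_FLCYCLE_MASK   0x0003u

enum flash_cycle {
	CYCLE_READ  = 0,
	CYCLE_WRITE = 2,
	CYCLE_ERASE = 3
};

/* Nonzero if writing val to HSFCTL would start a cycle. */
int hsfctl_starts_cycle(uint16_t val)
{
	return (val & HSFCTL_FLCGO) != 0;
}

/* Cycle type selected by val. */
unsigned int hsfctl_cycle_type(uint16_t val)
{
	return (val >> HSFCTL_FLCYCLE_SHIFT) & HSFCTL_FLCYCLE_MASK;
}

/* Nonzero if val would start a write or erase cycle. */
int hsfctl_is_destructive(uint16_t val)
{
	unsigned int type = hsfctl_cycle_type(val);

	return hsfctl_starts_cycle(val) &&
	       (type == CYCLE_WRITE || type == CYCLE_ERASE);
}
```

With that layout, a stray 0xffff decodes as "go" with the erase cycle
type selected, while 0x0001 would only start a read.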

Just a thought.

rtg
-- 
Tim Gardner timg@tpi.com www.tpi.com
OR 503-601-0234 x102 MT 406-443-5357

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: e1000e NVM corruption issue status
  2008-09-26 18:53             ` Tim Gardner
@ 2008-09-26 22:04               ` Krzysztof Halasa
  2008-09-26 22:23                 ` Brandeburg, Jesse
  2008-09-27  0:05               ` Brandeburg, Jesse
  1 sibling, 1 reply; 39+ messages in thread
From: Krzysztof Halasa @ 2008-09-26 22:04 UTC (permalink / raw)
  To: Tim Gardner
  Cc: Jesse Barnes, Arjan van de Ven, Jiri Kosina, Brandeburg, Jesse,
	LKML, agospoda, Ronciak, John, Allan, Bruce W, Graham, David,
	kkiel, tglx, chris.jones, arjan

Tim Gardner <timg@tpi.com> writes:

> I've been experimenting with unmapping flash space until it's actually
> needed, e.g., in the functions that use the E1000_READ_FLASH and
> E1000_WRITE_FLASH macros. Along the way I looked at how flash write
> cycles are initiated because I was having a hard time believing that
> having flash space mapped was part of the root cause. However, it looks
> like it's pretty simple to initiate a write or erase cycle. All of the
> required action bits in ICH_FLASH_HSFSTS and ICH_FLASH_HSFCTL must be 1,
> and these 2 registers are in the correct order if X was writing 0xff in
> ascending order.

But... do you really have a flash chip there? I think it's more about
EEPROM (a small serial chip, usually 8-pin, keeping the MAC address and
hardware configuration). Flash chips are used for diskless booting
(though corrupting them can make the machine unbootable of course).

Sure, writing to a parallel flash chip is easy, much easier than to
serial EEPROM.
-- 
Krzysztof Halasa

^ permalink raw reply	[flat|nested] 39+ messages in thread

* RE: e1000e NVM corruption issue status
  2008-09-26 22:04               ` Krzysztof Halasa
@ 2008-09-26 22:23                 ` Brandeburg, Jesse
  2008-09-27 18:45                   ` Krzysztof Halasa
  0 siblings, 1 reply; 39+ messages in thread
From: Brandeburg, Jesse @ 2008-09-26 22:23 UTC (permalink / raw)
  To: Krzysztof Halasa, Tim Gardner
  Cc: Jesse Barnes, Arjan van de Ven, Jiri Kosina, LKML, agospoda,
	Ronciak, John, Allan, Bruce W, Graham, David, kkiel, tglx,
	chris.jones, arjan

Krzysztof Halasa wrote:
> But... do you really have a flash chip there? I think it's more about
> EEPROM (a small serial chip, usually 8-pin, keeping the MAC address and
> hardware configuration). Flash chips are used for diskless booting
> (though corrupting them can make the machine unbootable of course).
> 
> Sure, writing to a parallel flash chip is easy, much easier than to
> serial EEPROM.

ICH 8/9/10 machines with an integrated Intel gigabit part (82566/82567)
share the system Flash space with all the other system devices, BIOS,
etc.  The gigabit region is currently the only "unprotected" region I
know of.  It is never directly memory mapped, but the registers that
program it are memory mapped from our BAR1, so, like Tim said, an errant
write of a few one bits may be all it takes to erase it (I've been
trying to confirm that).


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: e1000e NVM corruption issue status
  2008-09-26 18:53             ` Tim Gardner
  2008-09-26 22:04               ` Krzysztof Halasa
@ 2008-09-27  0:05               ` Brandeburg, Jesse
  2008-09-27  4:20                 ` Tim Gardner
  1 sibling, 1 reply; 39+ messages in thread
From: Brandeburg, Jesse @ 2008-09-27  0:05 UTC (permalink / raw)
  To: Tim Gardner
  Cc: Jesse Barnes, Arjan van de Ven, Jiri Kosina, Brandeburg, Jesse,
	LKML, agospoda, Ronciak, John, Allan, Bruce W, Graham, David,
	kkiel, tglx, chris.jones, arjan

On Fri, 26 Sep 2008, Tim Gardner wrote:
> > Ok here's an updated one.  Jesse (Br) can you add it to your list?  If the X 
> > driver really is mapping too much this should catch it, as long as it goes 
> > through sysfs.

I have, am testing with it now.

> I've been experimenting with unmapping flash space until it's actually
> needed, e.g., in the functions that use the E1000_READ_FLASH and
> E1000_WRITE_FLASH macros. Along the way I looked at how flash write

That sounds like a good patch set.  I had thought of trying that but 
hadn't gotten to it yet, so if you have something to look at in diff 
format just post it and we'll take a look.

> cycles are initiated because I was having a hard time believing that
> having flash space mapped was part of the root cause. However, it looks
> like it's pretty simple to initiate a write or erase cycle. All of the
> required action bits in ICH_FLASH_HSFSTS and ICH_FLASH_HSFCTL must be 1,
> and these 2 registers are in the correct order if X was writing 0xff in
> ascending order.

Seems simple but when I tried it for a couple of hours yesterday I 
couldn't get anything to happen to my flash.  This included putting 
ew16flash writes in the e1000e driver, and writing those magic bits.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: e1000e NVM corruption issue status
  2008-09-27  0:05               ` Brandeburg, Jesse
@ 2008-09-27  4:20                 ` Tim Gardner
  0 siblings, 0 replies; 39+ messages in thread
From: Tim Gardner @ 2008-09-27  4:20 UTC (permalink / raw)
  To: Brandeburg, Jesse
  Cc: Jesse Barnes, Arjan van de Ven, Jiri Kosina, LKML, agospoda,
	Ronciak, John, Allan, Bruce W, Graham, David, kkiel, tglx,
	chris.jones

[-- Attachment #1: Type: text/plain, Size: 1714 bytes --]

Brandeburg, Jesse wrote:
> On Fri, 26 Sep 2008, Tim Gardner wrote:
>>> Ok here's an updated one.  Jesse (Br) can you add it to your list?  If the X 
>>> driver really is mapping too much this should catch it, as long as it goes 
>>> through sysfs.
> 
> I have, am testing with it now.
> 
>> I've been experimenting with unmapping flash space until it's actually
>> needed, e.g., in the functions that use the E1000_READ_FLASH and
>> E1000_WRITE_FLASH macros. Along the way I looked at how flash write
> 
> That sounds like a good patch set.  I had thought of trying that but 
> hadn't gotten to it yet, so if you have something to look at in diff 
> format just post it and we'll take a look.
> 
>> cycles are initiated because I was having a hard time believing that
>> having flash space mapped was part of the root cause. However, it looks
>> like it's pretty simple to initiate a write or erase cycle. All of the
>> required action bits in ICH_FLASH_HSFSTS and ICH_FLASH_HSFCTL must be 1,
>> and these 2 registers are in the correct order if X was writing 0xff in
>> ascending order.
> 
> Seems simple but when I tried it for a couple of hours yesterday I 
> couldn't get anything to happen to my flash.  This included putting 
> ew16flash writes in the e1000e driver, and writing those magic bits.

Jesse,

Here is a patch against 2.6.27-rc7 that maps flash space on demand.
I've only compile tested it, but a similar patch that applies against
the Ubuntu Intrepid tree is at

https://lists.ubuntu.com/archives/kernel-team/2008-September/003193.html

Note that this 2nd, actually tested patch is against the e1000e
driver v0.4.7.1 that originated from sourceforge.
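
The map-on-demand idea with nesting can be sketched in userspace as
follows. This is only a model of the reference counting, not the
driver code: malloc() and free() stand in for ioremap() and iounmap(),
the struct and function names are hypothetical, and locking is left
out to keep the sketch single-threaded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of an on-demand, nestable mapping. */
struct nvram_map {
	void *addr;        /* stand-in for hw->flash_address */
	unsigned int cnt;  /* nesting depth of active users */
	size_t len;        /* size of the region to map */
};

/* Map on first use; nested callers only bump the count. */
static int nvram_map_get(struct nvram_map *m)
{
	if (m->cnt == 0) {
		m->addr = malloc(m->len);  /* stand-in for ioremap() */
		if (!m->addr)
			return -1;
	}
	m->cnt++;
	return 0;
}

/* Drop one reference; unmap when the last user is gone. */
static void nvram_map_put(struct nvram_map *m)
{
	assert(m->cnt > 0);
	if (--m->cnt == 0) {
		free(m->addr);             /* stand-in for iounmap() */
		m->addr = NULL;
	}
}

/* Returns 0 if a nested get/put sequence behaves as expected. */
static int demo_nested_map(void)
{
	struct nvram_map m = { NULL, 0, 4096 };

	if (nvram_map_get(&m))
		return -1;
	if (nvram_map_get(&m)) {       /* nested user */
		nvram_map_put(&m);
		return -1;
	}
	nvram_map_put(&m);
	if (m.addr == NULL)            /* outer user still holds the mapping */
		return -2;
	nvram_map_put(&m);
	return (m.addr == NULL && m.cnt == 0) ? 0 : -3;
}
```

The point of the count is that an inner helper can map and unmap
without tearing down the mapping its caller is still using.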

rtg
-- 
Tim Gardner tim.gardner@canonical.com

[-- Attachment #2: 0001-e1000e-ioremap-NV-RAM-as-needed.patch --]
[-- Type: text/x-diff, Size: 8273 bytes --]

From 98a10d340fd029c4c910cf18845e116b53df3ab4 Mon Sep 17 00:00:00 2001
From: Tim Gardner <tim.gardner@canonical.com>
Date: Fri, 26 Sep 2008 22:02:17 -0600
Subject: [PATCH] e1000e: ioremap NV RAM as needed.

Instead of leaving flash mapped for the life of the driver, map it on
demand when accessed from the ethtool interfaces. This ought to prevent random
writes to flash registers accidentally starting an erase or write cycle.

Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
---
 drivers/net/e1000e/hw.h      |    4 ++
 drivers/net/e1000e/ich8lan.c |  114 +++++++++++++++++++++++++++++++++++++----
 drivers/net/e1000e/netdev.c  |    5 ++
 3 files changed, 111 insertions(+), 12 deletions(-)

diff --git a/drivers/net/e1000e/hw.h b/drivers/net/e1000e/hw.h
index 74f263a..355fe45 100644
--- a/drivers/net/e1000e/hw.h
+++ b/drivers/net/e1000e/hw.h
@@ -864,6 +864,10 @@ struct e1000_hw {
 	u8 __iomem *hw_addr;
 	u8 __iomem *flash_address;
 
+	/* Protect access to NV RAM mapping */
+	spinlock_t flash_address_map_lock;
+	u32 flash_address_map_cnt;
+
 	struct e1000_mac_info  mac;
 	struct e1000_fc_info   fc;
 	struct e1000_phy_info  phy;
diff --git a/drivers/net/e1000e/ich8lan.c b/drivers/net/e1000e/ich8lan.c
index 9e38452..8275b33 100644
--- a/drivers/net/e1000e/ich8lan.c
+++ b/drivers/net/e1000e/ich8lan.c
@@ -189,6 +189,54 @@ static inline void __ew32flash(struct e1000_hw *hw, unsigned long reg, u32 val)
 #define ew16flash(reg,val)	__ew16flash(hw, (reg), (val))
 #define ew32flash(reg,val)	__ew32flash(hw, (reg), (val))
 
+/*
+ * NV RAM mapping can get nested, so keep track of the mapping depth.
+ */
+static int e1000e_map_nvram(struct e1000_hw *hw)
+{
+	int ret_val = 0;
+	struct e1000_adapter *adapter = hw->adapter;
+
+	spin_lock(&hw->flash_address_map_lock);
+
+	hw->flash_address_map_cnt++;
+
+	if (hw->flash_address_map_cnt==1) {
+		BUG_ON(adapter->hw.flash_address);
+		WARN_ON(irqs_disabled());
+		adapter->hw.flash_address = ioremap(pci_resource_start(adapter->pdev, 1),
+							pci_resource_len(adapter->pdev, 1));
+		if (!adapter->hw.flash_address) {
+			hw->flash_address_map_cnt--;
+			ret_val = -E1000_ERR_NVM;
+		}
+	}
+
+	spin_unlock(&hw->flash_address_map_lock);
+
+	return ret_val;
+}
+
+static void e1000e_unmap_nvram(struct e1000_hw *hw)
+{
+	struct e1000_adapter *adapter = hw->adapter;
+
+	spin_lock(&hw->flash_address_map_lock);
+
+	BUG_ON(!hw->flash_address_map_cnt);
+
+	hw->flash_address_map_cnt--;
+
+	if (hw->flash_address_map_cnt==0) {
+		BUG_ON(!adapter->hw.flash_address);
+		WARN_ON(irqs_disabled());
+		iounmap(adapter->hw.flash_address);
+		adapter->hw.flash_address = NULL;
+	}
+
+	spin_unlock(&hw->flash_address_map_lock);
+}
+
 /**
  *  e1000_init_phy_params_ich8lan - Initialize PHY function pointers
  *  @hw: pointer to the HW structure
@@ -268,10 +316,11 @@ static s32 e1000_init_nvm_params_ich8lan(struct e1000_hw *hw)
 	u32 sector_base_addr;
 	u32 sector_end_addr;
 	u16 i;
+	int ret_val;
 
 	/* Can't read flash registers if the register set isn't mapped. */
-	if (!hw->flash_address) {
-		hw_dbg(hw, "ERROR: Flash registers not mapped\n");
+	ret_val = e1000e_map_nvram(hw);
+	if (ret_val) {
 		return -E1000_ERR_CONFIG;
 	}
 
@@ -308,6 +357,7 @@ static s32 e1000_init_nvm_params_ich8lan(struct e1000_hw *hw)
 		dev_spec->shadow_ram[i].value    = 0xFFFF;
 	}
 
+	e1000e_unmap_nvram(hw);
 	return 0;
 }
 
@@ -962,13 +1012,19 @@ static s32 e1000_flash_cycle_init_ich8lan(struct e1000_hw *hw)
 	s32 ret_val = -E1000_ERR_NVM;
 	s32 i = 0;
 
+	ret_val = e1000e_map_nvram(hw);
+	if (ret_val) {
+		return ret_val;
+	}
+
 	hsfsts.regval = er16flash(ICH_FLASH_HSFSTS);
 
 	/* Check if the flash descriptor is valid */
 	if (hsfsts.hsf_status.fldesvalid == 0) {
 		hw_dbg(hw, "Flash descriptor invalid.  "
 			 "SW Sequencing must be used.");
-		return -E1000_ERR_NVM;
+		ret_val = -E1000_ERR_NVM;
+		goto out;
 	}
 
 	/* Clear FCERR and DAEL in hw status by writing 1 */
@@ -1020,6 +1076,8 @@ static s32 e1000_flash_cycle_init_ich8lan(struct e1000_hw *hw)
 		}
 	}
 
+out:
+	e1000e_unmap_nvram(hw);
 	return ret_val;
 }
 
@@ -1034,9 +1092,16 @@ static s32 e1000_flash_cycle_ich8lan(struct e1000_hw *hw, u32 timeout)
 {
 	union ich8_hws_flash_ctrl hsflctl;
 	union ich8_hws_flash_status hsfsts;
-	s32 ret_val = -E1000_ERR_NVM;
+	s32 ret_val;
 	u32 i = 0;
 
+	ret_val = e1000e_map_nvram(hw);
+	if (ret_val) {
+		return ret_val;
+	}
+
+	ret_val = -E1000_ERR_NVM;
+
 	/* Start a cycle by writing 1 in Flash Cycle Go in Hw Flash Control */
 	hsflctl.regval = er16flash(ICH_FLASH_HSFCTL);
 	hsflctl.hsf_ctrl.flcgo = 1;
@@ -1051,8 +1116,9 @@ static s32 e1000_flash_cycle_ich8lan(struct e1000_hw *hw, u32 timeout)
 	} while (i++ < timeout);
 
 	if (hsfsts.hsf_status.flcdone == 1 && hsfsts.hsf_status.flcerr == 0)
-		return 0;
+		ret_val = 0;
 
+	e1000e_unmap_nvram(hw);
 	return ret_val;
 }
 
@@ -1093,8 +1159,15 @@ static s32 e1000_read_flash_data_ich8lan(struct e1000_hw *hw, u32 offset,
 	s32 ret_val = -E1000_ERR_NVM;
 	u8 count = 0;
 
-	if (size < 1  || size > 2 || offset > ICH_FLASH_LINEAR_ADDR_MASK)
-		return -E1000_ERR_NVM;
+	ret_val = e1000e_map_nvram(hw);
+	if (ret_val) {
+		return ret_val;
+	}
+
+	if (size < 1  || size > 2 || offset > ICH_FLASH_LINEAR_ADDR_MASK) {
+		ret_val = -E1000_ERR_NVM;
+		goto out;
+	}
 
 	flash_linear_addr = (ICH_FLASH_LINEAR_ADDR_MASK & offset) +
 			    hw->nvm.flash_base_addr;
@@ -1150,6 +1223,8 @@ static s32 e1000_read_flash_data_ich8lan(struct e1000_hw *hw, u32 offset,
 		}
 	} while (count++ < ICH_FLASH_CYCLE_REPEAT_COUNT);
 
+out:
+	e1000e_unmap_nvram(hw);
 	return ret_val;
 }
 
@@ -1396,6 +1471,11 @@ static s32 e1000_write_flash_data_ich8lan(struct e1000_hw *hw, u32 offset,
 	    offset > ICH_FLASH_LINEAR_ADDR_MASK)
 		return -E1000_ERR_NVM;
 
+	ret_val = e1000e_map_nvram(hw);
+	if (ret_val) {
+		return ret_val;
+	}
+
 	flash_linear_addr = (ICH_FLASH_LINEAR_ADDR_MASK & offset) +
 			    hw->nvm.flash_base_addr;
 
@@ -1447,6 +1527,7 @@ static s32 e1000_write_flash_data_ich8lan(struct e1000_hw *hw, u32 offset,
 		}
 	} while (count++ < ICH_FLASH_CYCLE_REPEAT_COUNT);
 
+	e1000e_unmap_nvram(hw);
 	return ret_val;
 }
 
@@ -1520,6 +1601,11 @@ static s32 e1000_erase_flash_bank_ich8lan(struct e1000_hw *hw, u32 bank)
 	s32 sector_size;
 	s32 j;
 
+	ret_val = e1000e_map_nvram(hw);
+	if (ret_val) {
+		return ret_val;
+	}
+
 	hsfsts.regval = er16flash(ICH_FLASH_HSFSTS);
 
 	/*
@@ -1550,7 +1636,8 @@ static s32 e1000_erase_flash_bank_ich8lan(struct e1000_hw *hw, u32 bank)
 			sector_size = ICH_FLASH_SEG_SIZE_8K;
 			iteration = flash_bank_size / ICH_FLASH_SEG_SIZE_8K;
 		} else {
-			return -E1000_ERR_NVM;
+			ret_val = -E1000_ERR_NVM;
+			goto out;
 		}
 		break;
 	case 3:
@@ -1558,7 +1645,8 @@ static s32 e1000_erase_flash_bank_ich8lan(struct e1000_hw *hw, u32 bank)
 		iteration = flash_bank_size / ICH_FLASH_SEG_SIZE_64K;
 		break;
 	default:
-		return -E1000_ERR_NVM;
+		ret_val = -E1000_ERR_NVM;
+		goto out;
 	}
 
 	/* Start with the base address, then add the sector offset. */
@@ -1570,7 +1658,7 @@ static s32 e1000_erase_flash_bank_ich8lan(struct e1000_hw *hw, u32 bank)
 			/* Steps */
 			ret_val = e1000_flash_cycle_init_ich8lan(hw);
 			if (ret_val)
-				return ret_val;
+				goto out;
 
 			/*
 			 * Write a value 11 (block Erase) in Flash
@@ -1603,11 +1691,13 @@ static s32 e1000_erase_flash_bank_ich8lan(struct e1000_hw *hw, u32 bank)
 				/* repeat for some time before giving up */
 				continue;
 			else if (hsfsts.hsf_status.flcdone == 0)
-				return ret_val;
+				goto out;
 		} while (++count < ICH_FLASH_CYCLE_REPEAT_COUNT);
 	}
 
-	return 0;
+out:
+	e1000e_unmap_nvram(hw);
+	return ret_val;
 }
 
 /**
diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index d266510..e5fe247 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -4439,6 +4439,11 @@ static int __devinit e1000_probe(struct pci_dev *pdev,
 		adapter->hw.flash_address = ioremap(flash_start, flash_len);
 		if (!adapter->hw.flash_address)
 			goto err_flashmap;
+
+		/* Now that we're sure NV RAM is mappable, unmap it until needed. */
+		iounmap(adapter->hw.flash_address);
+		adapter->hw.flash_address = NULL;
+		spin_lock_init(&adapter->hw.flash_address_map_lock);
 	}
 
 	/* construct the net_device struct */
-- 
1.5.4.3


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: e1000e NVM corruption issue status
  2008-09-26 22:23                 ` Brandeburg, Jesse
@ 2008-09-27 18:45                   ` Krzysztof Halasa
  0 siblings, 0 replies; 39+ messages in thread
From: Krzysztof Halasa @ 2008-09-27 18:45 UTC (permalink / raw)
  To: Brandeburg, Jesse
  Cc: Tim Gardner, Jesse Barnes, Arjan van de Ven, Jiri Kosina, LKML,
	agospoda, Ronciak, John, Allan, Bruce W, Graham, David, kkiel,
	tglx, chris.jones, arjan

"Brandeburg, Jesse" <jesse.brandeburg@intel.com> writes:

> ICH 8/9/10 machines with an integrated Intel gigabit part (82566/82567)
> share the system Flash space with all the other system devices, BIOS,
> etc.  The gigabit region is currently the only "unprotected" region I
> know of.  It is never directly memory mapped, but the registers that
> program it are memory mapped from our BAR1, so, like Tim said, an errant
> write of a few one bits may be all it takes to erase it (I've been
> trying to confirm that).

I don't know the ICH 8/9/10 very well but it seems typical, i.e., the
flash is an X Mbit device usually mapped at some really high address
and then copied/decompressed to RAM ("shadow ROM" at the usual
locations 0xC0000 for VGA, 0xF0000 or so for system BIOS, something
between the two for Ethernet PXE).

Is the protection you write about the hardware flash region
protection? It can be easily removed by another command (another
write).

Anyway the problem in question is EEPROM, not flash?


I'm sure that simply erasing the PXE flash region would not prevent
the machine from booting. The BIOS requires a valid signature (55AA or
so) at the start of a BIOS extension block, and there is a checksum.
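
The signature and checksum test mentioned above can be sketched as
follows. It assumes the conventional PC option ROM layout: bytes 0x55
0xAA, then a length byte in 512-byte units, with all bytes of the
image summing to zero modulo 256. The function names are made up for
illustration, and an actual BIOS may apply further checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if the buffer looks like a valid option ROM image. */
static int option_rom_valid(const uint8_t *rom, size_t avail)
{
	size_t len;
	uint8_t sum = 0;

	assert(rom != NULL);
	if (avail < 3 || rom[0] != 0x55 || rom[1] != 0xAA)
		return 0;               /* signature missing */
	len = (size_t)rom[2] * 512; /* image length in bytes */
	if (len == 0 || len > avail)
		return 0;
	for (size_t i = 0; i < len; i++)
		sum = (uint8_t)(sum + rom[i]);
	return sum == 0;            /* bytes must sum to 0 mod 256 */
}

/* Returns 0 if a good image passes and an erased one is rejected. */
static int demo_option_rom(void)
{
	uint8_t rom[512];
	uint8_t sum = 0;

	memset(rom, 0, sizeof(rom));
	rom[0] = 0x55;
	rom[1] = 0xAA;
	rom[2] = 1;                 /* one 512-byte block */
	for (size_t i = 0; i < sizeof(rom) - 1; i++)
		sum = (uint8_t)(sum + rom[i]);
	rom[sizeof(rom) - 1] = (uint8_t)(0u - sum); /* fix up the checksum */
	if (!option_rom_valid(rom, sizeof(rom)))
		return -1;

	memset(rom, 0xff, sizeof(rom)); /* erased flash reads back as 0xff */
	return option_rom_valid(rom, sizeof(rom)) ? -2 : 0;
}
```

An erased region fails the signature test straight away, which is why
the extension block would simply be skipped.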

I also think that blindly erasing some regions of flash, or blindly
writing to it wouldn't kill the machine completely - there is a
boot block which is almost certainly protected (requires a command to
unblock). The boot block should notice the main BIOS image is
corrupted and should allow reflashing (using a DOS diskette and
perhaps without VGA output).
-- 
Krzysztof Halasa

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: e1000e NVM corruption issue status
  2008-09-26  2:13     ` Brandeburg, Jesse
@ 2008-09-29 15:52       ` Jiri Kosina
  2008-09-29 16:20         ` Jiri Kosina
  0 siblings, 1 reply; 39+ messages in thread
From: Jiri Kosina @ 2008-09-29 15:52 UTC (permalink / raw)
  To: Brandeburg, Jesse
  Cc: LKML, agospoda, Ronciak, John, Allan, Bruce W, Graham, David,
	kkiel, Thomas Gleixner, chris.jones, arjan

On Thu, 25 Sep 2008, Brandeburg, Jesse wrote:

> in case your mailer hoses something apply in this order:
> # This series applies on GIT commit 011fcfcb75311c7368f13170b9e68adcf146a557
> 01-e-mem.patch
> 02-e_flash.patch
> 03-e1000e-release-lock-in-reset.patch
> 04-e1000e-dont-sleep.patch
> 05-e1000e-no-deeplocks.patch
> 06-e1000e-drop-stats-lock.patch
> 07-subject-e1000e-debug-patch.patch
> 08-e1000e-version.patch
> 09-e1000e-allow-bad-checksum.patch
> 10-e1000e-dump-eeprom-to-dmesg.txt

When using this patchset (plus the patch by Jesse Barnes that adds a check 
for the address range in pci_mmap_resource()), the machine (which already 
has a corrupted, but not completely erased, EEPROM) hangs after dumping 
the EEPROM contents:

0000:00:19.0: 0000:00:19.0: The NVM Checksum Is Not Valid
/*********************/
Current EEPROM Checksum : 0x2259
Calculated              : 0xa259
Offset    Values
========  ======
00000000: 00 15 58 c6 4a ff 00 08 ff ff 30 00 ff ff ff ff
00000010: ff ff ff ff c7 10 b9 20 aa 17 49 10 86 80 00 00
00000020: 01 0d 00 00 00 00 05 16 20 50 00 38 00 00 8b 0d
00000030: 02 06 c1 01 03 08 00 00 00 00 00 00 00 00 00 00
00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000060: 00 01 00 40 28 12 07 40 ff ff ff ff ff ff ff ff
00000070: ff ff ff ff ff ff ff ff ff ff ff ff ff ff 59 22
/*********************/
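
The two checksum values in the dump above can be reproduced with a
short userspace sketch. It assumes the usual e1000-family rule that
the 64 little-endian words at offsets 0x00 to 0x3F must sum to 0xBABA,
with the last word acting as the checksum; the array and function
names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NVM_SUM        0xBABA  /* assumed target sum of words 0x00..0x3F */
#define NVM_WORDS      64
#define NVM_CSUM_WORD  0x3F

/* The 128 bytes from the dump above. */
static const uint8_t jiri_dump[NVM_WORDS * 2] = {
	0x00, 0x15, 0x58, 0xc6, 0x4a, 0xff, 0x00, 0x08,
	0xff, 0xff, 0x30, 0x00, 0xff, 0xff, 0xff, 0xff,
	0xff, 0xff, 0xff, 0xff, 0xc7, 0x10, 0xb9, 0x20,
	0xaa, 0x17, 0x49, 0x10, 0x86, 0x80, 0x00, 0x00,
	0x01, 0x0d, 0x00, 0x00, 0x00, 0x00, 0x05, 0x16,
	0x20, 0x50, 0x00, 0x38, 0x00, 0x00, 0x8b, 0x0d,
	0x02, 0x06, 0xc1, 0x01, 0x03, 0x08, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x01, 0x00, 0x40, 0x28, 0x12, 0x07, 0x40,
	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x59, 0x22,
};

/* Fetch one little-endian 16-bit word from the byte dump. */
static uint16_t nvm_word(const uint8_t *bytes, size_t word)
{
	assert(word < NVM_WORDS);
	return (uint16_t)(bytes[2 * word] | (bytes[2 * word + 1] << 8));
}

/* Checksum word as stored in the image. */
static uint16_t nvm_stored_checksum(const uint8_t *bytes)
{
	return nvm_word(bytes, NVM_CSUM_WORD);
}

/* Checksum word that would make words 0x00..0x3F sum to NVM_SUM. */
static uint16_t nvm_expected_checksum(const uint8_t *bytes)
{
	uint16_t sum = 0;

	for (size_t i = 0; i < NVM_CSUM_WORD; i++)
		sum = (uint16_t)(sum + nvm_word(bytes, i));
	return (uint16_t)(NVM_SUM - sum);
}
```

For this image, nvm_stored_checksum(jiri_dump) gives 0x2259 and
nvm_expected_checksum(jiri_dump) gives 0xa259, matching the "Current"
and "Calculated" lines in the log. The two values differ only in
bit 15.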


After this, alt-sysrq-p indicates that it's somehow running in the loops 
around e1000_read_nvm_ich8lan and e1000_release_swflag_ich8lan. Below 
are several subsequent alt-sysrq-p outputs from this frozen system:



SysRq : Show Regs
CPU 1:
Modules linked in: pcmcia_core v4l1_compat pcspkr(+) e1000e(+) button(+) 
joydev led_class parport soundcore sg sr_mod cdrom sd_mod crc_t10r
Pid: 841, comm: modprobe Tainted: G          2.6.27-rc6-7.10-default #1
RIP: 0010:[<ffffffffa01e6a88>]  [<ffffffffa01e6a88>] 
e1000_release_swflag_ich8lan+0x15/0x3c [e1000e]
RSP: 0018:ffff88003adb5b48  EFLAGS: 00000286
RAX: ffffc20004540f00 RBX: ffff88003adb5b48 RCX: ffff88003adb5b7e
RDX: 0000000000000022 RSI: 000000000000431c RDI: ffff88003c44cb28
RBP: 0000000000000000 R08: ffff88003adb5b7e R09: ffff88003c44cb28
R10: ffff88003adb5b7e R11: ffff88003adb5ad8 R12: ffff88003c44cb28
R13: ffffc20004520008 R14: ffffffff8020c394 R15: ffff88003adb5b08
FS:  00007f398e1eb6f0(0000) GS:ffff88003e1e93c0(0000) 
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000061ee78 CR3: 000000003b125000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400

Call Trace:
Inexact backtrace:

 [<ffffffffa01e6a7c>] ? e1000_release_swflag_ich8lan+0x9/0x3c [e1000e]
 [<ffffffffa01e7184>] ? e1000_read_nvm_ich8lan+0xf2/0x10a [e1000e]
 [<ffffffff8020c394>] ? mcount_call+0x5/0x31
 [<ffffffffa01e9c46>] ? e1000e_validate_nvm_checksum_generic+0x34/0x62 
[e1000e]
 [<ffffffffa01e63b4>] ? e1000_validate_nvm_checksum_ich8lan+0x6c/0x73 
[e1000e]
 [<ffffffffa01f661e>] ? e1000_probe+0x5a4/0xb7e [e1000e]
 [<ffffffffa01f661e>] ? e1000_probe+0x5a4/0xb7e [e1000e]
 [<ffffffff8023e290>] ? set_cpus_allowed_ptr+0x119/0x126
 [<ffffffff803784a8>] ? kobject_get+0x1a/0x22
 [<ffffffff8038b889>] ? pci_device_probe+0xc9/0x120
 [<ffffffff80401958>] ? driver_probe_device+0xc5/0x173
 [<ffffffff80401a5a>] ? __driver_attach+0x54/0x7e
 [<ffffffff80401a06>] ? __driver_attach+0x0/0x7e
 [<ffffffff80401186>] ? bus_for_each_dev+0x54/0x8e
 [<ffffffff80401794>] ? driver_attach+0x21/0x23
 [<ffffffff80400a74>] ? bus_add_driver+0xbc/0x206
 [<ffffffff80401c6c>] ? driver_register+0xad/0x12d
 [<ffffffff8038bb5e>] ? __pci_register_driver+0x6b/0xa5
 [<ffffffffa0177000>] ? e1000_init_module+0x0/0x75 [e1000e]
 [<ffffffffa0177059>] ? e1000_init_module+0x59/0x75 [e1000e]
 [<ffffffff8020904c>] ? _stext+0x4c/0x151
 [<ffffffff80268169>] ? sys_init_module+0xae/0x1cc
 [<ffffffff8020c57a>] ? system_call_fastpath+0x16/0x1b



SysRq : Show Regs
CPU 1:
Modules linked in: pcmcia_core v4l1_compat pcspkr(+) e1000e(+) button(+) 
joydev led_class parport soundcore sg sr_mod cdrom sd_mod crc_t10r
Pid: 841, comm: modprobe Tainted: G          2.6.27-rc6-7.10-default #1
RIP: 0010:[<ffffffffa01e6481>]  [<ffffffffa01e6481>] 
e1000_flash_cycle_ich8lan+0x3d/0x6d [e1000e]
RSP: 0018:ffff88003adb5ae8  EFLAGS: 00000282
RAX: ffffc20004526009 RBX: ffff88003adb5b08 RCX: 000000006dae9ce2
RDX: 000000006dae9ce2 RSI: 000000000000431c RDI: 00000000000007bc
RBP: 0000000000000018 R08: ffff88003adb5b7e R09: ffff88003c44cb28
R10: ffff88003adb5b7e R11: ffff88003adb5ac8 R12: ffff88003adb5a68
R13: 0000000000000282 R14: 0000000000000010 R15: ffffffff802135d0
FS:  00007f398e1eb6f0(0000) GS:ffff88003e1e93c0(0000) 
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000061ee78 CR3: 000000003b125000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400

Call Trace:
Inexact backtrace:

 [<ffffffffa01e6491>] ? e1000_flash_cycle_ich8lan+0x4d/0x6d [e1000e]
 [<ffffffffa01e66ac>] ? e1000_read_flash_data_ich8lan+0xab/0x104 [e1000e]
 [<ffffffffa01e715e>] ? e1000_read_nvm_ich8lan+0xcc/0x10a [e1000e]
 [<ffffffff8020c394>] ? mcount_call+0x5/0x31
 [<ffffffffa01e9c46>] ? e1000e_validate_nvm_checksum_generic+0x34/0x62 
[e1000e]
 [<ffffffffa01e63b4>] ? e1000_validate_nvm_checksum_ich8lan+0x6c/0x73 
[e1000e]
 [<ffffffffa01f661e>] ? e1000_probe+0x5a4/0xb7e [e1000e]
 [<ffffffffa01f661e>] ? e1000_probe+0x5a4/0xb7e [e1000e]
 [<ffffffff8023e290>] ? set_cpus_allowed_ptr+0x119/0x126
 [<ffffffff803784a8>] ? kobject_get+0x1a/0x22
 [<ffffffff8038b889>] ? pci_device_probe+0xc9/0x120
 [<ffffffff80401958>] ? driver_probe_device+0xc5/0x173
 [<ffffffff80401a5a>] ? __driver_attach+0x54/0x7e
 [<ffffffff80401a06>] ? __driver_attach+0x0/0x7e
 [<ffffffff80401186>] ? bus_for_each_dev+0x54/0x8e
 [<ffffffff80401794>] ? driver_attach+0x21/0x23
 [<ffffffff80400a74>] ? bus_add_driver+0xbc/0x206
 [<ffffffff80401c6c>] ? driver_register+0xad/0x12d
 [<ffffffff8038bb5e>] ? __pci_register_driver+0x6b/0xa5
 [<ffffffffa0177000>] ? e1000_init_module+0x0/0x75 [e1000e]
 [<ffffffffa0177059>] ? e1000_init_module+0x59/0x75 [e1000e]
 [<ffffffff8020904c>] ? _stext+0x4c/0x151
 [<ffffffff80268169>] ? sys_init_module+0xae/0x1cc
 [<ffffffff8020c57a>] ? system_call_fastpath+0x16/0x1b


SysRq : Show Regs
CPU 1:
Modules linked in: pcmcia_core v4l1_compat pcspkr(+) e1000e(+) button(+) 
joydev led_class parport soundcore sg sr_mod cdrom sd_mod crc_t10r
Pid: 841, comm: modprobe Tainted: G          2.6.27-rc6-7.10-default #1
RIP: 0010:[<ffffffffa01e66bc>]  [<ffffffffa01e66bc>] 
e1000_read_flash_data_ich8lan+0xbb/0x104 [e1000e]
RSP: 0018:ffff88003adb5b18  EFLAGS: 00000282
RAX: 0000000000000000 RBX: ffff88003adb5b48 RCX: 000000002edf089a
RDX: 0000000000000000 RSI: 000000000000431c RDI: 00000000000007bc
RBP: 000000000027e044 R08: ffff88003adb5b7e R09: ffff88003c44cb28
R10: ffff88003adb5b7e R11: ffff88003adb5ac8 R12: 0000000000000000
R13: 0000000000000100 R14: ffffffff8037db03 R15: ffff88003adb5ac8
FS:  00007f398e1eb6f0(0000) GS:ffff88003e1e93c0(0000) 
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000061ee78 CR3: 000000003b125000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400

Call Trace:
Inexact backtrace:

 [<ffffffffa01e66ac>] ? e1000_read_flash_data_ich8lan+0xab/0x104 [e1000e]
 [<ffffffffa01e715e>] ? e1000_read_nvm_ich8lan+0xcc/0x10a [e1000e]
 [<ffffffff8020c394>] ? mcount_call+0x5/0x31
 [<ffffffffa01e9c46>] ? e1000e_validate_nvm_checksum_generic+0x34/0x62 
[e1000e]
 [<ffffffffa01e63b4>] ? e1000_validate_nvm_checksum_ich8lan+0x6c/0x73 
[e1000e]
 [<ffffffffa01f661e>] ? e1000_probe+0x5a4/0xb7e [e1000e]
 [<ffffffffa01f661e>] ? e1000_probe+0x5a4/0xb7e [e1000e]
 [<ffffffff8023e290>] ? set_cpus_allowed_ptr+0x119/0x126
 [<ffffffff803784a8>] ? kobject_get+0x1a/0x22
 [<ffffffff8038b889>] ? pci_device_probe+0xc9/0x120
 [<ffffffff80401958>] ? driver_probe_device+0xc5/0x173
 [<ffffffff80401a5a>] ? __driver_attach+0x54/0x7e
 [<ffffffff80401a06>] ? __driver_attach+0x0/0x7e
 [<ffffffff80401186>] ? bus_for_each_dev+0x54/0x8e
 [<ffffffff80401794>] ? driver_attach+0x21/0x23
 [<ffffffff80400a74>] ? bus_add_driver+0xbc/0x206
 [<ffffffff80401c6c>] ? driver_register+0xad/0x12d
 [<ffffffff8038bb5e>] ? __pci_register_driver+0x6b/0xa5
 [<ffffffffa0177000>] ? e1000_init_module+0x0/0x75 [e1000e]
 [<ffffffffa0177059>] ? e1000_init_module+0x59/0x75 [e1000e]
 [<ffffffff8020904c>] ? _stext+0x4c/0x151
 [<ffffffff80268169>] ? sys_init_module+0xae/0x1cc
 [<ffffffff8020c57a>] ? system_call_fastpath+0x16/0x1b                                                                                     

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: e1000e NVM corruption issue status
  2008-09-29 15:52       ` Jiri Kosina
@ 2008-09-29 16:20         ` Jiri Kosina
  2008-09-29 16:24           ` Brandeburg, Jesse
  0 siblings, 1 reply; 39+ messages in thread
From: Jiri Kosina @ 2008-09-29 16:20 UTC (permalink / raw)
  To: Brandeburg, Jesse
  Cc: LKML, agospoda, Ronciak, John, Allan, Bruce W, Graham, David,
	kkiel, Thomas Gleixner, chris.jones, arjan

On Mon, 29 Sep 2008, Jiri Kosina wrote:

> > in case your mailer hoses something apply in this order:
> > # This series applies on GIT commit 011fcfcb75311c7368f13170b9e68adcf146a557
> > 01-e-mem.patch
> > 02-e_flash.patch
> > 03-e1000e-release-lock-in-reset.patch
> > 04-e1000e-dont-sleep.patch
> > 05-e1000e-no-deeplocks.patch
> > 06-e1000e-drop-stats-lock.patch
> > 07-subject-e1000e-debug-patch.patch
> > 08-e1000e-version.patch
> > 09-e1000e-allow-bad-checksum.patch
> > 10-e1000e-dump-eeprom-to-dmesg.txt
> When using this patchset (plus the patch by Jesse Barnes that adds a check 
> for the address range in pci_mmap_resource()), the machine (which already 
> has a corrupted, but not completely erased, EEPROM) hangs after dumping 
> the EEPROM contents:
> 0000:00:19.0: 0000:00:19.0: The NVM Checksum Is Not Valid
> /*********************/
> Current EEPROM Checksum : 0x2259
> Calculated              : 0xa259
> Offset    Values
> ========  ======
> 00000000: 00 15 58 c6 4a ff 00 08 ff ff 30 00 ff ff ff ff
> 00000010: ff ff ff ff c7 10 b9 20 aa 17 49 10 86 80 00 00
> 00000020: 01 0d 00 00 00 00 05 16 20 50 00 38 00 00 8b 0d
> 00000030: 02 06 c1 01 03 08 00 00 00 00 00 00 00 00 00 00
> 00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00000060: 00 01 00 40 28 12 07 40 ff ff ff ff ff ff ff ff
> 00000070: ff ff ff ff ff ff ff ff ff ff ff ff ff ff 59 22
> /*********************/
> After this, alt-sysrq-p indicates that it's somehow running in the loops 
> around e1000_read_nvm_ich8lan and e1000_release_swflag_ich8lan. Below 
> are several subsequent alt-sysrq-p outputs from this frozen system:

And I believe that this is because of this code in 
09-e1000e-allow-bad-checksum.patch:

        for (i = 0;; i++) {
                if (e1000_validate_nvm_checksum(hw) >= 0) {
                        /* copy the MAC address out of the NVM */
                        if (e1000e_read_mac_addr(&adapter->hw))
                                e_err("NVM Read Error reading MAC address\n");
                        break;
                }
                if (i == 2) {
                        e_err("The NVM Checksum Is Not Valid\n");
                        e1000e_dump_eeprom(adapter);
                        /*
                         * set MAC address to all zeroes to invalidate and
                         * temporary disable this device for the user. This
                         * blocks regular traffic while still permitting
                         * ethtool ioctls from reaching the hardware as well as
                         * allowing the user to run the interface after
                         * manually setting a hw addr using
                         * `ip link set address`
                         */
                        memset(hw->mac.addr, 0, netdev->addr_len);
                }
        }

We are missing 'break;' after the memset, and that is where the hanging 
machine comes from (the loop keeps spinning forever), right? I will verify 
this right away.
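
For reference, the loop shape with the exit in place can be modeled in
userspace like this. It is a sketch only: the callback type and all
names are hypothetical stand-ins, and the error reporting and MAC
address handling from the patch are reduced to comments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for e1000_validate_nvm_checksum(). */
typedef int (*validate_fn)(void *ctx);

/* Always reports a bad checksum and counts how often it was called. */
static int validate_always_bad(void *ctx)
{
	int *calls = ctx;

	(*calls)++;
	return -1;
}

/* Try validation up to three times, then give up instead of spinning. */
static int validate_with_retries(validate_fn validate, void *ctx)
{
	assert(validate != NULL);
	for (int i = 0;; i++) {
		if (validate(ctx) >= 0)
			return 0;   /* checksum fine: read the MAC address here */
		if (i == 2) {
			/* report the error, dump the EEPROM, zero the MAC ... */
			break;      /* ... and leave the loop */
		}
	}
	return -1;
}

/* Returns the number of validation attempts made on a bad image. */
static int demo_bad_checksum_calls(void)
{
	int calls = 0;
	int ret = validate_with_retries(validate_always_bad, &calls);

	return ret == -1 ? calls : -1;
}
```

With the break in place a permanently bad checksum costs exactly three
validation attempts; without it the for loop has no exit condition on
that path.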

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 39+ messages in thread

* RE: e1000e NVM corruption issue status
  2008-09-29 16:20         ` Jiri Kosina
@ 2008-09-29 16:24           ` Brandeburg, Jesse
  2008-09-29 17:18             ` Jiri Kosina
  0 siblings, 1 reply; 39+ messages in thread
From: Brandeburg, Jesse @ 2008-09-29 16:24 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: LKML, agospoda, Ronciak, John, Allan, Bruce W, Graham, David,
	Karsten Keil, Thomas Gleixner, chris.jones, arjan

fixed Karsten's email, apologies for the bounces.

Jiri Kosina wrote:
> We are missing 'break;' after the memset, and that is where the
> hanging machine comes from (the loop keeps spinning forever), right?
> I will verify this right away.

Yep, I think you're right.  I'm almost done prepping our current patch
series, I'll include that fix.  Please let me know if your test shows it
fixes it.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* RE: e1000e NVM corruption issue status
  2008-09-29 16:24           ` Brandeburg, Jesse
@ 2008-09-29 17:18             ` Jiri Kosina
  2008-09-29 17:36               ` Jiri Kosina
  0 siblings, 1 reply; 39+ messages in thread
From: Jiri Kosina @ 2008-09-29 17:18 UTC (permalink / raw)
  To: Brandeburg, Jesse
  Cc: LKML, agospoda, Ronciak, John, Allan, Bruce W, Graham, David,
	Karsten Keil, Thomas Gleixner, chris.jones, arjan

On Mon, 29 Sep 2008, Brandeburg, Jesse wrote:

> > We are missing 'break;' after the memset, and that is where the
> > hanging machine comes from (the loop keeps spinning forever), right?
> > I will verify this right away.
> Yep, I think you're right.  I'm almost done prepping our current patch
> series, I'll include that fix.  Please let me know if your test shows it
> fixes it.

Yes, it fixed the hang. It crashed later, though:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
IP: [<ffffffffa045211b>] inet6_net_init+0x98/0xf2 [ipv6]
PGD 39d15067 PUD 39d14067 PMD 0
Oops: 0000 [1] SMP
last sysfs file: 
/sys/devices/pci0000:00/0000:00:1e.0/0000:15:00.0/0000:16:00.0/local_cpus

which is probably related I guess.

-- 
Jiri Kosina
SUSE Labs

* RE: e1000e NVM corruption issue status
  2008-09-29 17:18             ` Jiri Kosina
@ 2008-09-29 17:36               ` Jiri Kosina
  2008-09-29 22:43                 ` Jiri Kosina
  0 siblings, 1 reply; 39+ messages in thread
From: Jiri Kosina @ 2008-09-29 17:36 UTC (permalink / raw)
  To: Brandeburg, Jesse
  Cc: LKML, agospoda, Ronciak, John, Allan, Bruce W, Graham, David,
	Karsten Keil, Thomas Gleixner, chris.jones, arjan

On Mon, 29 Sep 2008, Jiri Kosina wrote:

> > > We are missing 'break;' after the memset, and that is where the
> > > hanging machine comes from (the loop keeps spinning forever), right?
> > > I will verify this right away.
> > Yep, I think you're right.  I'm almost done prepping our current patch
> > series, I'll include that fix.  Please let me know if your test shows it
> > fixes it.
> Yes, it fixed the hang. It crashed later, though:
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
> IP: [<ffffffffa045211b>] inet6_net_init+0x98/0xf2 [ipv6]
> PGD 39d15067 PUD 39d14067 PMD 0
> Oops: 0000 [1] SMP
> last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:15:00.0/0000:16:00.0/local_cpus

Hmm, this happens even if I move the e1000e module completely out of the
way (it happens when userspace is starting postfix), so it might be a
completely separate issue.

-- 
Jiri Kosina
SUSE Labs

* RE: e1000e NVM corruption issue status
  2008-09-29 17:36               ` Jiri Kosina
@ 2008-09-29 22:43                 ` Jiri Kosina
  0 siblings, 0 replies; 39+ messages in thread
From: Jiri Kosina @ 2008-09-29 22:43 UTC (permalink / raw)
  To: Brandeburg, Jesse
  Cc: LKML, agospoda, Ronciak, John, Allan, Bruce W, Graham, David,
	Karsten Keil, Thomas Gleixner, chris.jones, arjan

On Mon, 29 Sep 2008, Jiri Kosina wrote:

> > BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
> > IP: [<ffffffffa045211b>] inet6_net_init+0x98/0xf2 [ipv6]
> > PGD 39d15067 PUD 39d14067 PMD 0
> > Oops: 0000 [1] SMP
> > last sysfs file: 
> Hmm, this happens even if I move the e1000e module completely out of the
> way (it happens when userspace is starting postfix), so it might be a
> completely separate issue.

Verified: this is a completely separate issue. It started triggering at the
very same time I fixed 09-e1000e-allow-bad-checksum.patch, purely by
coincidence. Please disregard this.

-- 
Jiri Kosina
SUSE Labs

* Re: e1000e NVM corruption issue status
  2008-09-26  1:50 ` e1000e NVM corruption issue status Brandeburg, Jesse
                     ` (3 preceding siblings ...)
  2008-09-26  7:19   ` Karsten Keil
@ 2008-10-18 19:13   ` James Courtier-Dutton
  2008-10-18 22:49     ` Jiri Kosina
  4 siblings, 1 reply; 39+ messages in thread
From: James Courtier-Dutton @ 2008-10-18 19:13 UTC (permalink / raw)
  To: Brandeburg, Jesse
  Cc: LKML, Jiri Kosina, agospoda, Ronciak, John, Allan, Bruce W,
	Graham, David, kkiel, tglx, chris.jones, arjan

Brandeburg, Jesse wrote:
> 
> Many of the reports seem to be related in time to a graphics crash, no one 
> has been able to give us more detail about how to reproduce.  We NEED HELP 
> reproducing this.  Steps, hints, anything.  We are trying rebooting, 
> suspending, opensuse, fedora, ubuntu, and several hardware platforms, etc.
> 

I would suspect the ipv6 module instead of the graphics. It has a number
of bugs in it. I am just mentioning it in case it helps.

Kind Regards

James

* Re: e1000e NVM corruption issue status
  2008-10-18 19:13   ` James Courtier-Dutton
@ 2008-10-18 22:49     ` Jiri Kosina
  0 siblings, 0 replies; 39+ messages in thread
From: Jiri Kosina @ 2008-10-18 22:49 UTC (permalink / raw)
  To: James Courtier-Dutton
  Cc: Brandeburg, Jesse, LKML, agospoda, Ronciak, John, Allan, Bruce W,
	Graham, David, kkiel, tglx, chris.jones, arjan

On Sat, 18 Oct 2008, James Courtier-Dutton wrote:

> > Many of the reports seem to be related in time to a graphics crash, no one 
> > has been able to give us more detail about how to reproduce.  We NEED HELP 
> > reproducing this.  Steps, hints, anything.  We are trying rebooting, 
> > suspending, opensuse, fedora, ubuntu, and several hardware platforms, etc.
> I would suspect the ipv6 module instead of the graphics. It has a number 
> of bugs in it. I am just mentioning it in case it helps.

The culprit has already been identified: it was CONFIG_DYNAMIC_FTRACE.

-- 
Jiri Kosina
SUSE Labs

Thread overview: 39+ messages
     [not found] <987CEB09A2567F4A963E1E226364E2D33A685B4B@orsmsx418.amr.corp.intel.com>
2008-09-26  1:50 ` e1000e NVM corruption issue status Brandeburg, Jesse
2008-09-26  1:58   ` Chris Snook
2008-09-26  2:04     ` Brandeburg, Jesse
2008-09-26  2:01   ` Brandeburg, Jesse
2008-09-26  2:09     ` Brandeburg, Jesse
2008-09-26  7:12       ` Ingo Molnar
2008-09-26  2:09     ` Brandeburg, Jesse
2008-09-26  2:10     ` Brandeburg, Jesse
2008-09-26  2:10     ` Brandeburg, Jesse
2008-09-26  2:10     ` Brandeburg, Jesse
2008-09-26  2:11     ` Brandeburg, Jesse
2008-09-26  2:11     ` Brandeburg, Jesse
2008-09-26  2:12     ` Brandeburg, Jesse
2008-09-26  2:12     ` Brandeburg, Jesse
2008-09-26  2:13     ` Brandeburg, Jesse
2008-09-26  2:13     ` Brandeburg, Jesse
2008-09-29 15:52       ` Jiri Kosina
2008-09-29 16:20         ` Jiri Kosina
2008-09-29 16:24           ` Brandeburg, Jesse
2008-09-29 17:18             ` Jiri Kosina
2008-09-29 17:36               ` Jiri Kosina
2008-09-29 22:43                 ` Jiri Kosina
2008-09-26  6:13     ` Jiri Kosina
2008-09-26 11:49       ` Arjan van de Ven
2008-09-26 17:52         ` Jesse Barnes
2008-09-26 18:23           ` Jesse Barnes
2008-09-26 18:39             ` Jesse Barnes
2008-09-26 18:43               ` Jesse Barnes
2008-09-26 18:53             ` Tim Gardner
2008-09-26 22:04               ` Krzysztof Halasa
2008-09-26 22:23                 ` Brandeburg, Jesse
2008-09-27 18:45                   ` Krzysztof Halasa
2008-09-27  0:05               ` Brandeburg, Jesse
2008-09-27  4:20                 ` Tim Gardner
2008-09-26 14:23     ` Karsten Keil
2008-09-26  5:44   ` Jesse Brandeburg
2008-09-26  7:19   ` Karsten Keil
2008-10-18 19:13   ` James Courtier-Dutton
2008-10-18 22:49     ` Jiri Kosina
