From: Hidetoshi Seto <>
To: Linux Kernel mailing list <>
Subject: [RFC] How drivers notice a HW error?
Date: Thu, 27 Nov 2003 17:28:02 +0900
Message-ID: <023401c3b4c0$5fb40660$a8647c0a@seto>

Hi all,

This is a request for comments, especially comments from driver developers.

On some platforms, for example IA64, the chipset detects an error caused by
a driver's operation, such as an I/O read, and reports it to the kernel. The
Linux kernel analyzes the error and decides to kill the driver or, at worst,
reboot. I want to convey the error information to the offending driver so
that the driver can recover from the failed operation.

So, as a tentative plan, I am thinking about a readb_check function that can
return an error value if an error occurs during the read. Drivers could use
readb_check instead of the usual readb, and could determine from its return
value whether a retry is required.
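
From the driver's point of view, the usage described above might look like the
following sketch. The readb_check signature (status as the return value, data
through an out-parameter), the -EIO error value, the retry limit, and the
fake_device that stands in for real MMIO are all assumptions made for
illustration; none of them is specified in this proposal.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a device register. A real driver would read
 * from an ioremap()ed MMIO address instead. */
struct fake_device {
    uint8_t value;          /* byte returned by a successful read */
    int     failures_left;  /* number of upcoming reads that should fail */
};

/* Assumed interface: returns 0 and stores the byte on success, or -EIO if
 * the platform reported a hardware error during the read. */
static int readb_check(struct fake_device *dev, uint8_t *out)
{
    assert(dev != NULL && out != NULL);
    if (dev->failures_left > 0) {
        dev->failures_left--;
        return -EIO;
    }
    *out = dev->value;
    return 0;
}

/* Driver-side policy: retry a bounded number of times, then give up and
 * report the last error to the caller. */
static int read_with_retry(struct fake_device *dev, uint8_t *out, int max_tries)
{
    int err = -EIO;

    for (int i = 0; i < max_tries; i++) {
        err = readb_check(dev, out);
        if (err == 0)
            break;
    }
    return err;
}
```

Whether a retry is actually safe depends on the device (a read with side
effects may not be repeatable), so the retry policy is left to the driver here.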

To realize this, I am considering the following two designs:

+ readb_check in the driver (with a Notifier)
    - The hardware error handler (for example, the MCA handler on IA64) has a
      Notifier as a hook point.
    - A driver may register a hook function with the Notifier.
    - The Notifier calls the registered functions in turn when an error occurs.
    - A called hook function checks the address of the error, and if the error
      seems to concern the driver that registered it, raises an internal
      error flag and stops the Notifier by returning OK.
    - The hardware error handler checks the state of the Notifier and decides
      whether the system should resume or not.
    - The resumed driver may check the error flag after a read, and may retry
      the read if the flag is raised.
    - Some interfaces, such as one for registering hooks, would be required.
    - Writing a hook function would be a burden on driver developers.
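
The Notifier design above can be sketched in plain C as follows. This is a
user-space model only: the names register_hw_error_hook, hw_error_notify,
HOOK_OK and HOOK_DONE, the fixed-size hook table, and the my_driver structure
are hypothetical, and the sketch does not use the kernel's own notifier API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HOOK_DONE 0   /* hypothetical: error is not mine, keep walking */
#define HOOK_OK   1   /* hypothetical: error claimed, stop the Notifier */
#define MAX_HOOKS 8

typedef int (*hw_error_hook)(uintptr_t fault_addr, void *driver_data);

struct hook_entry {
    hw_error_hook fn;
    void *data;
};

static struct hook_entry hooks[MAX_HOOKS];
static size_t nr_hooks;

/* Interface a driver would use to register its hook function. */
static int register_hw_error_hook(hw_error_hook fn, void *data)
{
    assert(fn != NULL);
    if (nr_hooks == MAX_HOOKS)
        return -1;
    hooks[nr_hooks].fn = fn;
    hooks[nr_hooks].data = data;
    nr_hooks++;
    return 0;
}

/* Called by the hardware error handler. Returns 1 if some driver claimed
 * the error (the system may resume), 0 if nobody did. */
static int hw_error_notify(uintptr_t fault_addr)
{
    for (size_t i = 0; i < nr_hooks; i++)
        if (hooks[i].fn(fault_addr, hooks[i].data) == HOOK_OK)
            return 1;
    return 0;
}

/* Driver side: the MMIO range it owns and its internal error flag. */
struct my_driver {
    uintptr_t mmio_start;
    uintptr_t mmio_len;
    int error_flag;
};

/* Hook function: claim the error only if the address is in our range. */
static int my_driver_hook(uintptr_t fault_addr, void *data)
{
    struct my_driver *drv = data;

    if (fault_addr >= drv->mmio_start &&
        fault_addr - drv->mmio_start < drv->mmio_len) {
        drv->error_flag = 1;
        return HOOK_OK;
    }
    return HOOK_DONE;
}
```

After the handler resumes the driver, the driver would test and clear
error_flag following each read and retry the read if the flag was set.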

+ readb_check in the kernel
    - The kernel provides a readb_check function.
    - Drivers may use readb_check instead of the usual readb.
    - The hardware error handler checks the address of the error, and if the
      error occurred in readb_check, changes the return value of readb_check
      and resumes the interrupted context.
    - The driver may check the return value to notice an error in the last
      read.
    - Some overhead would be involved. (Possibly it is negligible, since
      I/O reads are already horribly slow.)
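
The handler side of this second design can be modeled roughly as below. It is
a user-space simulation under several assumptions: the cpu_state structure,
the in_readb_check marker, the MCA_RESUME/MCA_FATAL results, and the error
injection parameter are all hypothetical. A real handler would more likely
compare the interrupted instruction address against the address range of
readb_check than consult a flag.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

enum mca_action { MCA_RESUME, MCA_FATAL };

/* Hypothetical per-CPU state visible to the error handler. */
struct cpu_state {
    int in_readb_check;           /* set while a checked read is in flight */
    int read_status;              /* return value the handler may overwrite */
    enum mca_action last_action;  /* what the handler decided last time */
};

/* Hardware error handler: recover only if the error hit a checked read. */
static enum mca_action mca_handler(struct cpu_state *cpu)
{
    assert(cpu != NULL);
    if (!cpu->in_readb_check)
        return MCA_FATAL;         /* plain readb: kill the driver or reboot */
    cpu->read_status = -EIO;      /* change readb_check's return value */
    return MCA_RESUME;            /* resume the interrupted context */
}

/* Simulated bus access. An injected error invokes the handler, as a
 * machine check would, and the data returned is meaningless. */
static uint8_t bus_read(struct cpu_state *cpu, uint8_t reg_value, int inject_error)
{
    if (inject_error) {
        cpu->last_action = mca_handler(cpu);
        return 0xff;
    }
    return reg_value;
}

/* Checked read: marks the window in which the handler may recover. */
static int readb_check(struct cpu_state *cpu, uint8_t reg_value,
                       int inject_error, uint8_t *out)
{
    cpu->in_readb_check = 1;
    cpu->read_status = 0;
    *out = bus_read(cpu, reg_value, inject_error);
    cpu->in_readb_check = 0;
    return cpu->read_status;
}

/* Unchecked read: an error here cannot be reported to the driver. */
static uint8_t plain_readb(struct cpu_state *cpu, uint8_t reg_value, int inject_error)
{
    return bus_read(cpu, reg_value, inject_error);
}
```

In this model the extra cost of a checked read is two stores around the
access, which is small next to the latency of the I/O read itself.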

IMO, this is a general-purpose function that should be available on many
platforms. I also hear that Solaris has a similar implementation.

If you have any comments on this feature, or a different idea,
please tell me.

Best regards,


H.Seto <>
