LKML Archive on
help / color / mirror / Atom feed
From: Jens Axboe <>
To: Tejun Heo <>
Cc: Robert Hancock <>,
	linux-kernel <>,,,, Jeff Garzik <>,
	Alan Cox <>, Mark Lord <>
Subject: Re: libata FUA revisited
Date: Thu, 15 Feb 2007 19:00:24 +0100	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <>

On Tue, Feb 13 2007, Tejun Heo wrote:
> >>So, actually, I was thinking about *always* using the non-NCQ FUA 
> >>opcode.  As currently implemented, FUA request is always issued by 
> >>itself, so NCQ doesn't make any difference there.  So, I think it 
> >>would be better to turn on FUA on driver-by-driver basis whether the 
> >>controller supports NCQ or not.
> >
> >Unfortunately not all drives that support NCQ support the non-NCQ FUA 
> >commands (my Seagates are like this).
> And I'm a bit scared to set FUA bit on such drives and trust that it 
> will actually do FUA, so our opinions aren't too far away from each 
> other.  :-)
> >There's definitely a potential advantage to FUA with NCQ - if you have 
> >non-synchronous accesses going on concurrently with synchronous ones, if 
> >you have to use non-NCQ FUA or flush cache commands, you have to wait 
> >for all the IOs of both types to drain out before you can issue the 
> >flush (since those can't be overlapped with the NCQ read/writes). And if 
> >you can only use flush cache, then you're forcing all the writes to be 
> >flushed including the non-synchronous ones you didn't care about. 
> >Whether or not the block layer currently exploits this I don't know, but 
> >it definitely could.
> The current barrier implementation uses the following sequences for 
> no-FUA and FUA cases.
> 1. w/o FUA
> normal operation -> barrier issued -> drain IO -> flush -> barrier 
> written -> flush -> normal operation resumes
> 2. w/ FUA
> normal operation -> barrier issued -> drain IO -> flush -> barrier 
> written / FUA -> normal operation resumes
> So, the FUA write is issued by itself.  This isn't really efficient and 
> frequent barriers impact the performance badly.  If we can change that 
> NCQ FUA will be certainly beneficial.

But we can't really change that, since you need the cache flushed before
issuing the FUA write. I've been advocating for an ordered bit for
years, so that we could just do:


normal operation -> barrier issued -> write barrier FUA+ORDERED
 -> normal operation resumes

So we don't have to serialize everything both at the block and device
level. I would have made FUA imply this already, but apparently it's not
what MS wanted FUA for, so... The current implementations take the FUA
bit (or WRITE FUA) as a hint to boost it to head of queue, so you are
almost certainly going to jump ahead of already queued writes. Which we
of course really do not.

> >>Well, I might be being too paranoid but silent FUA failure would be 
> >>really hard to diagnose if that ever happens (and I'm fairly certain 
> >>that it will on some firmwares).
> >
> >Well, there are also probably drives that ignore flush cache commands or 
> > fail to do other things that they should. There's only so far we can go 
> >in coping if the firmware authors are being retarded. If any drive is 
> >broken like that we should likely just blacklist NCQ on it entirely as 
> >obviously little thought or testing went into the implementation..
> FLUSH has been around quite long time now and most drives don't have 
> problem with that.  FUA on ATA is still quite new and libata will be the 
> first major user of it if we enable it by default.  It just seems too 
> easy to ignore that bit and successfully complete the write - there 
> isn't any safety net as opposed to using a separate opcode.  So, I'm a 
> bit nervous.

I'm not too nervous about the FUA write commands, I hope we can safely
assume that if you set the FUA supported bit in the id AND the write fua
command doesn't get aborted, that FUA must work. Anything else would
just be an immensely stupid implementation. NCQ+FUA is more tricky, I
agree that it being just a command bit does make it more likely that it
could be ignored. And that is indeed a danger. Given state of NCQ in
early firmware drives, I would not at all be surprised if the drive
vendors screwed that up too.

But, since we don't have the ordered bit for NCQ/FUA anyway, we do need
to drain the drive queue before issuing the WRITE/FUA. And at that point
we may as well not use the NCQ command, just go for the regular non-NCQ
FUA write. I think that should be safe.

Jens Axboe

  reply	other threads:[~2007-02-15 18:01 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <fa.S80SRyQbD/>
2007-02-12  5:47 ` libata FUA revisited Robert Hancock
     [not found] ` <fa.Q/>
2007-02-13  0:23   ` Robert Hancock
2007-02-13 15:20     ` Tejun Heo
2007-02-14  0:07       ` Robert Hancock
2007-02-14  0:50         ` Tejun Heo
2007-02-15 18:00           ` Jens Axboe [this message]
2007-02-19 19:46             ` Robert Hancock
2007-02-21  8:37               ` Tejun Heo
2007-02-21  8:46                 ` Jens Axboe
2007-02-21  8:57                   ` Tejun Heo
2007-02-21  9:01                     ` Jens Axboe
2007-02-22 22:44                     ` Ric Wheeler
2007-02-22 22:40                   ` Ric Wheeler
2007-02-21 14:06                 ` Robert Hancock
2007-02-22 22:34                 ` Ric Wheeler
2007-02-23  0:04                   ` Robert Hancock
2007-02-21  8:44               ` Jens Axboe
2007-02-12  3:25 Robert Hancock
2007-02-12  8:31 ` Tejun Heo
2007-02-16 18:14   ` Jeff Garzik

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \ \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).