From: Jens Axboe <jens.axboe@oracle.com>
To: Ricardo Correia <rcorreia@wizy.org>
Cc: linux-kernel@vger.kernel.org
Subject: Re: How to flush the disk write cache from userspace
Date: Tue, 16 Jan 2007 11:38:54 +1100
Message-ID: <20070116003854.GE4067@kernel.dk>
In-Reply-To: <200701140405.33748.rcorreia@wizy.org>

On Sun, Jan 14 2007, Ricardo Correia wrote:
> Hi, (please CC: to my email address, I'm not subscribed)
> 
> Quick question: how can I flush the disk write cache from userspace?
> 
> Long question:
> 
> I'm porting the Solaris ZFS filesystem to the FUSE/Linux filesystem
> framework.  This is a copy-on-write, transactional filesystem and so
> it needs to ensure correct ordering of writes when transactions are
> written to disk.
> 
> At the moment, when transactions end, I'm using an fsync() on the block
> device followed by an ioctl(BLKFLSBUF).
> 
> This is because, according to the fsync manpage, even after fsync()
> returns, data might still be in the disk write cache, so fsync by
> itself doesn't guarantee data safety on power failure.

Depends. fsync() only reaches the disk cache if the file system does the
right thing here; iirc only reiserfs with barriers enabled issues a real
disk flush for fsync. So you can't rely on it in general.
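
For reference, the sequence described in the quoted text might look like
the following in C. This is only a minimal sketch: the helper name
sync_then_flush_buffers() is made up for illustration, and as noted above
neither call is guaranteed to empty the drive's own write cache.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>   /* BLKFLSBUF */

/*
 * Hypothetical helper: fsync() the descriptor, then ask the kernel to
 * flush the block device's buffer cache with BLKFLSBUF.
 * Returns 0 on success or a negative errno value.
 * BLKFLSBUF is meant for block device descriptors and normally needs
 * CAP_SYS_ADMIN; neither step promises a flush of the disk write cache.
 */
int sync_then_flush_buffers(int fd)
{
        if (fsync(fd) < 0)
                return -errno;
        if (ioctl(fd, BLKFLSBUF, 0) < 0)
                return -errno;
        return 0;
}
```

A caller would open the block device, write the transaction, and call
the helper before treating the transaction as written.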

> I was looking for something like the Solaris
> ioctl(DKIOCFLUSHWRITECACHE), which does exactly what I need.
> 
> The most similar thing I could find was ioctl(BLKFLSBUF), however a
> search for BLKFLSBUF on the Linux 2.6.15 source doesn't seem to return
> anything related to IDE or SCSI disks.
> 
> Can I trust ioctl(BLKFLSBUF) to flush disks' write caches (for disks
> that follow the specs)?

BLKFLSBUF doesn't flush the disk cache either; it just flushes every
dirty page in the block device address space. Adding a real cache flush
would not be very hard, since we have most of the support code in place
for this for IO barriers. It would be something like:

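/* Sketch: issue a cache flush request to the device behind bdev. */
int blockdev_cache_flush(struct block_device *bdev)
{
        request_queue_t *q = bdev_get_queue(bdev);
        /* GFP_WHATEVER is a placeholder for a suitable allocation mask */
        struct request *rq = blk_get_request(q, WRITE, GFP_WHATEVER);
        int ret;

        /* setting rq up as a flush (the queue's flush hook) is omitted here */
        ret = blk_execute_rq(q, bdev->bd_disk, rq, 0);
        blk_put_request(rq);
        return ret;
}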

Somewhat simplified of course, but it should get the point across.
Putting that in fs/buffer.c:sync_blockdev() would make BLKFLSBUF work.

As always with these things, the devil is in the details. It requires
the device to support a ->prepare_flush() queue hook, and not all
devices do that. It will work for IDE/SATA/SCSI, though. On some devices
you don't want or need to do a real disk flush; it depends on the write
cache settings, battery backing, etc.
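
To make the dependency on the queue hook concrete, here is a small
userspace toy model in C. It is not kernel code: struct toy_queue,
struct toy_request, toy_prepare_flush() and toy_cache_flush() are all
hypothetical names, and returning -EOPNOTSUPP for a queue without a hook
is an assumption of this sketch, not something the kernel is known to do
here.

```c
#include <errno.h>
#include <stddef.h>

/* Toy stand-ins for the kernel's request and request queue. */
struct toy_request {
        int is_flush;           /* set once the driver has prepared it */
};

struct toy_queue {
        /* Optional driver hook that turns rq into a cache flush command. */
        void (*prepare_flush)(struct toy_queue *q, struct toy_request *rq);
        int flushes_issued;     /* counts flushes "sent" to the device */
};

/* Example hook, as a driver for a disk with a write cache might provide. */
void toy_prepare_flush(struct toy_queue *q, struct toy_request *rq)
{
        (void)q;
        rq->is_flush = 1;
}

/*
 * Returns 0 after running the hook and counting the flush, or
 * -EOPNOTSUPP if the queue provides no hook (assumed error code).
 */
int toy_cache_flush(struct toy_queue *q)
{
        struct toy_request rq = { 0 };

        if (q->prepare_flush == NULL)
                return -EOPNOTSUPP;

        q->prepare_flush(q, &rq);
        q->flushes_issued++;    /* stands in for executing the request */
        return 0;
}
```

A queue set up with toy_prepare_flush accepts the flush, while a queue
whose hook is NULL reports that it cannot do one, which is the case a
real implementation would have to handle for devices without the hook.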

-- 
Jens Axboe

