From: Andreas Dilger <adilger@dilger.ca>
To: Eric Biggers <ebiggers3@gmail.com>
Cc: Steve French <smfrench@gmail.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	samba-technical <samba-technical@lists.samba.org>,
	CIFS <linux-cifs@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: copy_file_range and user space tools to do copy fastest
Date: Fri, 27 Apr 2018 23:18:41 -0600
Message-ID: <736468DE-36BE-471B-B5F2-DB3D0B3E892B@dilger.ca>
In-Reply-To: <20180427234126.GA213261@gmail.com>

On Apr 27, 2018, at 5:41 PM, Eric Biggers <ebiggers3@gmail.com> wrote:
> 
> On Fri, Apr 27, 2018 at 01:45:40PM -0600, Andreas Dilger wrote:
>> On Apr 27, 2018, at 12:25 PM, Steve French <smfrench@gmail.com> wrote:
>>> 
>>> Are there any user space tools (other than our test tools and xfs_io
>>> etc.) that support copy_file_range?  Looks like at least cp and rsync
>>> and dd don't.  That syscall, which has now been around a couple of years
>>> and which I was reminded about at the LSF/MM summit a few days ago, presumably
>>> is the 'best' way to copy a file fast since it tries all the
>>> mechanisms (reflink etc.) in order.
>>> 
>>> Since copy_file_range syscall can be 100x or more faster for network
>>> file systems than the alternative, was surprised when I noticed that
>>> cp and rsync didn't support it.  It doesn't look like rsync even
>>> supports reflink either (although presumably if you call
>>> copy_file_range you don't have to worry about that), and reads/writes
>>> are 8K. See copy_file() in rsync/util.c
>>> 
>>> In the cp command it looks like it can call the FICLONE IOCTL (see
>>> clone_file() in coreutils/src/copy.c) but doesn't call the expected
>>> "copy_file_range" syscall.
>>> 
>>> In the dd command it doesn't call either - see dd_copy() in coreutils/src/dd.c
>>> 
>>> Since it can be 100x or more faster in some cases to call
>>> copy_file_range than do reads/writes back and forth to do a copy
>>> (especially if network or clustered backend or cloud), what tools are
>>> the best to recommend?
>>> 
>>> Would rsync or cp be likely to take patches to call the standard
>>> "copy_file_range" syscall
>>> (http://man7.org/linux/man-pages/man2/copy_file_range.2.html)?
>>> Presumably not if it has been two+ years ... but would be interested
>>> what copy tools to recommend to use instead.
>> 
>> I would start with submitting a patch to coreutils, if you can figure
>> out that code enough to do so (I find it quite opaque).  Since it has
>> been in the kernel for a while already, it should be acceptable to the
>> upstream coreutils maintainers to use this interface.  Doubly so if you
>> include some benchmarks with CIFS/NFS clients avoiding network overhead
>> during the copy.
>> 
> 
> For cp (coreutils), apparently there was a concern that copy_file_range()
> expands holes; see the thread at
> https://lists.gnu.org/archive/html/bug-coreutils/2016-09/msg00020.html.
> Though, I'd think it could just be used on non-holes only.  And I don't think
> the size_t type of 'len' is a problem either, since it's the copy length, not
> the file size.  You just call it multiple times if the file is larger.

I think cp is already using SEEK_HOLE/SEEK_DATA and/or FIEMAP to determine
the mapped and sparse segments of the file, so it should be practical to
use copy_file_range() in conjunction with these to copy only the allocated
parts of the file.

Cheers, Andreas





