LKML Archive on lore.kernel.org
From: David Howells <dhowells@redhat.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: dhowells@redhat.com, linux-fsdevel@vger.kernel.org,
	jlayton@kernel.org, Christoph Hellwig <hch@infradead.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	dchinner@redhat.com, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: Could it be made possible to offer "supplementary" data to a DIO write ?
Date: Thu, 05 Aug 2021 15:38:01 +0100
Message-ID: <1186271.1628174281@warthog.procyon.org.uk>
In-Reply-To: <YQvpDP/tdkG4MMGs@casper.infradead.org>

Matthew Wilcox <willy@infradead.org> wrote:

> You can already get 400Gbit ethernet.

Sorry, but that's not likely to become relevant any time soon.  Besides, my
laptop's wifi doesn't really do that yet.

> Saving 500 bytes by sending just the 12 bytes that changed is optimising the
> wrong thing.

In one sense, at least, you're correct.  The cost of setting up an RPC to do
the write and setting up crypto is high compared to transmitting 3 bytes vs 4k
bytes.

> If you have two clients accessing the same file at byte granularity, you've
> already lost.

Doesn't stop people doing it, though.  People have sqlite, dbm, mail stores,
whatever in their homedirs, put there by the desktop environments.  Granted,
most of the time people don't log in twice with the same homedir from two
different machines (and it doesn't, or at least didn't use to, work with Gnome
or KDE).

> Extent based filesystems create huge extents anyway:

Okay, so it's not feasible.  That's fine.

> This has already happened when you initially wrote to the file backing
> the cache.  Updates are just going to write to the already-allocated
> blocks, unless you've done something utterly inappropriate to the
> situation like reflinked the files.

Or the file is being read random-access and we now have a block we didn't have
before that is contiguous to another block we already have.

> If you want to take leases at byte granularity, and then not writeback
> parts of a page that are outside that lease, feel free.  It shouldn't
> affect how you track dirtiness or how you writethrough the page cache
> to the disk cache.

Indeed.  Handling writes to the local disk cache is different from handling
writes to the server(s).  The cache has a larger block size but I don't have
to worry about third-party conflicts on it, whereas the server can be taken as
having no minimum block size, but my write can clash with someone else's.

Generally, I prefer to write back the minimum I can get away with (as does the
Linux NFS client AFAICT).

However, if everyone agrees that we should only ever write back a multiple of
a certain block size, even to network filesystems, what block size should that
be?  Note that PAGE_SIZE varies across arches and folios are going to
exacerbate this.  What I don't want to happen is that you read from a file, it
creates, say, a 4M (or larger) folio; you change three bytes and then you're
forced to write back the entire 4M folio.

Note that when content crypto or compression is employed, some multiple of the
size of the encrypted/compressed blocks would be a requirement.
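
To put rough numbers on the trade-off above, here is a minimal userspace
sketch of rounding a dirty byte range out to a writeback granule.  The names
(struct byte_range, wb_round_out) are hypothetical and are not kernel API; the
sketch assumes blksize is non-zero and that the dirty range already lies below
limit (for instance the end of the folio or EOF).

```c
#include <stdint.h>

/* A half-open byte range [start, end) within a file (hypothetical type). */
struct byte_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Expand a dirty range outwards so that both ends fall on multiples of
 * blksize, then clamp the end to limit.  blksize must be non-zero; it
 * need not be a power of two, so plain modulo is used instead of masks.
 */
static struct byte_range wb_round_out(struct byte_range dirty,
				      uint64_t blksize, uint64_t limit)
{
	struct byte_range wb;

	/* Round the start down to the previous block boundary. */
	wb.start = dirty.start - dirty.start % blksize;

	/* Round the end up to the next block boundary (no-op if aligned). */
	wb.end = dirty.end + (blksize - dirty.end % blksize) % blksize;

	if (wb.end > limit)
		wb.end = limit;
	return wb;
}
```

With three dirty bytes at [100, 103) in a 4M folio, a granule of 1 leaves the
range as it is, a granule of 16 (standing in for a crypto block) widens it to
[96, 112), a granule of 4096 gives [0, 4096), and a granule equal to the folio
size forces the whole 4M to be written.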

David


