LKML Archive on lore.kernel.org
* [RFC] mm: optimise generic_file_read_iter
@ 2021-08-06 11:42 Pavel Begunkov
  2021-08-06 13:48 ` Al Viro
  2021-08-06 23:49 ` Dave Chinner
  0 siblings, 2 replies; 5+ messages in thread
From: Pavel Begunkov @ 2021-08-06 11:42 UTC
  To: Andrew Morton, linux-mm
  Cc: Alexander Viro, linux-fsdevel, Jens Axboe, linux-kernel

Unless the direct I/O path of generic_file_read_iter() ends up with an
error or a short read, it doesn't use the inode. So, load the inode and
size later, only when they're needed. This cuts two memory reads and
also improves code generation, e.g. it avoids loads from the stack.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---

NOTE: as a side effect, it reads inode->i_size after ->direct_IO(), and
I'm not sure whether that's valid, so it would be great to get feedback
from someone who knows better.

 mm/filemap.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index d1458ecf2f51..0030c454ec35 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2658,10 +2658,8 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 	if (iocb->ki_flags & IOCB_DIRECT) {
 		struct file *file = iocb->ki_filp;
 		struct address_space *mapping = file->f_mapping;
-		struct inode *inode = mapping->host;
-		loff_t size;
+		struct inode *inode;
 
-		size = i_size_read(inode);
 		if (iocb->ki_flags & IOCB_NOWAIT) {
 			if (filemap_range_needs_writeback(mapping, iocb->ki_pos,
 						iocb->ki_pos + count - 1))
@@ -2693,8 +2691,10 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 		 * the rest of the read.  Buffered reads will not work for
 		 * DAX files, so don't bother trying.
 		 */
-		if (retval < 0 || !count || iocb->ki_pos >= size ||
-		    IS_DAX(inode))
+		if (retval < 0 || !count)
+			return retval;
+		inode = mapping->host;
+		if (iocb->ki_pos >= i_size_read(inode) || IS_DAX(inode))
 			return retval;
 	}
 
-- 
2.32.0



* Re: [RFC] mm: optimise generic_file_read_iter
  2021-08-06 11:42 [RFC] mm: optimise generic_file_read_iter Pavel Begunkov
@ 2021-08-06 13:48 ` Al Viro
  2021-08-06 17:18   ` Jens Axboe
  2021-08-07 10:30   ` Pavel Begunkov
  2021-08-06 23:49 ` Dave Chinner
  1 sibling, 2 replies; 5+ messages in thread
From: Al Viro @ 2021-08-06 13:48 UTC
  To: Pavel Begunkov
  Cc: Andrew Morton, linux-mm, linux-fsdevel, Jens Axboe, linux-kernel

On Fri, Aug 06, 2021 at 12:42:43PM +0100, Pavel Begunkov wrote:
> Unless the direct I/O path of generic_file_read_iter() ends up with an
> error or a short read, it doesn't use the inode. So, load the inode and
> size later, only when they're needed. This cuts two memory reads and
> also improves code generation, e.g. it avoids loads from the stack.

... and the same question here.

> NOTE: as a side effect, it reads inode->i_size after ->direct_IO(), and
> I'm not sure whether that's valid, so it would be great to get feedback
> from someone who knows better.

Ought to be safe, I think, but again, how much effect have you observed
from the patch?


* Re: [RFC] mm: optimise generic_file_read_iter
  2021-08-06 13:48 ` Al Viro
@ 2021-08-06 17:18   ` Jens Axboe
  2021-08-07 10:30   ` Pavel Begunkov
  1 sibling, 0 replies; 5+ messages in thread
From: Jens Axboe @ 2021-08-06 17:18 UTC
  To: Al Viro, Pavel Begunkov
  Cc: Andrew Morton, linux-mm, linux-fsdevel, linux-kernel

On 8/6/21 7:48 AM, Al Viro wrote:
> On Fri, Aug 06, 2021 at 12:42:43PM +0100, Pavel Begunkov wrote:
>> Unless the direct I/O path of generic_file_read_iter() ends up with an
>> error or a short read, it doesn't use the inode. So, load the inode and
>> size later, only when they're needed. This cuts two memory reads and
>> also improves code generation, e.g. it avoids loads from the stack.
> 
> ... and the same question here.
> 
>> NOTE: as a side effect, it reads inode->i_size after ->direct_IO(), and
>> I'm not sure whether that's valid, so it would be great to get feedback
>> from someone who knows better.
> 
> Ought to be safe, I think, but again, how much effect have you observed
> from the patch?

Ran a quick test here, doing polled IO (~3.3M IOPS), and it reduces the
overhead of generic_file_read_iter() from 1.5% of the runtime to 1.2%.
Noticeable. It will improve further once we stop digging into the inode
on the io_uring side.

Anyway, just one data point, perhaps Pavel has some too.
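Taken at face value from this single run, the figures amount to a 0.3 percentage-point cut in total runtime, or a 20% reduction in the function's own share of it:

```latex
\frac{1.5\% - 1.2\%}{1.5\%} = \frac{0.3}{1.5} = 0.2 = 20\%
```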


-- 
Jens Axboe



* Re: [RFC] mm: optimise generic_file_read_iter
  2021-08-06 11:42 [RFC] mm: optimise generic_file_read_iter Pavel Begunkov
  2021-08-06 13:48 ` Al Viro
@ 2021-08-06 23:49 ` Dave Chinner
  1 sibling, 0 replies; 5+ messages in thread
From: Dave Chinner @ 2021-08-06 23:49 UTC
  To: Pavel Begunkov
  Cc: Andrew Morton, linux-mm, Alexander Viro, linux-fsdevel,
	Jens Axboe, linux-kernel

On Fri, Aug 06, 2021 at 12:42:43PM +0100, Pavel Begunkov wrote:
> Unless the direct I/O path of generic_file_read_iter() ends up with an
> error or a short read, it doesn't use the inode. So, load the inode and
> size later, only when they're needed. This cuts two memory reads and
> also improves code generation, e.g. it avoids loads from the stack.
> 
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
> 
> NOTE: as a side effect, it reads inode->i_size after ->direct_IO(), and
> I'm not sure whether that's valid, so it would be great to get feedback
> from someone who knows better.

I can see that it changes behaviour in a very subtle way. It depends
on what each individual filesystem does with direct IO as to whether
this may introduce potential data coherency/corruption issues, so I
can't say that it's a safe change. It doesn't affect XFS, because
XFS doesn't do direct IO through generic_file_read_iter().

Fundamentally, the issue is that ->direct_IO() can race with inode
size extensions due to write IO completions while the read IO is in
flight.

>  mm/filemap.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index d1458ecf2f51..0030c454ec35 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2658,10 +2658,8 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
>  	if (iocb->ki_flags & IOCB_DIRECT) {
>  		struct file *file = iocb->ki_filp;
>  		struct address_space *mapping = file->f_mapping;
> -		struct inode *inode = mapping->host;
> -		loff_t size;
> +		struct inode *inode;
>  
> -		size = i_size_read(inode);
>  		if (iocb->ki_flags & IOCB_NOWAIT) {
>  			if (filemap_range_needs_writeback(mapping, iocb->ki_pos,
>  						iocb->ki_pos + count - 1))
> @@ -2693,8 +2691,10 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
>  		 * the rest of the read.  Buffered reads will not work for
>  		 * DAX files, so don't bother trying.
>  		 */
> -		if (retval < 0 || !count || iocb->ki_pos >= size ||
> -		    IS_DAX(inode))

Hence this check in the current code is determining if the IO file
offset *after* the IO completed is at or beyond the EOF *before the
IO was started*. i.e. it always detects a short read, because the
EOF can only ascend while a DIO is in progress - truncation cannot
run concurrently with DIO reads. Hence if we get fewer bytes read
than we asked for, and we are beyond the EOF we sampled at the start
of the IO, we know for certain we got a short read and we drop out
without going through the buffered read path.

> +		if (retval < 0 || !count)
> +			return retval;
> +		inode = mapping->host;
> +		if (iocb->ki_pos >= i_size_read(inode) || IS_DAX(inode))
>  			return retval;

This changes the check to read the inode size after the read IO
completed. This means the IO could have raced with size extensions
from other concurrent DIO writes (or even racing buffered IO
writeback), so despite getting fewer bytes than we asked for, we
won't detect it as a short DIO read. Hence we now fall through to the
buffered read path.

So at minimum, this is a _very subtle_ change of behaviour in the
direct IO code, resulting in short reads at EOF now sometimes
falling through to the buffered IO path where they never did before.
It may not be an issue but per-filesystem audits will be needed to
determine that....
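The difference can be stated compactly. Let s_0 be i_size sampled before ->direct_IO(), s_1 the value read after it, and p the file position once the DIO returns with count > 0 (a short read), ignoring the DAX test. Because truncation cannot run concurrently with a DIO read, s_0 <= s_1:

```latex
\begin{aligned}
&\text{original stops when } p \ge s_0, \qquad \text{patched stops when } p \ge s_1,\\
&s_0 \le s_1 \;\Rightarrow\; \bigl(p \ge s_1 \Rightarrow p \ge s_0\bigr),\\
&\text{so behaviour differs exactly when } s_0 \le p < s_1 .
\end{aligned}
```

So the patch never stops in a case where the original would continue. It only adds buffered fallbacks, in the window where the file grew past p while the DIO was in flight.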

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [RFC] mm: optimise generic_file_read_iter
  2021-08-06 13:48 ` Al Viro
  2021-08-06 17:18   ` Jens Axboe
@ 2021-08-07 10:30   ` Pavel Begunkov
  1 sibling, 0 replies; 5+ messages in thread
From: Pavel Begunkov @ 2021-08-07 10:30 UTC
  To: Al Viro; +Cc: Andrew Morton, linux-mm, linux-fsdevel, Jens Axboe, linux-kernel

On 8/6/21 2:48 PM, Al Viro wrote:
> On Fri, Aug 06, 2021 at 12:42:43PM +0100, Pavel Begunkov wrote:
>> Unless the direct I/O path of generic_file_read_iter() ends up with an
>> error or a short read, it doesn't use the inode. So, load the inode and
>> size later, only when they're needed. This cuts two memory reads and
>> also improves code generation, e.g. it avoids loads from the stack.
> 
> ... and the same question here.
> 
>> NOTE: as a side effect, it reads inode->i_size after ->direct_IO(), and
>> I'm not sure whether that's valid, so it would be great to get feedback
>> from someone who knows better.
> 
> Ought to be safe, I think, but again, how much effect have you observed
> from the patch?

Answering for both patches -- I haven't benchmarked it and don't expect
to find anything just from this one, considering the variance between
runs. I took a look at the assembly (gcc 11.1): it removes two reads to
get i_size, plus a write and a read of that i_size to and from the
stack, where the compiler had stashed it.

For example, we've squeezed out several percent of throughput before on
the io_uring side just by cutting the sheer number of instructions, none
of them too expensive individually. IMHO, it's easier to do this when
you spot something along the way than to rediscover the same thing
during a performance safari.

-- 
Pavel Begunkov


Thread overview: 5+ messages
2021-08-06 11:42 [RFC] mm: optimise generic_file_read_iter Pavel Begunkov
2021-08-06 13:48 ` Al Viro
2021-08-06 17:18   ` Jens Axboe
2021-08-07 10:30   ` Pavel Begunkov
2021-08-06 23:49 ` Dave Chinner
