Linux-Fsdevel Archive on lore.kernel.org
help / color / mirror / Atom feed
* [RFC] Handle EOF in case of not aligned direct io read in kernel
@ 2018-01-17  9:28 Joseph Qi
  2018-01-17 10:09 ` Steven Whitehouse
  0 siblings, 1 reply; 7+ messages in thread
From: Joseph Qi @ 2018-01-17  9:28 UTC (permalink / raw)
  To: linux-fsdevel, viro, swhiteho; +Cc: gnehzuil, caijingxian

Hi All,

commit 9fe55eea7e4b ("Fix race when checking i_size on direct i/o read")
has removed the pos check due to the race case.

Now if I want to do direct read on a file which size is not sector
alignment, it will return EINVAL in the last round. That means I have to
handle the case by checking file size in user space, which I think is a
bit inconvenient.

I've gone through the direct io read code, and found that in
do_blockdev_direct_IO, it does try to handle the EOF case, but
unfortunately it retruns EINVAL earlier when checking alignment at the
entry of this function.

/* Once we sampled i_size check for reads beyond EOF */
dio->i_size = i_size_read(inode);
if (iov_iter_rw(iter) == READ && offset >= dio->i_size) {
	...
	retval = 0;
	goto out;
}

So I wonder if we can handle this in kernel space before alignment
check, so that it behaves the same as some old kernels, e.g. 3.10.

All comments are always welcome.

Thanks,
Joseph

My test program, which behaves different in 3.10 and 4.9.
---------------------------------------------------------
#define _GNU_SOURCE

#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>
#include <errno.h>
#include <string.h>
#include <stdio.h>

#define BUFFER_SIZE 4096
#define SECTOR_SIZE 512

int main(int argc, char *argv[])
{
	char *file = "/tmp/testfile";
	size_t len = 1024;
	ssize_t more = (ssize_t)len;
	off_t offset = 0, off = 0;
	ssize_t ret = 0;
	int fd;
	char *buf;

	ret = posix_memalign((void **)&buf, SECTOR_SIZE, BUFFER_SIZE);
	if (ret < 0) {
		printf("posix_memalign failed\n");
		return ret;
	}

	memset(buf, 0, BUFFER_SIZE);

	fd = open(file, O_DIRECT);
	if (fd < 0) {
		ret = -errno;
		goto free;
	}

	while (more) {
		ret = pread(fd, buf + off, more, offset + off);
		if (ret < 0) {
			printf("pread failed, error = %d\n", -errno);
			goto out;
		}

		printf("pread %d bytes\n", ret);

		if (ret == 0)
			break;

		more -= ret;
		off += ret;
	}

out:
	close(fd);
free:
	free(buf);
	return ret;
}

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC] Handle EOF in case of not aligned direct io read in kernel
  2018-01-17  9:28 [RFC] Handle EOF in case of not aligned direct io read in kernel Joseph Qi
@ 2018-01-17 10:09 ` Steven Whitehouse
  2018-01-17 14:25   ` Matthew Wilcox
  0 siblings, 1 reply; 7+ messages in thread
From: Steven Whitehouse @ 2018-01-17 10:09 UTC (permalink / raw)
  To: Joseph Qi, linux-fsdevel, viro; +Cc: gnehzuil, caijingxian

Hi,


On 17/01/18 09:28, Joseph Qi wrote:
> Hi All,
>
> commit 9fe55eea7e4b ("Fix race when checking i_size on direct i/o read")
> has removed the pos check due to the race case.
>
> Now if I want to do direct read on a file which size is not sector
> alignment, it will return EINVAL in the last round. That means I have to
> handle the case by checking file size in user space, which I think is a
> bit inconvenient.
What do you mean by "not sector alignment"? Are you intending to read 
files with any arbitrary size, or those with 512 byte alignment on a 
filesystem with some larger block size, or something else?

Steve.

>
> I've gone through the direct io read code, and found that in
> do_blockdev_direct_IO, it does try to handle the EOF case, but
> unfortunately it retruns EINVAL earlier when checking alignment at the
> entry of this function.
>
> /* Once we sampled i_size check for reads beyond EOF */
> dio->i_size = i_size_read(inode);
> if (iov_iter_rw(iter) == READ && offset >= dio->i_size) {
> 	...
> 	retval = 0;
> 	goto out;
> }
>
> So I wonder if we can handle this in kernel space before alignment
> check, so that it behaves the same as some old kernels, e.g. 3.10.
>
> All comments are always welcome.
>
> Thanks,
> Joseph
>
> My test program, which behaves different in 3.10 and 4.9.
> ---------------------------------------------------------
> #define _GNU_SOURCE
>
> #include <sys/types.h>
> #include <fcntl.h>
> #include <unistd.h>
> #include <sys/stat.h>
> #include <errno.h>
> #include <string.h>
> #include <stdio.h>
>
> #define BUFFER_SIZE 4096
> #define SECTOR_SIZE 512
>
> int main(int argc, char *argv[])
> {
> 	char *file = "/tmp/testfile";
> 	size_t len = 1024;
> 	ssize_t more = (ssize_t)len;
> 	off_t offset = 0, off = 0;
> 	ssize_t ret = 0;
> 	int fd;
> 	char *buf;
>
> 	ret = posix_memalign((void **)&buf, SECTOR_SIZE, BUFFER_SIZE);
> 	if (ret < 0) {
> 		printf("posix_memalign failed\n");
> 		return ret;
> 	}
>
> 	memset(buf, 0, BUFFER_SIZE);
>
> 	fd = open(file, O_DIRECT);
> 	if (fd < 0) {
> 		ret = -errno;
> 		goto free;
> 	}
>
> 	while (more) {
> 		ret = pread(fd, buf + off, more, offset + off);
> 		if (ret < 0) {
> 			printf("pread failed, error = %d\n", -errno);
> 			goto out;
> 		}
>
> 		printf("pread %d bytes\n", ret);
>
> 		if (ret == 0)
> 			break;
>
> 		more -= ret;
> 		off += ret;
> 	}
>
> out:
> 	close(fd);
> free:
> 	free(buf);
> 	return ret;
> }
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC] Handle EOF in case of not aligned direct io read in kernel
  2018-01-17 10:09 ` Steven Whitehouse
@ 2018-01-17 14:25   ` Matthew Wilcox
  2018-01-18  1:12     ` Joseph Qi
  0 siblings, 1 reply; 7+ messages in thread
From: Matthew Wilcox @ 2018-01-17 14:25 UTC (permalink / raw)
  To: Steven Whitehouse; +Cc: Joseph Qi, linux-fsdevel, viro, gnehzuil, caijingxian

On Wed, Jan 17, 2018 at 10:09:28AM +0000, Steven Whitehouse wrote:
> Hi,
> 
> 
> On 17/01/18 09:28, Joseph Qi wrote:
> > Hi All,
> > 
> > commit 9fe55eea7e4b ("Fix race when checking i_size on direct i/o read")
> > has removed the pos check due to the race case.
> > 
> > Now if I want to do direct read on a file which size is not sector
> > alignment, it will return EINVAL in the last round. That means I have to
> > handle the case by checking file size in user space, which I think is a
> > bit inconvenient.
> What do you mean by "not sector alignment"? Are you intending to read files
> with any arbitrary size, or those with 512 byte alignment on a filesystem
> with some larger block size, or something else?

I think he's saying that the file he's trying to read has a length which
is not a multiple of 512.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC] Handle EOF in case of not aligned direct io read in kernel
  2018-01-17 14:25   ` Matthew Wilcox
@ 2018-01-18  1:12     ` Joseph Qi
  2018-01-22  1:57       ` Joseph Qi
  0 siblings, 1 reply; 7+ messages in thread
From: Joseph Qi @ 2018-01-18  1:12 UTC (permalink / raw)
  To: Matthew Wilcox, Steven Whitehouse
  Cc: linux-fsdevel, viro, gnehzuil, caijingxian



On 18/1/17 22:25, Matthew Wilcox wrote:
> On Wed, Jan 17, 2018 at 10:09:28AM +0000, Steven Whitehouse wrote:
>> Hi,
>>
>>
>> On 17/01/18 09:28, Joseph Qi wrote:
>>> Hi All,
>>>
>>> commit 9fe55eea7e4b ("Fix race when checking i_size on direct i/o read")
>>> has removed the pos check due to the race case.
>>>
>>> Now if I want to do direct read on a file which size is not sector
>>> alignment, it will return EINVAL in the last round. That means I have to
>>> handle the case by checking file size in user space, which I think is a
>>> bit inconvenient.
>> What do you mean by "not sector alignment"? Are you intending to read files
>> with any arbitrary size, or those with 512 byte alignment on a filesystem
>> with some larger block size, or something else?
> I think he's saying that the file he's trying to read has a length which
> is not a multiple of 512.
Yes, suppose I am trying to read a file which size is 4000 bytes, and
the allocated buffer is 4096 bytes.
When running my test program, it returns 4000 for the first round and
-22 for the second.
While running my test program on 3.10, it returns 4000 and then 0,
which is my expected behavior.

Sorry for the brought misunderstanding.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC] Handle EOF in case of not aligned direct io read in kernel
  2018-01-18  1:12     ` Joseph Qi
@ 2018-01-22  1:57       ` Joseph Qi
  2018-01-25  8:47         ` Jan Kara
  0 siblings, 1 reply; 7+ messages in thread
From: Joseph Qi @ 2018-01-22  1:57 UTC (permalink / raw)
  To: Matthew Wilcox, Steven Whitehouse
  Cc: linux-fsdevel, viro, gnehzuil, caijingxian

Hi All,

Any comments on this case?
Or should we definitely leave it to be handled in user space code?

Thanks,
Joseph

On 18/1/18 09:12, Joseph Qi wrote:
> 
> 
> On 18/1/17 22:25, Matthew Wilcox wrote:
>> On Wed, Jan 17, 2018 at 10:09:28AM +0000, Steven Whitehouse wrote:
>>> Hi,
>>>
>>>
>>> On 17/01/18 09:28, Joseph Qi wrote:
>>>> Hi All,
>>>>
>>>> commit 9fe55eea7e4b ("Fix race when checking i_size on direct i/o read")
>>>> has removed the pos check due to the race case.
>>>>
>>>> Now if I want to do direct read on a file which size is not sector
>>>> alignment, it will return EINVAL in the last round. That means I have to
>>>> handle the case by checking file size in user space, which I think is a
>>>> bit inconvenient.
>>> What do you mean by "not sector alignment"? Are you intending to read files
>>> with any arbitrary size, or those with 512 byte alignment on a filesystem
>>> with some larger block size, or something else?
>> I think he's saying that the file he's trying to read has a length which
>> is not a multiple of 512.
> Yes, suppose I am trying to read a file which size is 4000 bytes, and
> the allocated buffer is 4096 bytes.
> When running my test program, it returns 4000 for the first round and
> -22 for the second.
> While running my test program on 3.10, it returns 4000 and then 0,
> which is my expected behavior.
> 
> Sorry for the brought misunderstanding.
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC] Handle EOF in case of not aligned direct io read in kernel
  2018-01-22  1:57       ` Joseph Qi
@ 2018-01-25  8:47         ` Jan Kara
  2018-01-25  9:16           ` Joseph Qi
  0 siblings, 1 reply; 7+ messages in thread
From: Jan Kara @ 2018-01-25  8:47 UTC (permalink / raw)
  To: Joseph Qi
  Cc: Matthew Wilcox, Steven Whitehouse, linux-fsdevel, viro, gnehzuil,
	caijingxian

Hi!

On Mon 22-01-18 09:57:05, Joseph Qi wrote:
> Any comments on this case?
> Or should we definitely leave it to be handled in user space code?

Since direct IO is not defined by any standard we are free to define our
own semantics. Since the behavior has changed about 4 years ago, we'd break
userspace with about the same likelhood when going back to the old behavior
as when staying with the current one. And the new semantics makes sense as
well - unaligned direct IO reads reads are invalid even though in this
particular case we could also handle the read and return 0. So I'd prefer
to stay with the current behavior.

								Honza

> On 18/1/18 09:12, Joseph Qi wrote:
> > 
> > 
> > On 18/1/17 22:25, Matthew Wilcox wrote:
> >> On Wed, Jan 17, 2018 at 10:09:28AM +0000, Steven Whitehouse wrote:
> >>> Hi,
> >>>
> >>>
> >>> On 17/01/18 09:28, Joseph Qi wrote:
> >>>> Hi All,
> >>>>
> >>>> commit 9fe55eea7e4b ("Fix race when checking i_size on direct i/o read")
> >>>> has removed the pos check due to the race case.
> >>>>
> >>>> Now if I want to do direct read on a file which size is not sector
> >>>> alignment, it will return EINVAL in the last round. That means I have to
> >>>> handle the case by checking file size in user space, which I think is a
> >>>> bit inconvenient.
> >>> What do you mean by "not sector alignment"? Are you intending to read files
> >>> with any arbitrary size, or those with 512 byte alignment on a filesystem
> >>> with some larger block size, or something else?
> >> I think he's saying that the file he's trying to read has a length which
> >> is not a multiple of 512.
> > Yes, suppose I am trying to read a file which size is 4000 bytes, and
> > the allocated buffer is 4096 bytes.
> > When running my test program, it returns 4000 for the first round and
> > -22 for the second.
> > While running my test program on 3.10, it returns 4000 and then 0,
> > which is my expected behavior.
> > 
> > Sorry for the brought misunderstanding.
> > 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC] Handle EOF in case of not aligned direct io read in kernel
  2018-01-25  8:47         ` Jan Kara
@ 2018-01-25  9:16           ` Joseph Qi
  0 siblings, 0 replies; 7+ messages in thread
From: Joseph Qi @ 2018-01-25  9:16 UTC (permalink / raw)
  To: Jan Kara
  Cc: Matthew Wilcox, Steven Whitehouse, linux-fsdevel, viro, gnehzuil,
	caijingxian

Hi Jan,

On 18/1/25 16:47, Jan Kara wrote:
> Hi!
> 
> On Mon 22-01-18 09:57:05, Joseph Qi wrote:
>> Any comments on this case?
>> Or should we definitely leave it to be handled in user space code?
> 
> Since direct IO is not defined by any standard we are free to define our
> own semantics. Since the behavior has changed about 4 years ago, we'd break
> userspace with about the same likelhood when going back to the old behavior
> as when staying with the current one. And the new semantics makes sense as
> well - unaligned direct IO reads reads are invalid even though in this
> particular case we could also handle the read and return 0. So I'd prefer
> to stay with the current behavior.
> 
> 								Honza

Thanks for your advice.

Joseph

> 
>> On 18/1/18 09:12, Joseph Qi wrote:
>>>
>>>
>>> On 18/1/17 22:25, Matthew Wilcox wrote:
>>>> On Wed, Jan 17, 2018 at 10:09:28AM +0000, Steven Whitehouse wrote:
>>>>> Hi,
>>>>>
>>>>>
>>>>> On 17/01/18 09:28, Joseph Qi wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> commit 9fe55eea7e4b ("Fix race when checking i_size on direct i/o read")
>>>>>> has removed the pos check due to the race case.
>>>>>>
>>>>>> Now if I want to do direct read on a file which size is not sector
>>>>>> alignment, it will return EINVAL in the last round. That means I have to
>>>>>> handle the case by checking file size in user space, which I think is a
>>>>>> bit inconvenient.
>>>>> What do you mean by "not sector alignment"? Are you intending to read files
>>>>> with any arbitrary size, or those with 512 byte alignment on a filesystem
>>>>> with some larger block size, or something else?
>>>> I think he's saying that the file he's trying to read has a length which
>>>> is not a multiple of 512.
>>> Yes, suppose I am trying to read a file which size is 4000 bytes, and
>>> the allocated buffer is 4096 bytes.
>>> When running my test program, it returns 4000 for the first round and
>>> -22 for the second.
>>> While running my test program on 3.10, it returns 4000 and then 0,
>>> which is my expected behavior.
>>>
>>> Sorry for the brought misunderstanding.
>>>

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2018-01-25  9:16 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-17  9:28 [RFC] Handle EOF in case of not aligned direct io read in kernel Joseph Qi
2018-01-17 10:09 ` Steven Whitehouse
2018-01-17 14:25   ` Matthew Wilcox
2018-01-18  1:12     ` Joseph Qi
2018-01-22  1:57       ` Joseph Qi
2018-01-25  8:47         ` Jan Kara
2018-01-25  9:16           ` Joseph Qi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).