Linux-Fsdevel Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH] fat: Avoid oops when bdi->io_pages==0
@ 2020-08-30  0:59 OGAWA Hirofumi
  2020-08-30  1:21 ` Matthew Wilcox
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: OGAWA Hirofumi @ 2020-08-30  0:59 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-fsdevel

On one system, there was bdi->io_pages==0. This seems to be the bug of
a driver somewhere, and should fix it though. Anyway, it is better to
avoid the divide-by-zero Oops.

So this check it.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: <stable@vger.kernel.org>
---
 fs/fat/fatent.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/fat/fatent.c b/fs/fat/fatent.c
index f7e3304..98a1c4f 100644
--- a/fs/fat/fatent.c	2020-08-30 06:52:47.251564566 +0900
+++ b/fs/fat/fatent.c	2020-08-30 06:54:05.838319213 +0900
@@ -660,7 +660,7 @@ static void fat_ra_init(struct super_blo
 	if (fatent->entry >= ent_limit)
 		return;
 
-	if (ra_pages > sb->s_bdi->io_pages)
+	if (sb->s_bdi->io_pages && ra_pages > sb->s_bdi->io_pages)
 		ra_pages = rounddown(ra_pages, sb->s_bdi->io_pages);
 	reada_blocks = ra_pages << (PAGE_SHIFT - sb->s_blocksize_bits + 1);
 
_

-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] fat: Avoid oops when bdi->io_pages==0
  2020-08-30  0:59 [PATCH] fat: Avoid oops when bdi->io_pages==0 OGAWA Hirofumi
@ 2020-08-30  1:21 ` Matthew Wilcox
  2020-08-30  1:54   ` OGAWA Hirofumi
  2020-08-30 14:01 ` Sasha Levin
  2020-08-31 15:22 ` Jens Axboe
  2 siblings, 1 reply; 15+ messages in thread
From: Matthew Wilcox @ 2020-08-30  1:21 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: Andrew Morton, linux-kernel, linux-fsdevel

On Sun, Aug 30, 2020 at 09:59:41AM +0900, OGAWA Hirofumi wrote:
> On one system, there was bdi->io_pages==0. This seems to be the bug of
> a driver somewhere, and should fix it though. Anyway, it is better to
> avoid the divide-by-zero Oops.
> 
> So this check it.
> 
> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
> Cc: <stable@vger.kernel.org>
> ---
>  fs/fat/fatent.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/fat/fatent.c b/fs/fat/fatent.c
> index f7e3304..98a1c4f 100644
> --- a/fs/fat/fatent.c	2020-08-30 06:52:47.251564566 +0900
> +++ b/fs/fat/fatent.c	2020-08-30 06:54:05.838319213 +0900
> @@ -660,7 +660,7 @@ static void fat_ra_init(struct super_blo
>  	if (fatent->entry >= ent_limit)
>  		return;
>  
> -	if (ra_pages > sb->s_bdi->io_pages)
> +	if (sb->s_bdi->io_pages && ra_pages > sb->s_bdi->io_pages)
>  		ra_pages = rounddown(ra_pages, sb->s_bdi->io_pages);

Wait, rounddown?  ->io_pages is supposed to be the maximum number of
pages to readahead.  Shouldn't this be max() instead of rounddown()?


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] fat: Avoid oops when bdi->io_pages==0
  2020-08-30  1:21 ` Matthew Wilcox
@ 2020-08-30  1:54   ` OGAWA Hirofumi
  2020-08-30  3:53     ` Matthew Wilcox
  0 siblings, 1 reply; 15+ messages in thread
From: OGAWA Hirofumi @ 2020-08-30  1:54 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Andrew Morton, linux-kernel, linux-fsdevel

Matthew Wilcox <willy@infradead.org> writes:

> On Sun, Aug 30, 2020 at 09:59:41AM +0900, OGAWA Hirofumi wrote:
>> On one system, there was bdi->io_pages==0. This seems to be the bug of
>> a driver somewhere, and should fix it though. Anyway, it is better to
>> avoid the divide-by-zero Oops.
>> 
>> So this check it.
>> 
>> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
>> Cc: <stable@vger.kernel.org>
>> ---
>>  fs/fat/fatent.c |    2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/fs/fat/fatent.c b/fs/fat/fatent.c
>> index f7e3304..98a1c4f 100644
>> --- a/fs/fat/fatent.c	2020-08-30 06:52:47.251564566 +0900
>> +++ b/fs/fat/fatent.c	2020-08-30 06:54:05.838319213 +0900
>> @@ -660,7 +660,7 @@ static void fat_ra_init(struct super_blo
>>  	if (fatent->entry >= ent_limit)
>>  		return;
>>  
>> -	if (ra_pages > sb->s_bdi->io_pages)
>> +	if (sb->s_bdi->io_pages && ra_pages > sb->s_bdi->io_pages)
>>  		ra_pages = rounddown(ra_pages, sb->s_bdi->io_pages);
>
> Wait, rounddown?  ->io_pages is supposed to be the maximum number of
> pages to readahead.  Shouldn't this be max() instead of rounddown()?

Hm, io_pages is limited by driver setting too, and io_pages can be lower
than ra_pages, e.g. usb storage.

Assuming ra_pages is user intent of readahead window. So if io_pages is
lower than ra_pages, this try ra_pages to align of io_pages chunk, but
not bigger than ra_pages. Because if block layer splits I/O requests to
hard limit, then I/O is not optimal.

So it is intent, I can be misunderstanding though.

Thanks.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] fat: Avoid oops when bdi->io_pages==0
  2020-08-30  1:54   ` OGAWA Hirofumi
@ 2020-08-30  3:53     ` Matthew Wilcox
  2020-08-30  9:04       ` OGAWA Hirofumi
  0 siblings, 1 reply; 15+ messages in thread
From: Matthew Wilcox @ 2020-08-30  3:53 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: Andrew Morton, linux-kernel, linux-fsdevel

On Sun, Aug 30, 2020 at 10:54:35AM +0900, OGAWA Hirofumi wrote:
> Matthew Wilcox <willy@infradead.org> writes:
> 
> > On Sun, Aug 30, 2020 at 09:59:41AM +0900, OGAWA Hirofumi wrote:
> >> On one system, there was bdi->io_pages==0. This seems to be the bug of
> >> a driver somewhere, and should fix it though. Anyway, it is better to
> >> avoid the divide-by-zero Oops.
> >> 
> >> So this check it.
> >> 
> >> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
> >> Cc: <stable@vger.kernel.org>
> >> ---
> >>  fs/fat/fatent.c |    2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >> 
> >> diff --git a/fs/fat/fatent.c b/fs/fat/fatent.c
> >> index f7e3304..98a1c4f 100644
> >> --- a/fs/fat/fatent.c	2020-08-30 06:52:47.251564566 +0900
> >> +++ b/fs/fat/fatent.c	2020-08-30 06:54:05.838319213 +0900
> >> @@ -660,7 +660,7 @@ static void fat_ra_init(struct super_blo
> >>  	if (fatent->entry >= ent_limit)
> >>  		return;
> >>  
> >> -	if (ra_pages > sb->s_bdi->io_pages)
> >> +	if (sb->s_bdi->io_pages && ra_pages > sb->s_bdi->io_pages)
> >>  		ra_pages = rounddown(ra_pages, sb->s_bdi->io_pages);
> >
> > Wait, rounddown?  ->io_pages is supposed to be the maximum number of
> > pages to readahead.  Shouldn't this be max() instead of rounddown()?

Sorry, I meant 'min', not 'max'.

> Hm, io_pages is limited by driver setting too, and io_pages can be lower
> than ra_pages, e.g. usb storage.
> 
> Assuming ra_pages is user intent of readahead window. So if io_pages is
> lower than ra_pages, this try ra_pages to align of io_pages chunk, but
> not bigger than ra_pages. Because if block layer splits I/O requests to
> hard limit, then I/O is not optimal.
> 
> So it is intent, I can be misunderstanding though.

Looking at this some more, I'm not sure it makes sense to consult ->io_pages
at all.  I see how it gets set to 0 -- the admin can write '1' to
/sys/block/<device>/queue/max_sectors_kb and that gets turned into 0
in ->io_pages.

But I'm not sure it makes any sense to respect that.  Looking at mm/readahead.c, all it does is limit the size of a read request which exceeds the current readahead window.  It's not used to limit the readahead window itself.  For
example:

        unsigned long max_pages = ra->ra_pages;
...
        if (req_size > max_pages && bdi->io_pages > max_pages)
                max_pages = min(req_size, bdi->io_pages);

Setting io_pages below ra_pages has no effect.  So maybe fat should also
disregard it?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] fat: Avoid oops when bdi->io_pages==0
  2020-08-30  3:53     ` Matthew Wilcox
@ 2020-08-30  9:04       ` OGAWA Hirofumi
  0 siblings, 0 replies; 15+ messages in thread
From: OGAWA Hirofumi @ 2020-08-30  9:04 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Andrew Morton, linux-kernel, linux-fsdevel

Matthew Wilcox <willy@infradead.org> writes:

> On Sun, Aug 30, 2020 at 10:54:35AM +0900, OGAWA Hirofumi wrote:
>> Matthew Wilcox <willy@infradead.org> writes:
>> 
>> Hm, io_pages is limited by driver setting too, and io_pages can be lower
>> than ra_pages, e.g. usb storage.
>> 
>> Assuming ra_pages is user intent of readahead window. So if io_pages is
>> lower than ra_pages, this try ra_pages to align of io_pages chunk, but
>> not bigger than ra_pages. Because if block layer splits I/O requests to
>> hard limit, then I/O is not optimal.
>> 
>> So it is intent, I can be misunderstanding though.
>
> Looking at this some more, I'm not sure it makes sense to consult ->io_pages
> at all.  I see how it gets set to 0 -- the admin can write '1' to
> /sys/block/<device>/queue/max_sectors_kb and that gets turned into 0
> in ->io_pages.

	if (max_sectors_kb > max_hw_sectors_kb || max_sectors_kb < page_kb)
		return -EINVAL;

It should not set to 0 via /sys/.../max_sectors_kb. However the default
of bdi->io_pages is 0. So it happened if a driver didn't initialized it.

> But I'm not sure it makes any sense to respect that.  Looking at
> mm/readahead.c, all it does is limit the size of a read request which
> exceeds the current readahead window.  It's not used to limit the
> readahead window itself.  For example:
>
>         unsigned long max_pages = ra->ra_pages;
> ...
>         if (req_size > max_pages && bdi->io_pages > max_pages)
>                 max_pages = min(req_size, bdi->io_pages);
>
> Setting io_pages below ra_pages has no effect.  So maybe fat should also
> disregard it?

          |-----------------------| requested blocks
[before]
 ra_pages |===========|===========|===========|
 io_pages |---------|---------|---------|
 req      |---------|-|-------|---|

[after]
 ra_pages |=========|=========|=========|
 io_pages |---------|---------|---------|
 req      |---------|---------|---|

This path is known the large sequential read there. Well, anyway, this
intent is to use [after] as 3 req, instead of [before] as 4 req.

Thanks.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] fat: Avoid oops when bdi->io_pages==0
  2020-08-30  0:59 [PATCH] fat: Avoid oops when bdi->io_pages==0 OGAWA Hirofumi
  2020-08-30  1:21 ` Matthew Wilcox
@ 2020-08-30 14:01 ` Sasha Levin
  2020-08-30 14:16   ` OGAWA Hirofumi
  2020-08-31 15:22 ` Jens Axboe
  2 siblings, 1 reply; 15+ messages in thread
From: Sasha Levin @ 2020-08-30 14:01 UTC (permalink / raw)
  To: Sasha Levin, OGAWA Hirofumi, Andrew Morton
  Cc: linux-kernel, linux-fsdevel, stable, stable

Hi

[This is an automated email]

This commit has been processed because it contains a -stable tag.
The stable tag indicates that it's relevant for the following trees: all

The bot has tested the following trees: v5.8.5, v5.4.61, v4.19.142, v4.14.195, v4.9.234, v4.4.234.

v5.8.5: Build OK!
v5.4.61: Failed to apply! Possible dependencies:
    898310032b96 ("fat: improve the readahead for FAT entries")
    a090a5a7d73f ("fat: fix fat_ra_init() for data clusters == 0")

v4.19.142: Failed to apply! Possible dependencies:
    898310032b96 ("fat: improve the readahead for FAT entries")
    a090a5a7d73f ("fat: fix fat_ra_init() for data clusters == 0")

v4.14.195: Failed to apply! Possible dependencies:
    898310032b96 ("fat: improve the readahead for FAT entries")
    a090a5a7d73f ("fat: fix fat_ra_init() for data clusters == 0")
    f663b5b38fff ("fat: add FITRIM ioctl for FAT file system")

v4.9.234: Failed to apply! Possible dependencies:
    898310032b96 ("fat: improve the readahead for FAT entries")
    a090a5a7d73f ("fat: fix fat_ra_init() for data clusters == 0")
    f663b5b38fff ("fat: add FITRIM ioctl for FAT file system")

v4.4.234: Failed to apply! Possible dependencies:
    898310032b96 ("fat: improve the readahead for FAT entries")
    8992de4cec12 ("fat: constify fatent_operations structures")
    a090a5a7d73f ("fat: fix fat_ra_init() for data clusters == 0")
    f663b5b38fff ("fat: add FITRIM ioctl for FAT file system")


NOTE: The patch will not be queued to stable trees until it is upstream.

How should we proceed with this patch?

-- 
Thanks
Sasha

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] fat: Avoid oops when bdi->io_pages==0
  2020-08-30 14:01 ` Sasha Levin
@ 2020-08-30 14:16   ` OGAWA Hirofumi
  0 siblings, 0 replies; 15+ messages in thread
From: OGAWA Hirofumi @ 2020-08-30 14:16 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Andrew Morton, linux-kernel, linux-fsdevel, stable

Sasha Levin <sashal@kernel.org> writes:

> Hi
>
> [This is an automated email]
>
> This commit has been processed because it contains a -stable tag.
> The stable tag indicates that it's relevant for the following trees: all
>
> The bot has tested the following trees: v5.8.5, v5.4.61, v4.19.142, v4.14.195, v4.9.234, v4.4.234.
>
> v5.8.5: Build OK!

[...]

>
> How should we proceed with this patch?

Only 5.8.x has to apply this patch.

Thanks.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] fat: Avoid oops when bdi->io_pages==0
  2020-08-30  0:59 [PATCH] fat: Avoid oops when bdi->io_pages==0 OGAWA Hirofumi
  2020-08-30  1:21 ` Matthew Wilcox
  2020-08-30 14:01 ` Sasha Levin
@ 2020-08-31 15:22 ` Jens Axboe
  2020-08-31 16:37   ` OGAWA Hirofumi
  2 siblings, 1 reply; 15+ messages in thread
From: Jens Axboe @ 2020-08-31 15:22 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: Andrew Morton, linux-kernel, fsdevel

On Sat, Aug 29, 2020 at 7:08 PM OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> wrote:
>
> On one system, there was bdi->io_pages==0. This seems to be the bug of
> a driver somewhere, and should fix it though. Anyway, it is better to
> avoid the divide-by-zero Oops.
>
> So this check it.
>
> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
> Cc: <stable@vger.kernel.org>
> ---
>  fs/fat/fatent.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/fat/fatent.c b/fs/fat/fatent.c
> index f7e3304..98a1c4f 100644
> --- a/fs/fat/fatent.c   2020-08-30 06:52:47.251564566 +0900
> +++ b/fs/fat/fatent.c   2020-08-30 06:54:05.838319213 +0900
> @@ -660,7 +660,7 @@ static void fat_ra_init(struct super_blo
>         if (fatent->entry >= ent_limit)
>                 return;
>
> -       if (ra_pages > sb->s_bdi->io_pages)
> +       if (sb->s_bdi->io_pages && ra_pages > sb->s_bdi->io_pages)
>                 ra_pages = rounddown(ra_pages, sb->s_bdi->io_pages);
>         reada_blocks = ra_pages << (PAGE_SHIFT - sb->s_blocksize_bits + 1);

I don't think we should work-around this here. What device is this on?
Something like the below may help.

diff --git a/block/blk-core.c b/block/blk-core.c
index d9d632639bd1..10c08ac50697 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -539,6 +539,7 @@ struct request_queue *blk_alloc_queue(int node_id)
 		goto fail_stats;
 
 	q->backing_dev_info->ra_pages = VM_READAHEAD_PAGES;
+	q->backing_dev_info->io_pages = VM_READAHEAD_PAGES;
 	q->backing_dev_info->capabilities = BDI_CAP_CGROUP_WRITEBACK;
 	q->node = node_id;

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] fat: Avoid oops when bdi->io_pages==0
  2020-08-31 15:22 ` Jens Axboe
@ 2020-08-31 16:37   ` OGAWA Hirofumi
  2020-08-31 16:39     ` Jens Axboe
  0 siblings, 1 reply; 15+ messages in thread
From: OGAWA Hirofumi @ 2020-08-31 16:37 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Andrew Morton, linux-kernel, fsdevel

Jens Axboe <axboe@kernel.dk> writes:

> On Sat, Aug 29, 2020 at 7:08 PM OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> wrote:
>>
>> On one system, there was bdi->io_pages==0. This seems to be the bug of
>> a driver somewhere, and should fix it though. Anyway, it is better to
>> avoid the divide-by-zero Oops.
>>
>> So this check it.
>>
>> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
>> Cc: <stable@vger.kernel.org>
>> ---
>>  fs/fat/fatent.c |    2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/fat/fatent.c b/fs/fat/fatent.c
>> index f7e3304..98a1c4f 100644
>> --- a/fs/fat/fatent.c   2020-08-30 06:52:47.251564566 +0900
>> +++ b/fs/fat/fatent.c   2020-08-30 06:54:05.838319213 +0900
>> @@ -660,7 +660,7 @@ static void fat_ra_init(struct super_blo
>>         if (fatent->entry >= ent_limit)
>>                 return;
>>
>> -       if (ra_pages > sb->s_bdi->io_pages)
>> +       if (sb->s_bdi->io_pages && ra_pages > sb->s_bdi->io_pages)
>>                 ra_pages = rounddown(ra_pages, sb->s_bdi->io_pages);
>>         reada_blocks = ra_pages << (PAGE_SHIFT - sb->s_blocksize_bits + 1);
>
> I don't think we should work-around this here. What device is this on?
> Something like the below may help.

The reported bug is from nvme stack, and the below patch (I submitted
same patch to you) fixed the reported case though. But I didn't verify
all possible path, so I'd liked to use safer side.

If block layer can guarantee io_pages!=0 instead, and can apply to
stable branch (5.8+). It would work too.

Thanks.

> diff --git a/block/blk-core.c b/block/blk-core.c
> index d9d632639bd1..10c08ac50697 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -539,6 +539,7 @@ struct request_queue *blk_alloc_queue(int node_id)
>  		goto fail_stats;
>  
>  	q->backing_dev_info->ra_pages = VM_READAHEAD_PAGES;
> +	q->backing_dev_info->io_pages = VM_READAHEAD_PAGES;
>  	q->backing_dev_info->capabilities = BDI_CAP_CGROUP_WRITEBACK;
>  	q->node = node_id;


-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] fat: Avoid oops when bdi->io_pages==0
  2020-08-31 16:37   ` OGAWA Hirofumi
@ 2020-08-31 16:39     ` Jens Axboe
  2020-08-31 16:56       ` Matthew Wilcox
  2020-08-31 17:16       ` OGAWA Hirofumi
  0 siblings, 2 replies; 15+ messages in thread
From: Jens Axboe @ 2020-08-31 16:39 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: Andrew Morton, linux-kernel, fsdevel

On 8/31/20 10:37 AM, OGAWA Hirofumi wrote:
> Jens Axboe <axboe@kernel.dk> writes:
> 
>> On Sat, Aug 29, 2020 at 7:08 PM OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> wrote:
>>>
>>> On one system, there was bdi->io_pages==0. This seems to be the bug of
>>> a driver somewhere, and should fix it though. Anyway, it is better to
>>> avoid the divide-by-zero Oops.
>>>
>>> So this check it.
>>>
>>> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
>>> Cc: <stable@vger.kernel.org>
>>> ---
>>>  fs/fat/fatent.c |    2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/fs/fat/fatent.c b/fs/fat/fatent.c
>>> index f7e3304..98a1c4f 100644
>>> --- a/fs/fat/fatent.c   2020-08-30 06:52:47.251564566 +0900
>>> +++ b/fs/fat/fatent.c   2020-08-30 06:54:05.838319213 +0900
>>> @@ -660,7 +660,7 @@ static void fat_ra_init(struct super_blo
>>>         if (fatent->entry >= ent_limit)
>>>                 return;
>>>
>>> -       if (ra_pages > sb->s_bdi->io_pages)
>>> +       if (sb->s_bdi->io_pages && ra_pages > sb->s_bdi->io_pages)
>>>                 ra_pages = rounddown(ra_pages, sb->s_bdi->io_pages);
>>>         reada_blocks = ra_pages << (PAGE_SHIFT - sb->s_blocksize_bits + 1);
>>
>> I don't think we should work-around this here. What device is this on?
>> Something like the below may help.
> 
> The reported bug is from nvme stack, and the below patch (I submitted
> same patch to you) fixed the reported case though. But I didn't verify
> all possible path, so I'd liked to use safer side.
> 
> If block layer can guarantee io_pages!=0 instead, and can apply to
> stable branch (5.8+). It would work too.

We really should ensure that ->io_pages is always set, imho, instead of
having to work-around it in other spots.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] fat: Avoid oops when bdi->io_pages==0
  2020-08-31 16:39     ` Jens Axboe
@ 2020-08-31 16:56       ` Matthew Wilcox
  2020-08-31 17:00         ` Jens Axboe
  2020-08-31 17:16       ` OGAWA Hirofumi
  1 sibling, 1 reply; 15+ messages in thread
From: Matthew Wilcox @ 2020-08-31 16:56 UTC (permalink / raw)
  To: Jens Axboe; +Cc: OGAWA Hirofumi, Andrew Morton, linux-kernel, fsdevel

On Mon, Aug 31, 2020 at 10:39:26AM -0600, Jens Axboe wrote:
> We really should ensure that ->io_pages is always set, imho, instead of
> having to work-around it in other spots.

Interestingly, there are only three places in the entire kernel which
_use_ bdi->io_pages.  FAT, Verity and the pagecache readahead code.

Verity:
                        unsigned long num_ra_pages =
                                min_t(unsigned long, num_blocks_to_hash - i,
                                      inode->i_sb->s_bdi->io_pages);

FAT:
        if (ra_pages > sb->s_bdi->io_pages)
                ra_pages = rounddown(ra_pages, sb->s_bdi->io_pages);

Pagecache:
        max_pages = max_t(unsigned long, bdi->io_pages, ra->ra_pages);
and
        if (req_size > max_pages && bdi->io_pages > max_pages)
                max_pages = min(req_size, bdi->io_pages);

The funny thing is that all three are using it differently.  Verity is
taking io_pages to be the maximum amount to readahead.  FAT is using
it as the unit of readahead (round down to the previous multiple) and
the pagecache uses it to limit reads that exceed the current per-file
readahead limit (but allows per-file readahead to exceed io_pages,
in which case it has no effect).

So how should it be used?  My inclination is to say that the pagecache
is right, by virtue of being the most-used.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] fat: Avoid oops when bdi->io_pages==0
  2020-08-31 16:56       ` Matthew Wilcox
@ 2020-08-31 17:00         ` Jens Axboe
  2020-08-31 17:39           ` OGAWA Hirofumi
  0 siblings, 1 reply; 15+ messages in thread
From: Jens Axboe @ 2020-08-31 17:00 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: OGAWA Hirofumi, Andrew Morton, linux-kernel, fsdevel

On 8/31/20 10:56 AM, Matthew Wilcox wrote:
> On Mon, Aug 31, 2020 at 10:39:26AM -0600, Jens Axboe wrote:
>> We really should ensure that ->io_pages is always set, imho, instead of
>> having to work-around it in other spots.
> 
> Interestingly, there are only three places in the entire kernel which
> _use_ bdi->io_pages.  FAT, Verity and the pagecache readahead code.
> 
> Verity:
>                         unsigned long num_ra_pages =
>                                 min_t(unsigned long, num_blocks_to_hash - i,
>                                       inode->i_sb->s_bdi->io_pages);
> 
> FAT:
>         if (ra_pages > sb->s_bdi->io_pages)
>                 ra_pages = rounddown(ra_pages, sb->s_bdi->io_pages);
> 
> Pagecache:
>         max_pages = max_t(unsigned long, bdi->io_pages, ra->ra_pages);
> and
>         if (req_size > max_pages && bdi->io_pages > max_pages)
>                 max_pages = min(req_size, bdi->io_pages);
> 
> The funny thing is that all three are using it differently.  Verity is
> taking io_pages to be the maximum amount to readahead.  FAT is using
> it as the unit of readahead (round down to the previous multiple) and
> the pagecache uses it to limit reads that exceed the current per-file
> readahead limit (but allows per-file readahead to exceed io_pages,
> in which case it has no effect).
> 
> So how should it be used?  My inclination is to say that the pagecache
> is right, by virtue of being the most-used.

When I added ->io_pages, it was for the page cache use case. The others
grew after that...

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] fat: Avoid oops when bdi->io_pages==0
  2020-08-31 16:39     ` Jens Axboe
  2020-08-31 16:56       ` Matthew Wilcox
@ 2020-08-31 17:16       ` OGAWA Hirofumi
  2020-08-31 17:19         ` Jens Axboe
  1 sibling, 1 reply; 15+ messages in thread
From: OGAWA Hirofumi @ 2020-08-31 17:16 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Andrew Morton, linux-kernel, fsdevel

Jens Axboe <axboe@kernel.dk> writes:

> On 8/31/20 10:37 AM, OGAWA Hirofumi wrote:
>> Jens Axboe <axboe@kernel.dk> writes:
>> 
>>> I don't think we should work-around this here. What device is this on?
>>> Something like the below may help.
>> 
>> The reported bug is from nvme stack, and the below patch (I submitted
>> same patch to you) fixed the reported case though. But I didn't verify
>> all possible path, so I'd liked to use safer side.
>> 
>> If block layer can guarantee io_pages!=0 instead, and can apply to
>> stable branch (5.8+). It would work too.
>
> We really should ensure that ->io_pages is always set, imho, instead of
> having to work-around it in other spots.

I think it is good too. However, the issue would be how to do it for
stable branch.

If you think that block layer patch is enough and submit to stable
(5.8+) branch instead, I'm fine without fatfs patch. (Or removing
workaround in fatfs with block layer patch later?)

Thanks.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] fat: Avoid oops when bdi->io_pages==0
  2020-08-31 17:16       ` OGAWA Hirofumi
@ 2020-08-31 17:19         ` Jens Axboe
  0 siblings, 0 replies; 15+ messages in thread
From: Jens Axboe @ 2020-08-31 17:19 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: Andrew Morton, linux-kernel, fsdevel

On 8/31/20 11:16 AM, OGAWA Hirofumi wrote:
> Jens Axboe <axboe@kernel.dk> writes:
> 
>> On 8/31/20 10:37 AM, OGAWA Hirofumi wrote:
>>> Jens Axboe <axboe@kernel.dk> writes:
>>>
>>>> I don't think we should work-around this here. What device is this on?
>>>> Something like the below may help.
>>>
>>> The reported bug is from nvme stack, and the below patch (I submitted
>>> same patch to you) fixed the reported case though. But I didn't verify
>>> all possible path, so I'd liked to use safer side.
>>>
>>> If block layer can guarantee io_pages!=0 instead, and can apply to
>>> stable branch (5.8+). It would work too.
>>
>> We really should ensure that ->io_pages is always set, imho, instead of
>> having to work-around it in other spots.
> 
> I think it is good too. However, the issue would be how to do it for
> stable branch.

Agree

> If you think that block layer patch is enough and submit to stable
> (5.8+) branch instead, I'm fine without fatfs patch. (Or removing
> workaround in fatfs with block layer patch later?)

Well, it should catch any block based case as we then set it when
allocating the queue. So should do the trick for all block based
cases at least.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] fat: Avoid oops when bdi->io_pages==0
  2020-08-31 17:00         ` Jens Axboe
@ 2020-08-31 17:39           ` OGAWA Hirofumi
  0 siblings, 0 replies; 15+ messages in thread
From: OGAWA Hirofumi @ 2020-08-31 17:39 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Matthew Wilcox, Andrew Morton, linux-kernel, fsdevel

Jens Axboe <axboe@kernel.dk> writes:

> On 8/31/20 10:56 AM, Matthew Wilcox wrote:
>> On Mon, Aug 31, 2020 at 10:39:26AM -0600, Jens Axboe wrote:
>>> We really should ensure that ->io_pages is always set, imho, instead of
>>> having to work-around it in other spots.
>> 
>> Interestingly, there are only three places in the entire kernel which
>> _use_ bdi->io_pages.  FAT, Verity and the pagecache readahead code.
>> 
>> Verity:
>>                         unsigned long num_ra_pages =
>>                                 min_t(unsigned long, num_blocks_to_hash - i,
>>                                       inode->i_sb->s_bdi->io_pages);
>> 
>> FAT:
>>         if (ra_pages > sb->s_bdi->io_pages)
>>                 ra_pages = rounddown(ra_pages, sb->s_bdi->io_pages);
>> 
>> Pagecache:
>>         max_pages = max_t(unsigned long, bdi->io_pages, ra->ra_pages);
>> and
>>         if (req_size > max_pages && bdi->io_pages > max_pages)
>>                 max_pages = min(req_size, bdi->io_pages);
>> 
>> The funny thing is that all three are using it differently.  Verity is
>> taking io_pages to be the maximum amount to readahead.  FAT is using
>> it as the unit of readahead (round down to the previous multiple) and
>> the pagecache uses it to limit reads that exceed the current per-file
>> readahead limit (but allows per-file readahead to exceed io_pages,
>> in which case it has no effect).
>> 
>> So how should it be used?  My inclination is to say that the pagecache
>> is right, by virtue of being the most-used.
>
> When I added ->io_pages, it was for the page cache use case. The others
> grew after that...

FAT and pagecache usage would be similar or same purpose. The both is
using io_pages as optimal IO size.

In pagecache case, it uses io_pages if one request size is exceeding
io_pages. In FAT case, there is perfect knowledge about future/total
request size. So FAT divides request by io_pages, and adjust ra_pages
with knowledge.

I don't know about verity.

Thanks.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2020-08-31 17:39 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-30  0:59 [PATCH] fat: Avoid oops when bdi->io_pages==0 OGAWA Hirofumi
2020-08-30  1:21 ` Matthew Wilcox
2020-08-30  1:54   ` OGAWA Hirofumi
2020-08-30  3:53     ` Matthew Wilcox
2020-08-30  9:04       ` OGAWA Hirofumi
2020-08-30 14:01 ` Sasha Levin
2020-08-30 14:16   ` OGAWA Hirofumi
2020-08-31 15:22 ` Jens Axboe
2020-08-31 16:37   ` OGAWA Hirofumi
2020-08-31 16:39     ` Jens Axboe
2020-08-31 16:56       ` Matthew Wilcox
2020-08-31 17:00         ` Jens Axboe
2020-08-31 17:39           ` OGAWA Hirofumi
2020-08-31 17:16       ` OGAWA Hirofumi
2020-08-31 17:19         ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).