Linux-Fsdevel Archive on lore.kernel.org
From: Ritesh Harjani <riteshh@linux.ibm.com>
To: Dave Chinner <david@fromorbit.com>,
	Anju T Sudhakar <anju@linux.vnet.ibm.com>
Cc: hch@infradead.org, darrick.wong@oracle.com,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, willy@infradead.org
Subject: Re: [PATCH] iomap: Fix the write_count in iomap_add_to_ioend().
Date: Fri, 21 Aug 2020 10:15:33 +0530
Message-ID: <20200821044533.BBFD1A405F@d06av23.portsmouth.uk.ibm.com>
In-Reply-To: <20200820231140.GE7941@dread.disaster.area>

Hello Dave,

Thanks for reviewing this.

On 8/21/20 4:41 AM, Dave Chinner wrote:
> On Wed, Aug 19, 2020 at 03:58:41PM +0530, Anju T Sudhakar wrote:
>> From: Ritesh Harjani <riteshh@linux.ibm.com>
>>
>> __bio_try_merge_page() may return same_page = 1 and merged = 0.
>> This could happen when bio->bi_iter.bi_size + len > UINT_MAX.
> 
> Ummm, silly question, but exactly how are we getting a bio that
> large in ->writepages getting built? Even with 64kB pages, that's a
> bio with 2^16 pages attached to it. We shouldn't be building single
> bios in writeback that large - what storage hardware is allowing
> such huge bios to be built? (i.e. can you dump all the values in
> /sys/block/<dev>/queue/* for that device for us?)

Please correct me if I am wrong, but as far as I can see, bio_full()
checks only these two limits when adding a page to a bio. It doesn't
check the limits in /sys/block/<dev>/queue/*, does it? I guess those
could then be checked by the block layer below, before submitting the bio?

static inline bool bio_full(struct bio *bio, unsigned len)
{
        if (bio->bi_vcnt >= bio->bi_max_vecs)
                return true;

        if (bio->bi_iter.bi_size > UINT_MAX - len)
                return true;

        return false;
}
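
To make the two limits concrete, here is a minimal userspace sketch of
the same check. The struct bio_model and its field names are hypothetical
stand-ins for the three struct bio fields that the check reads; this is
not kernel code, and it assumes a 32-bit unsigned int. Note that nothing
in it consults any queue limit.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the struct bio fields read by bio_full(). */
struct bio_model {
	unsigned short vcnt;      /* models bio->bi_vcnt */
	unsigned short max_vecs;  /* models bio->bi_max_vecs */
	unsigned int size;        /* models bio->bi_iter.bi_size, in bytes */
};

/* Same two checks as the bio_full() quoted above; assumes 32-bit unsigned int. */
static bool bio_model_full(const struct bio_model *bio, unsigned int len)
{
	/* Limit 1: no free bio_vec slots left. */
	if (bio->vcnt >= bio->max_vecs)
		return true;

	/*
	 * Limit 2: size + len would exceed UINT_MAX. Written as a
	 * subtraction so that the sum itself is never computed and
	 * therefore cannot wrap.
	 */
	if (bio->size > UINT_MAX - len)
		return true;

	return false;
}
```

With size = 0xffffe000 and len = 0x1000 the model still reports room;
with size = 0xfffff000 it reports the bio as full.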


This issue was first observed during a fio run on a system with a large
amount of memory. Here is an easy way we found to trigger it almost
every time, using a loop device on my VM setup. All the details are
provided below.

<cmds to trigger it fairly quickly>
===================================
echo 99999999 > /proc/sys/vm/dirtytime_expire_seconds
echo 99999999 > /proc/sys/vm/dirty_expire_centisecs
echo 90  > /proc/sys/vm/dirty_ratio
echo 90  > /proc/sys/vm/dirty_background_ratio
echo 0  > /proc/sys/vm/dirty_writeback_centisecs

sudo perf probe -s ~/host_shared/src/linux/ -a \
  '__bio_try_merge_page:10 bio page page->index bio->bi_iter.bi_size len same_page[0]'

sudo perf record -e probe:__bio_try_merge_page_L10 -a \
  --filter 'bi_size > 0xff000000' \
  sudo fio --rw=write --bs=1M --numjobs=1 --name=/mnt/testfile \
  --size=24G --ioengine=libaio


# On running this a second time, it gets hit every time on my setup

sudo perf record -e probe:__bio_try_merge_page_L10 -a \
  --filter 'bi_size > 0xff000000' \
  sudo fio --rw=write --bs=1M --numjobs=1 --name=/mnt/testfile \
  --size=24G --ioengine=libaio


Perf o/p from above filter causing overflow
===========================================
<...>
              fio 25194 [029] 70471.559084: 
probe:__bio_try_merge_page_L10: (c000000000aa054c) 
bio=0xc0000013d49a4b80 page=0xc00c000004029d80 index=0x10a9d 
bi_size=0xffff8000 len=0x1000 same_page=0x1
              fio 25194 [029] 70471.559087: 
probe:__bio_try_merge_page_L10: (c000000000aa054c) 
bio=0xc0000013d49a4b80 page=0xc00c000004029d80 index=0x10a9d 
bi_size=0xffff9000 len=0x1000 same_page=0x1
              fio 25194 [029] 70471.559090: 
probe:__bio_try_merge_page_L10: (c000000000aa054c) 
bio=0xc0000013d49a4b80 page=0xc00c000004029d80 index=0x10a9d 
bi_size=0xffffa000 len=0x1000 same_page=0x1
              fio 25194 [029] 70471.559093: 
probe:__bio_try_merge_page_L10: (c000000000aa054c) 
bio=0xc0000013d49a4b80 page=0xc00c000004029d80 index=0x10a9d 
bi_size=0xffffb000 len=0x1000 same_page=0x1
              fio 25194 [029] 70471.559095: 
probe:__bio_try_merge_page_L10: (c000000000aa054c) 
bio=0xc0000013d49a4b80 page=0xc00c000004029d80 index=0x10a9d 
bi_size=0xffffc000 len=0x1000 same_page=0x1
              fio 25194 [029] 70471.559098: 
probe:__bio_try_merge_page_L10: (c000000000aa054c) 
bio=0xc0000013d49a4b80 page=0xc00c000004029d80 index=0x10a9d 
bi_size=0xffffd000 len=0x1000 same_page=0x1
              fio 25194 [029] 70471.559101: 
probe:__bio_try_merge_page_L10: (c000000000aa054c) 
bio=0xc0000013d49a4b80 page=0xc00c000004029d80 index=0x10a9d 
bi_size=0xffffe000 len=0x1000 same_page=0x1
              fio 25194 [029] 70471.559104: 
probe:__bio_try_merge_page_L10: (c000000000aa054c) 
bio=0xc0000013d49a4b80 page=0xc00c000004029d80 index=0x10a9d 
bi_size=0xfffff000 len=0x1000 same_page=0x1

^^^^^^ (this could cause an overflow)
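
To spell out the arithmetic for the last sample: bi_size=0xfffff000 plus
len=0x1000 is 0x100000000, one past the largest 32-bit value, so a 32-bit
size field would wrap to 0. A small sketch of that, where the helper
names add_wrapped() and add_exact() are made up for illustration:

```c
#include <stdint.h>

/* Sum as a 32-bit size field would store it: wraps modulo 2^32. */
static uint32_t add_wrapped(uint32_t bi_size, uint32_t len)
{
	return bi_size + len;
}

/* Same sum computed in 64 bits, i.e. the byte count that was intended. */
static uint64_t add_exact(uint32_t bi_size, uint32_t len)
{
	return (uint64_t)bi_size + len;
}
```

For the sample above, add_wrapped(0xfffff000, 0x1000) gives 0 while
add_exact() gives 0x100000000.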

loop dev
=========
NAME       SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE    DIO LOG-SEC
/dev/loop1         0      0         0  0 /mnt1/filefs   0     512


mount o/p
=========
/dev/loop1 on /mnt type xfs 
(rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)


/sys/block/<dev>/queue/*
========================

setup:/run/perf$ cat /sys/block/loop1/queue/max_segments
128
setup:/run/perf$ cat /sys/block/loop1/queue/max_segment_size
65536
setup:/run/perf$ cat /sys/block/loop1/queue/max_hw_sectors_kb
1280
setup:/run/perf$ cat /sys/block/loop1/queue/logical_block_size
512
setup:/run/perf$ cat /sys/block/loop1/queue/max_sectors_kb
1280
setup:/run/perf$ cat /sys/block/loop1/queue/hw_sector_size
512
setup:/run/perf$ cat /sys/block/loop1/queue/discard_max_bytes
4294966784
setup:/run/perf$ cat /sys/block/loop1/queue/discard_max_hw_bytes
4294966784
setup:/run/perf$ cat /sys/block/loop1/queue/discard_zeroes_data
0
setup:/run/perf$ cat /sys/block/loop1/queue/discard_granularity
4096
setup:/run/perf$ cat /sys/block/loop1/queue/chunk_sectors
0
setup:/run/perf$ cat /sys/block/loop1/queue/max_discard_segments
1
setup:/run/perf$ cat /sys/block/loop1/queue/read_ahead_kb
128
setup:/run/perf$ cat /sys/block/loop1/queue/rotational
1
setup:/run/perf$ cat /sys/block/loop1/queue/physical_block_size
512
setup:/run/perf$ cat /sys/block/loop1/queue/write_same_max_bytes
0
setup:/run/perf$ cat /sys/block/loop1/queue/write_zeroes_max_bytes
4294966784
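
For a sense of scale between these queue limits and the bio sizes seen
in the trace, here is a rough sketch. It assumes that an oversized bio
would be handled in pieces of at most max_sectors_kb each; that is an
assumption of this sketch, not something confirmed in this thread, and
pieces_needed() is a hypothetical helper.

```c
#include <stdint.h>

/*
 * Number of pieces of at most max_sectors_kb KiB needed to cover
 * bio_bytes, rounded up. max_sectors_kb must be nonzero.
 */
static uint64_t pieces_needed(uint64_t bio_bytes, uint64_t max_sectors_kb)
{
	uint64_t piece_bytes = max_sectors_kb * 1024;

	return (bio_bytes + piece_bytes - 1) / piece_bytes;
}
```

With bi_size=0xfffff000 from the trace and max_sectors_kb=1280 from
above, pieces_needed() returns 3277.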
