From: Ming Lei <ming.lei@canonical.com>
To: linux-kernel@vger.kernel.org, Dave Kleikamp <dave.kleikamp@oracle.com>
Cc: Jens Axboe <axboe@kernel.dk>, Zach Brown <zab@zabbo.net>,
	Christoph Hellwig <hch@infradead.org>,
	Maxim Patlasov <mpatlasov@parallels.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Benjamin LaHaise <bcrl@kvack.org>,
	Ming Lei <ming.lei@canonical.com>
Subject: [PATCH v2 4/4] block: loop: support submitting I/O via kernel aio
Date: Tue, 13 Jan 2015 23:44:48 +0800
Message-ID: <1421163888-21452-5-git-send-email-ming.lei@canonical.com>
In-Reply-To: <1421163888-21452-1-git-send-email-ming.lei@canonical.com>

Part of this patch is based on Dave's previous post. The patch submits I/O
to the file system via kernel aio, which brings the following benefits:

- the double cache in both the loop file system and the backing file is
  avoided
- context switches are reduced considerably, which in turn lowers CPU
  utilization
- cached memory use drops significantly

The main side effect is that throughput decreases when the raw loop block
device is accessed directly (not through a file system) with kernel aio.

This patch has passed the xfstests suite (./check -g auto), with both the
test and scratch devices on loop block devices and ext4 as the file system.

The results of two fio tests follow.

1. fio test inside an ext4 file system over the loop block device

1) How to run
- kernel base: 3.19.0-rc3-next-20150108 (loop-mq merged)
- loop over SSD image 1 in ext4
- fio: psync engine, 16 jobs, size 200M, ext4 over the loop block device
  (a sketch of the invocation appears after this section)
- test result: IOPS from fio output

2) Throughput result:
-------------------------------------------------------------
test cases      |randread  |read      |randwrite |write     |
-------------------------------------------------------------
base            |16799     |59508     |31059     |58829     |
-------------------------------------------------------------
base+kernel aio |15480     |64453     |30187     |57222     |
-------------------------------------------------------------

3) CPU
- context switches drop to 1/3 ~ 1/2 with kernel aio, depending on load;
  see 'Contexts' of [1] and [2]
- CPU utilization drops to 1/2 ~ 2/3 with kernel aio, depending on load;
  see 'CPUs' of [1] and [2]
- fewer processes are created with kernel aio; see 'Processes' of [1]
  and [2]

4) Memory (free, cached)
- after these four tests with kernel aio: ~10% of memory is used
- after these four tests without kernel aio: ~60% of memory is used
- see 'Memory Usage' of [1] and [2]
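The invocation below is a sketch reconstructed from the parameters above,
not the author's actual job file; the image path, mount point, and job name
are illustrative assumptions:

	# Assumed setup: SSD image 1 attached to /dev/loop0, mounted as ext4.
	losetup /dev/loop0 /ssd/image1
	mount /dev/loop0 /mnt/loop

	# psync engine, 16 jobs, 200M per job; --rw was varied across the
	# four cases (randread, read, randwrite, write) in the table above.
	fio --name=loop-fs-test --directory=/mnt/loop --ioengine=psync \
	    --numjobs=16 --size=200M --rw=randread --group_reporting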
2. fio test over the loop block device directly

1) How to run
- kernel base: 3.19.0-rc3-next-20150108 (loop-mq merged)
- loop over SSD image 2 in ext4
- fio: libaio engine, O_DIRECT, 4K block size, io depth 64, one job over
  the loop block device (a sketch of the invocation appears after this
  section)
- test result: IOPS from fio output

2) Throughput result:
-------------------------------------------------------------
test cases      |randread  |read      |randwrite |write     |
-------------------------------------------------------------
base            |24568     |55141     |34231     |43694     |
-------------------------------------------------------------
base+kernel aio |25130     |22813     |24441     |40880     |
-------------------------------------------------------------

3) CPU
- CPU utilization drops to 1/2 ~ 2/3 with kernel aio in the randread,
  read, and randwrite tests, but rises slightly in the write test;
  see 'CPUs' of [3] and [4]
- context switches show a similar pattern; see 'Contexts' of [3] and [4]
- fewer processes are created in the randread test, and slightly more in
  the write test, with kernel aio

4) Memory
- after these four tests with kernel aio: ~15% of memory is used
- after these four tests without kernel aio: ~90% of memory is used
- see 'Memory Usage' of [3] and [4]
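As with the first test, this is a sketch inferred from the parameters above
(libaio, O_DIRECT, 4K block size, io depth 64, one job); the loop device
path and job name are assumptions:

	# Drive the loop block device directly, bypassing any file system
	# on it; --rw was varied across the four reported cases.
	fio --name=loop-blk-test --filename=/dev/loop0 --ioengine=libaio \
	    --direct=1 --bs=4k --iodepth=64 --numjobs=1 --rw=randread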
3. sar monitor results in graphical form

[1] kernel base: sar monitor result for fio test 1
    http://kernel.ubuntu.com/~ming/block/loop-mq-aio/v2/vm-loop-mq-fio-ext4.pdf
[2] kernel base plus the kernel aio patches: sar monitor result for fio test 1
    http://kernel.ubuntu.com/~ming/block/loop-mq-aio/v2/vm-loop-mq-aio-fio-ext4.pdf
[3] kernel base: sar monitor result for fio test 2
    http://kernel.ubuntu.com/~ming/block/loop-mq-aio/v2/vm-loop-mq-fio-disk.pdf
[4] kernel base plus the kernel aio patches: sar monitor result for fio test 2
    http://kernel.ubuntu.com/~ming/block/loop-mq-aio/v2/vm-loop-mq-aio-fio-disk.pdf

Cc: Maxim Patlasov <mpatlasov@parallels.com>
Cc: Zach Brown <zab@zabbo.net>
Cc: Dave Kleikamp <dave.kleikamp@oracle.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 drivers/block/loop.c | 140 ++++++++++++++++++++++++++++++++++++++++++++++++--
 drivers/block/loop.h |  10 ++++
 2 files changed, 146 insertions(+), 4 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 47af456..bce06e7 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -450,10 +450,84 @@ static int lo_req_flush(struct loop_device *lo, struct request *rq)
 	return ret;
 }
 
+#ifdef CONFIG_AIO
+static void lo_rw_aio_complete(u64 data, long res)
+{
+	struct loop_cmd *cmd = (struct loop_cmd *)(uintptr_t)data;
+	struct request *rq = cmd->rq;
+
+	if (res > 0)
+		res = 0;
+	else if (res < 0)
+		res = -EIO;
+
+	kfree(cmd->alloc_bv);
+	rq->errors = res;
+	blk_mq_complete_request(rq);
+}
+
+static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
+		bool write, loff_t pos)
+{
+	unsigned int i = 0;
+	struct iov_iter iter;
+	struct bio_vec *bvec, bv;
+	size_t nr_segs = 0;
+	struct req_iterator r_iter;
+
+	rq_for_each_segment(bv, cmd->rq, r_iter)
+		nr_segs++;
+
+	if (nr_segs > LOOP_CMD_BVEC_CNT) {
+		cmd->alloc_bv = kmalloc(nr_segs * sizeof(*cmd->alloc_bv),
+					GFP_NOIO);
+		if (!cmd->alloc_bv)
+			return -ENOMEM;
+		bvec = cmd->alloc_bv;
+	} else {
+		bvec = cmd->bv;
+		cmd->alloc_bv = NULL;
+	}
+
+	rq_for_each_segment(bv, cmd->rq, r_iter)
+		bvec[i++] = bv;
+
+	iter.type = ITER_BVEC | (write ? WRITE : 0);
+	iter.bvec = bvec;
+	iter.nr_segs = nr_segs;
+	iter.count = blk_rq_bytes(cmd->rq);
+	iter.iov_offset = 0;
+
+	aio_kernel_init_rw(&cmd->iocb, lo->lo_backing_file,
+			iov_iter_count(&iter), pos,
+			lo_rw_aio_complete, (u64)(uintptr_t)cmd);
+
+	return aio_kernel_submit(&cmd->iocb, write, &iter);
+}
+#else
+static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
+		bool write, loff_t pos)
+{
+	return -EIO;
+}
+#endif /* CONFIG_AIO */
+
+static int lo_io_rw(struct loop_device *lo, struct loop_cmd *cmd,
+		bool write, loff_t pos)
+{
+	if (cmd->use_aio)
+		return lo_rw_aio(lo, cmd, write, pos);
+	if (write)
+		return lo_send(lo, cmd->rq, pos);
+	else
+		return lo_receive(lo, cmd->rq, lo->lo_blocksize, pos);
+}
+
 static int do_req_filebacked(struct loop_device *lo, struct request *rq)
 {
 	loff_t pos;
 	int ret;
+	struct loop_cmd *cmd = blk_mq_rq_to_pdu(rq);
 
 	pos = ((loff_t) blk_rq_pos(rq) << 9) + lo->lo_offset;
 
@@ -463,9 +537,9 @@ static int do_req_filebacked(struct loop_device *lo, struct request *rq)
 		else if (rq->cmd_flags & REQ_DISCARD)
 			ret = lo_discard(lo, rq, pos);
 		else
-			ret = lo_send(lo, rq, pos);
+			ret = lo_io_rw(lo, cmd, true, pos);
 	} else
-		ret = lo_receive(lo, rq, lo->lo_blocksize, pos);
+		ret = lo_io_rw(lo, cmd, false, pos);
 
 	return ret;
 }
@@ -684,6 +758,15 @@ ssize_t loop_attr_do_store_use_aio(struct device *dev,
 		lo->use_aio = true;
 	else
 		lo->use_aio = false;
+
+	if (lo->use_aio != lo->can_use_aio) {
+		if (lo->use_aio)
+			return -EPERM;
+
+		lo->lo_backing_file->f_flags &= ~O_DIRECT;
+		lo->can_use_aio = false;
+	}
+
 	return count;
 }
@@ -803,6 +886,14 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
 	    !file->f_op->write)
 		lo_flags |= LO_FLAGS_READ_ONLY;
 
+#ifdef CONFIG_AIO
+	if (file->f_op->write_iter && file->f_op->read_iter &&
+	    mapping->a_ops->direct_IO) {
+		file->f_flags |= O_DIRECT;
+		lo->can_use_aio = true;
+	}
+#endif
+
 	lo_blocksize = S_ISBLK(inode->i_mode) ?
 		inode->i_bdev->bd_block_size : PAGE_SIZE;
 
@@ -836,6 +927,14 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
 
 	set_blocksize(bdev, lo_blocksize);
 
+	/*
+	 * We must not send too-small direct-io requests, so reflect
+	 * the minimum I/O size in the loop device's logical block size.
+	 */
+	if (lo->can_use_aio && inode->i_sb->s_bdev)
+		blk_queue_logical_block_size(lo->lo_queue,
+				bdev_io_min(inode->i_sb->s_bdev));
+
 	lo->lo_state = Lo_bound;
 	if (part_shift)
 		lo->lo_flags |= LO_FLAGS_PARTSCAN;
@@ -1506,14 +1605,33 @@ int loop_unregister_transfer(int number)
 EXPORT_SYMBOL(loop_register_transfer);
 EXPORT_SYMBOL(loop_unregister_transfer);
 
+/* return true if the request should be scheduled on the single queue */
+static bool loop_prep_sched_rq(struct loop_cmd *cmd)
+{
+	struct loop_device *lo = cmd->rq->q->queuedata;
+	bool single_queue = false;
+
+	cmd->use_aio = false;
+	if (lo->can_use_aio && (lo->transfer == transfer_none)) {
+		if (!(cmd->rq->cmd_flags & (REQ_FLUSH | REQ_DISCARD)))
+			cmd->use_aio = true;
+	}
+
+	if ((cmd->rq->cmd_flags & REQ_WRITE) || cmd->use_aio)
+		single_queue = true;
+
+	return single_queue;
+}
+
 static int loop_queue_rq(struct blk_mq_hw_ctx *hctx,
 		const struct blk_mq_queue_data *bd)
 {
 	struct loop_cmd *cmd = blk_mq_rq_to_pdu(bd->rq);
+	bool single_queue = loop_prep_sched_rq(cmd);
 
 	blk_mq_start_request(bd->rq);
 
-	if (cmd->rq->cmd_flags & REQ_WRITE) {
+	if (single_queue) {
 		struct loop_device *lo = cmd->rq->q->queuedata;
 		bool need_sched = true;
@@ -1551,7 +1669,8 @@ static void loop_handle_cmd(struct loop_cmd *cmd)
 failed:
 	if (ret)
 		cmd->rq->errors = -EIO;
-	blk_mq_complete_request(cmd->rq);
+	if (!cmd->use_aio || ret)
+		blk_mq_complete_request(cmd->rq);
 }
 
 static void loop_queue_write_work(struct work_struct *work)
@@ -1653,6 +1772,19 @@ static int loop_add(struct loop_device **l, int i)
 	INIT_LIST_HEAD(&lo->write_cmd_head);
 	INIT_WORK(&lo->write_work, loop_queue_write_work);
 
+	blk_queue_max_segments(lo->lo_queue, LOOP_CMD_SEG_CNT);
+	blk_queue_max_hw_sectors(lo->lo_queue, -1U);
+	blk_queue_max_segment_size(lo->lo_queue, -1U);
+
+	/*
+	 * Kernel aio avoids the double cache and decreases CPU load,
+	 * and it barely affects throughput when I/O originates from a
+	 * file system. Consider disabling kernel aio via sysfs to get
+	 * better throughput when the loop block device is accessed
+	 * directly.
+	 */
+	lo->use_aio = true;
+
 	disk = lo->lo_disk = alloc_disk(1 << part_shift);
 	if (!disk)
 		goto out_free_queue;
diff --git a/drivers/block/loop.h b/drivers/block/loop.h
index 15049e9..c917633 100644
--- a/drivers/block/loop.h
+++ b/drivers/block/loop.h
@@ -16,6 +16,8 @@
 #include <linux/mutex.h>
 #include <linux/workqueue.h>
 #include <uapi/linux/loop.h>
+#include <linux/aio.h>
+#include <linux/scatterlist.h>
 
 /* Possible states of device */
 enum {
@@ -24,6 +26,9 @@ enum {
 	Lo_rundown,
 };
 
+#define LOOP_CMD_SEG_CNT	32
+#define LOOP_CMD_BVEC_CNT	(LOOP_CMD_SEG_CNT * 4)
+
 struct loop_func_table;
 
 struct loop_device {
@@ -58,6 +63,7 @@ struct loop_device {
 	struct work_struct	write_work;
 	bool			write_started;
 	bool			use_aio;
+	bool			can_use_aio;
 	int			lo_state;
 	struct mutex		lo_ctl_mutex;
@@ -70,6 +76,10 @@ struct loop_cmd {
 	struct work_struct read_work;
 	struct request *rq;
 	struct list_head list;
+	bool use_aio;
+	struct kiocb iocb;
+	struct bio_vec bv[LOOP_CMD_BVEC_CNT];
+	struct bio_vec *alloc_bv;
 };
 
 /* Support for loadable transfer modules */
-- 
1.7.9.5
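For reference, the runtime switch for this behavior is the 'use_aio' sysfs
attribute introduced in patch 3/4 of this series. A minimal sketch of
disabling kernel aio before accessing the loop block device directly
follows; the exact attribute path is an assumption, modeled on the existing
per-device loop attributes:

	# Assumed path; patch 3/4 defines the actual attribute location.
	echo 0 > /sys/block/loop0/loop/use_aio

Note that, per loop_attr_do_store_use_aio() above, re-enabling aio after it
has been disabled returns -EPERM, since can_use_aio is cleared.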