LKML Archive on lore.kernel.org
From: Nick Piggin <npiggin@suse.de>
To: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: Christoph Lameter <clameter@sgi.com>,
	linux-kernel@vger.kernel.org, arjan@linux.intel.com,
	mingo@elte.hu, ak@suse.de, jens.axboe@oracle.com,
	James.Bottomley@SteelEye.com, andrea@suse.de,
	akpm@linux-foundation.org, andrew.vasquez@qlogic.com
Subject: Re: [rfc] direct IO submission and completion scalability issues
Date: Wed, 1 Aug 2007 02:41:18 +0200
Message-ID: <20070801004117.GE31006@wotan.suse.de>
In-Reply-To: <20070731171403.GL3318@linux-os.sc.intel.com>

On Tue, Jul 31, 2007 at 10:14:03AM -0700, Suresh B wrote:
> On Tue, Jul 31, 2007 at 06:19:17AM +0200, Nick Piggin wrote:
> > On Mon, Jul 30, 2007 at 01:35:19PM -0700, Suresh B wrote:
> > > So any suggestions for making this clean and acceptable to everyone?
> > 
> > It is obviously a good idea to hand over the IO at the point which
> > requires the fewest cachelines to be moved, and I think doing
> > it in the block layer is right. Mostly you have to convince the block
> > and driver maintainers I guess.
> 
> Yes. Implementation is the challenging part I guess.
> 
> > The scheduler really should be made interrupt-load aware anyway, so I
> > don't have a problem with changing that; or scheduling kblockd at a
> > higher priority, but I don't know if SCHED_FIFO is a good idea. Couldn't
> > it be done in a softirq instead?
> 
> Yes, softirq context is one way. But I just didn't want to penalize the running
> task by taking away some of its CPU time. With CFS micro accounting, perhaps
> we can track irq and softirq time and avoid penalizing the running task's CPU
> time.

But you "penalize" the running task in the completion handler as well
anyway. Doing this with a SCHED_FIFO task is sort of like doing interrupt
threading which AFAIK has not been accepted (yet).
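A minimal user-space sketch of the softirq variant: the CPU that takes the interrupt finishes a request itself only if it also submitted it, and otherwise hands it to a per-CPU list that a softirq-style handler on the submitting CPU drains. Everything here (NR_CPUS, the struct request fields, complete_request(), run_completion_softirq()) is hypothetical and single-threaded; it models only where the completion work runs, not the locking, IPIs or real softirq machinery a kernel version would need.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model; these names are illustrative, not the block layer API. */
#define NR_CPUS 4

struct request {
	int submit_cpu;          /* CPU that issued the IO */
	int completed_on;        /* CPU that ran the completion, -1 if pending */
	struct request *next;
};

/* Per-CPU list of completions handed over from other CPUs. */
static struct request *pending[NR_CPUS];

static void finish_request(struct request *rq, int cpu)
{
	rq->completed_on = cpu;
}

/* Called on the CPU that took the device interrupt. */
static void complete_request(struct request *rq, int irq_cpu)
{
	assert(rq->submit_cpu >= 0 && rq->submit_cpu < NR_CPUS);
	if (rq->submit_cpu == irq_cpu) {
		finish_request(rq, irq_cpu);    /* submitter's data is already local */
		return;
	}
	/* Hand off to the submitter. A real version would need a lockless
	 * list or per-CPU lock plus an IPI to raise the softirq there. */
	rq->next = pending[rq->submit_cpu];
	pending[rq->submit_cpu] = rq;
}

/* Softirq-style handler run on @cpu: finish everything handed to it. */
static int run_completion_softirq(int cpu)
{
	int n = 0;
	struct request *rq = pending[cpu];

	pending[cpu] = NULL;
	while (rq) {
		struct request *next = rq->next;

		finish_request(rq, cpu);
		rq = next;
		n++;
	}
	return n;
}
```

Running the handoff in a softirq keeps the work out of a real-time thread, but as noted above it still takes time from whatever task happens to be running on the submitting CPU.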


> > Latency for IO migration could be the most difficult problem to solve
> > really. You don't give much detail about the workload, profiles, etc... I
> > hope this is for a real world test?
> 
> Improvement numbers quoted are from the OLTP database workload. We can look
> into other workloads.
> 
> > Can the locking be improved in simpler ways first?
> > 
> > Just some random questions...
> > 
> > It looks like the main source of cacheline bouncing you're eliminating
> > is from the initial starting of IO from an empty queue (i.e. unplug).
> > From then on, the submission is driven by completion, right?
> > 
> > Why is the queue allowed to go empty in the first place in an IO critical
> > workload?
> 
> This workload is using direct IO and there is no batching at the block layer
> for direct IO. IO is submitted to the HW as it arrives.

So you aren't putting concurrent requests into the queue? Sounds like
userspace should be improved.
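The empty-queue point can be illustrated with a toy model, assuming a device that services one request at a time. A "kick" here counts a dispatch started from the submission path because the device was idle; once several requests are queued concurrently, later dispatches come from the completion path instead. The structure and function names are hypothetical.

```c
#include <assert.h>

/* Toy model: one request serviced at a time. */
struct toy_queue {
	int queued;      /* requests waiting in the queue */
	int busy;        /* 1 while the device is servicing a request */
	int kicks;       /* dispatches started by submitters (empty-queue case) */
};

static void submit(struct toy_queue *q)
{
	if (!q->busy) {
		q->busy = 1;     /* idle device: the submitter starts the IO */
		q->kicks++;
	} else {
		q->queued++;     /* busy: leave it for the completion path */
	}
}

static void complete_one(struct toy_queue *q)
{
	assert(q->busy);
	if (q->queued > 0)
		q->queued--;     /* completion path dispatches the next request */
	else
		q->busy = 0;     /* queue drained: device goes idle */
}

/* One request in flight at a time: submit, wait, submit, ... */
static int kicks_serial(int n)
{
	struct toy_queue q = {0, 0, 0};

	for (int i = 0; i < n; i++) {
		submit(&q);
		complete_one(&q);
	}
	return q.kicks;
}

/* All n requests submitted before the first one completes. */
static int kicks_concurrent(int n)
{
	struct toy_queue q = {0, 0, 0};

	for (int i = 0; i < n; i++)
		submit(&q);
	for (int i = 0; i < n; i++)
		complete_one(&q);
	return q.kicks;
}
```

With 8 requests, kicks_serial(8) returns 8 and kicks_concurrent(8) returns 1, i.e. with concurrent submission only the first IO starts from an empty queue.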


> > Are you loading up each CPU with as many disks as it can possibly handle
> > plus a few more? If so, is that realistic? (I honestly don't know).
> 
> There is 3-4% iowait time in the system. So the CPUs are not 100% busy,
> but there is quite a bit of direct IO going on.
> 
> > You say that you'd like to do this for direct IO only, but if it is more
> > efficient, why not for buffered IO as well? (or is it not more efficient
> > for buffered IO? if not, why?)
> 
> It is applicable for both direct IO and buffered IO. But the implementations
> will differ. For example, in buffered IO we can set it up so that the
> block plug timeout function runs on the IO completion CPU.

It would be nice to be doing that anyway. But unplug via request submission
rather than timeout is fairly common in buffered loads too.
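A rough model of the two ways a plugged queue gets released, by the submitter once enough requests have accumulated or by a timer, and of which CPU does the unplug in each case. The threshold value and all names are hypothetical; the sketch only assumes, as Suresh suggests, that the timer can be arranged to fire on the CPU that handles the device's completions.

```c
#include <assert.h>

/* Hypothetical model of a plugged queue. */
struct plugged_queue {
	int nr_queued;
	int unplug_thresh;   /* illustrative threshold, not a kernel default */
	int unplugged_by;    /* CPU that performed the unplug, -1 if none yet */
};

static void unplug(struct plugged_queue *q, int cpu)
{
	assert(q->nr_queued > 0);
	q->nr_queued = 0;    /* everything is handed to the driver */
	q->unplugged_by = cpu;
}

/* Submission path, running on @cpu. */
static void add_request(struct plugged_queue *q, int cpu)
{
	q->nr_queued++;
	if (q->nr_queued >= q->unplug_thresh)
		unplug(q, cpu);          /* unplug via request submission */
}

/* Timer path; @timer_cpu is wherever the timer fires, e.g. the CPU
 * that services this device's completions. */
static void unplug_timeout(struct plugged_queue *q, int timer_cpu)
{
	if (q->nr_queued > 0)
		unplug(q, timer_cpu);
}
```

Only the timer path moves to the completion CPU; when submitters reach the threshold themselves, the unplug still happens on the submitting CPU, which is why submission-driven unplugs matter for buffered loads as well.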


> > AFAICS, you'd still have significant queue_lock contention from other
> > CPUs inserting requests into the list?
> 
> Correct. We have more potential to explore. Current implementation
> is very elementary.
> 
> > What IO scheduler are you using? I assume noop...
> 
> yes.
> 
> > as a crazy experiment, what happens if you create per-cpu request queues?
> 
> or in other words, each kblockd thread catering to multiple request queues
> (perhaps one for each CPU or one for a group of CPUs).
>
> Softirq context and each kblockd thread handling multiple request queues will
> lead to further improvements.
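The per-CPU queue idea, with one kblockd-like worker serving a group of CPUs, might look roughly like this. It is a single-threaded sketch: NR_CPUS, CPUS_PER_WORKER, QUEUE_DEPTH, queue_request() and worker_run() are hypothetical, and a real version would still need synchronization between a CPU's submitter and the worker draining its queue, though that would be a per-CPU lock rather than one shared queue_lock.

```c
#include <assert.h>

/* Hypothetical sizes for illustration. */
#define NR_CPUS 8
#define CPUS_PER_WORKER 4
#define QUEUE_DEPTH 16

struct cpu_queue {
	int reqs[QUEUE_DEPTH];   /* request ids; a real queue would hold requests */
	int count;
};

static struct cpu_queue queues[NR_CPUS];

/* Submission on @cpu touches only that CPU's queue. */
static int queue_request(int cpu, int id)
{
	struct cpu_queue *q = &queues[cpu];

	if (q->count == QUEUE_DEPTH)
		return -1;               /* full: caller must back off */
	q->reqs[q->count++] = id;
	return 0;
}

static int worker_for_cpu(int cpu)
{
	return cpu / CPUS_PER_WORKER;
}

/* Worker @w drains every queue in its CPU group; returns requests dispatched. */
static int worker_run(int w)
{
	int dispatched = 0;

	assert(w >= 0 && w < NR_CPUS / CPUS_PER_WORKER);
	for (int cpu = w * CPUS_PER_WORKER; cpu < (w + 1) * CPUS_PER_WORKER; cpu++) {
		assert(worker_for_cpu(cpu) == w);
		dispatched += queues[cpu].count;   /* hand to the driver here */
		queues[cpu].count = 0;
	}
	return dispatched;
}
```

Grouping CPUs per worker trades fewer worker threads against the worker touching several CPUs' queues, so a group would naturally match CPUs that share a cache.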
