LKML Archive on lore.kernel.org
From: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>,
	Vladislav Bolkhovitin <vst@vlnb.net>,
	Bart Van Assche <bart.vanassche@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>,
	linux-scsi@vger.kernel.org, scst-devel@lists.sourceforge.net,
	linux-kernel@vger.kernel.org,
	Mike Christie <michaelc@cs.wisc.edu>,
	CBE-OSS-DEV <cbe-oss-dev@ozlabs.org>
Subject: Re: Integration of SCST in the mainstream Linux kernel
Date: Mon, 04 Feb 2008 11:19:16 -0800
Message-ID: <1202152756.11265.581.camel@haakon2.linux-iscsi.org>
In-Reply-To: <1202151989.11265.576.camel@haakon2.linux-iscsi.org>

On Mon, 2008-02-04 at 11:06 -0800, Nicholas A. Bellinger wrote:
> On Mon, 2008-02-04 at 10:29 -0800, Linus Torvalds wrote:
> > 
> > On Mon, 4 Feb 2008, James Bottomley wrote:
> > > 
> > > The way a user space solution should work is to schedule mmapped I/O
> > > from the backing store and then send this mmapped region off for target
> > > I/O.
> > 
> > mmap'ing may avoid the copy, but the overhead of a mmap operation is 
> > quite often much *bigger* than the overhead of a copy operation.
> > 
> > Please do not advocate the use of mmap() as a way to avoid memory copies. 
> > It's not realistic. Even if you can do it with a single "mmap()" system 
> > call (which is not at all a given, considering that block devices can 
> > easily be much larger than the available virtual memory space), the fact 
> > is that page table games along with the fault (and even just TLB miss) 
> > overhead is easily more than the cost of copying a page in a nice 
> > streaming manner.
> > 
> > Yes, memory is "slow", but dammit, so is mmap().
> > 
> > > You also have to pull tricks with the mmap region in the case of writes 
> > > to prevent useless data being read in from the backing store.  However, 
> > > none of this involves data copies.
> > 
> > "data copies" is irrelevant. The only thing that matters is performance. 
> > And if avoiding data copies is more costly (or even of a similar cost) 
> > than the copies themselves would have been, there is absolutely no upside, 
> > and only downsides due to extra complexity.
> > 
> 
> The iSER spec (RFC 5046) states the following about direct data
> placement in the TCP case:
> 
> "  Out-of-order TCP segments in the Traditional iSCSI model have to be
>    stored and reassembled before the iSCSI protocol layer within an end
>    node can place the data in the iSCSI buffers.  This reassembly is
>    required because not every TCP segment is likely to contain an iSCSI
>    header to enable its placement, and TCP itself does not have a
>    built-in mechanism for signaling Upper Level Protocol (ULP) message
>    boundaries to aid placement of out-of-order segments.  This TCP
>    reassembly at high network speeds is quite counter-productive for the
>    following reasons: wasted memory bandwidth in data copying, the need
>    for reassembly memory, wasted CPU cycles in data copying, and the
>    general store-and-forward latency from an application perspective."
> 
> While this has nothing to do directly with the kernel vs. user space
> discussion for a target mode storage engine, the scaling and latency
> case is easy enough to make if we are talking about scaling TCP for
> 10 Gb/sec storage fabrics.
> 
> > If you want good performance for a service like this, you really generally 
> > *do* need to be in kernel space. You can play games in user space, but you're 
> > fooling yourself if you think you can do as well as doing it in the 
> > kernel. And you're *definitely* fooling yourself if you think mmap() 
> > solves performance issues. "Zero-copy" does not equate to "fast". Memory 
> > speeds may be slower than core CPU speeds, but not infinitely so!
> > 
> 
> From looking at this problem from a kernel space perspective for a
> number of years, I would be inclined to believe this is true for
> software and hardware data-path cases.  The benefits of moving various
> control state machines for something like traditional iSCSI to
> userspace have always been debatable.  The most obvious candidate for
> userspace is authentication, especially anything more complex than
> CHAP.  However, I have thought that recovery from failures of the
> communication path (iSCSI connections) or of entire nexuses (iSCSI
> sessions) would be very problematic if it meant potentially having to
> push I/O state down to userspace.
> 
> Protocol and/or fabric specific state machines (CSM-E and CSM-I from
> connection recovery in iSCSI and iSER are the obvious ones) are the
> best candidates for residing in kernel space.
> 
> > (That said: there *are* alternatives to mmap, like "splice()", that really 
> > do potentially solve some issues without the page table and TLB overheads. 
> > But while splice() avoids the costs of paging, I strongly suspect it would 
> > still have easily measurable latency issues. Switching between user and 
> > kernel space multiple times is definitely not going to be free, although 
> > it's probably not a huge issue if you have big enough requests).
> > 
> 
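
As an illustration of the two access patterns contrasted in the quoted
discussion (a plain streaming copy versus mmap()), here is a minimal
userspace sketch in Python. It is not taken from SCST or any target
implementation; the helper names read_by_copy, read_by_mmap and
make_sample are hypothetical, and the requested region is assumed to lie
entirely inside the file. It only shows the mechanics; it does not
measure anything, and which path is faster depends on the workload.

```python
import mmap
import os
import tempfile


def read_by_copy(path, offset, length, chunk=64 * 1024):
    """Stream a file region into a preallocated buffer with read calls."""
    buf = bytearray(length)
    view = memoryview(buf)
    done = 0
    with open(path, "rb", buffering=0) as f:
        f.seek(offset)
        while done < length:
            n = f.readinto(view[done:min(done + chunk, length)])
            if not n:  # end of file reached early
                break
            done += n
    return bytes(view[:done])


def read_by_mmap(path, offset, length):
    """Map the file and slice the region; pages are faulted in on access."""
    # The mmap offset must be a multiple of the allocation granularity,
    # so map from the aligned base and skip the leading `delta` bytes.
    base = offset - (offset % mmap.ALLOCATIONGRANULARITY)
    delta = offset - base
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), delta + length,
                       access=mmap.ACCESS_READ, offset=base) as m:
            # Slicing copies the bytes out of the mapping in Python.
            return m[delta:delta + length]


def make_sample(size=1 << 20):
    """Create a temporary file of random bytes and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(size))
    return path
```

Both helpers return the same bytes for the same region. To compare their
cost, time repeated calls over regions of different sizes; the mapping
setup and page faults are paid on every read_by_mmap call, which is the
overhead the quoted text is pointing at.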

Then again, having some data path for software and hardware bulk I/O
operation of a storage fabric protocol / state machine in userspace
would be really interesting for something like an SPU-enabled engine for
the Cell Broadband Engine Architecture.

--nab



