LKML Archive on lore.kernel.org
* Re: [RFC PATCH 00/09] Implement direct user I/O interfaces for RDMA
  2018-05-18  0:22 [RFC PATCH 00/09] Implement direct user I/O interfaces for RDMA Long Li
@ 2018-05-17 23:10 ` Tom Talpey
  2018-05-18  6:03   ` Long Li
  2018-05-18  6:42   ` Christoph Hellwig
  2018-05-18  0:22 ` [RFC PATCH 01/09] Introduce offset for the 1st page in data transfer structures Long Li
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 24+ messages in thread
From: Tom Talpey @ 2018-05-17 23:10 UTC (permalink / raw)
  To: longli, Steve French, linux-cifs, samba-technical, linux-kernel,
	linux-rdma

On 5/17/2018 8:22 PM, Long Li wrote:
> From: Long Li <longli@microsoft.com>
> 
> This patchset implements direct user I/O through RDMA.
> 
> In the normal code path (even with cache=none), CIFS copies I/O data from
> user space to kernel space for security reasons.
> 
> With this patchset, a new mounting option is introduced to have CIFS pin the
> user-space buffer in memory and perform I/O through RDMA. This avoids the
> memory copy, at the cost of added security risk.

What's the security risk? This type of direct i/o behavior is not
uncommon, and can certainly be made safe, using the appropriate
memory registration and protection domains. Any risk needs to be
stated explicitly, and mitigation provided, or at least described.

Tom.

> 
> This patchset is RFC. The work is in progress, do not merge.
> 
> 
> Long Li (9):
>    Introduce offset for the 1st page in data transfer structures
>    Change wdata alloc to support direct pages
>    Change rdata alloc to support direct pages
>    Change function to support offset when reading pages
>    Change RDMA send to recognize page offset in the 1st page
>    Change RDMA recv to support offset in the 1st page
>    Support page offset in memory registrations
>    Implement no-copy file I/O interfaces
>    Introduce cache=rdma mounting option
>   
> 
>   fs/cifs/cifs_fs_sb.h      |   2 +
>   fs/cifs/cifsfs.c          |  19 +++
>   fs/cifs/cifsfs.h          |   3 +
>   fs/cifs/cifsglob.h        |   6 +
>   fs/cifs/cifsproto.h       |   4 +-
>   fs/cifs/cifssmb.c         |  10 +-
>   fs/cifs/connect.c         |  13 +-
>   fs/cifs/dir.c             |   5 +
>   fs/cifs/file.c            | 351 ++++++++++++++++++++++++++++++++++++++++++----
>   fs/cifs/inode.c           |   4 +-
>   fs/cifs/smb2ops.c         |   2 +-
>   fs/cifs/smb2pdu.c         |  22 ++-
>   fs/cifs/smbdirect.c       | 132 ++++++++++-------
>   fs/cifs/smbdirect.h       |   2 +-
>   fs/read_write.c           |   7 +
>   include/linux/ratelimit.h |   2 +-
>   16 files changed, 489 insertions(+), 95 deletions(-)
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC PATCH 00/09] Implement direct user I/O interfaces for RDMA
@ 2018-05-18  0:22 Long Li
  2018-05-17 23:10 ` Tom Talpey
                   ` (9 more replies)
  0 siblings, 10 replies; 24+ messages in thread
From: Long Li @ 2018-05-18  0:22 UTC (permalink / raw)
  To: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma
  Cc: Long Li

From: Long Li <longli@microsoft.com>

This patchset implements direct user I/O through RDMA.

In the normal code path (even with cache=none), CIFS copies I/O data from
user space to kernel space for security reasons.

With this patchset, a new mounting option is introduced to have CIFS pin the
user-space buffer in memory and perform I/O through RDMA. This avoids the
memory copy, at the cost of added security risk.

This patchset is RFC. The work is in progress, do not merge.


Long Li (9):
  Introduce offset for the 1st page in data transfer structures
  Change wdata alloc to support direct pages
  Change rdata alloc to support direct pages
  Change function to support offset when reading pages
  Change RDMA send to recognize page offset in the 1st page
  Change RDMA recv to support offset in the 1st page
  Support page offset in memory registrations
  Implement no-copy file I/O interfaces
  Introduce cache=rdma mounting option
 

 fs/cifs/cifs_fs_sb.h      |   2 +
 fs/cifs/cifsfs.c          |  19 +++
 fs/cifs/cifsfs.h          |   3 +
 fs/cifs/cifsglob.h        |   6 +
 fs/cifs/cifsproto.h       |   4 +-
 fs/cifs/cifssmb.c         |  10 +-
 fs/cifs/connect.c         |  13 +-
 fs/cifs/dir.c             |   5 +
 fs/cifs/file.c            | 351 ++++++++++++++++++++++++++++++++++++++++++----
 fs/cifs/inode.c           |   4 +-
 fs/cifs/smb2ops.c         |   2 +-
 fs/cifs/smb2pdu.c         |  22 ++-
 fs/cifs/smbdirect.c       | 132 ++++++++++-------
 fs/cifs/smbdirect.h       |   2 +-
 fs/read_write.c           |   7 +
 include/linux/ratelimit.h |   2 +-
 16 files changed, 489 insertions(+), 95 deletions(-)

-- 
2.7.4


* [RFC PATCH 01/09] Introduce offset for the 1st page in data transfer structures
  2018-05-18  0:22 [RFC PATCH 00/09] Implement direct user I/O interfaces for RDMA Long Li
  2018-05-17 23:10 ` Tom Talpey
@ 2018-05-18  0:22 ` Long Li
  2018-05-18  6:37   ` Steve French
  2018-05-18  0:22 ` [RFC PATCH 02/09] Change wdata alloc to support direct pages Long Li
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: Long Li @ 2018-05-18  0:22 UTC (permalink / raw)
  To: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma
  Cc: Long Li

From: Long Li <longli@microsoft.com>

Currently CIFS allocates its own pages for data transfer; they don't need an
offset since data always starts at 0 in the 1st page.

Direct data transfer needs to define an offset because user data may not start
on a page boundary.

Signed-off-by: Long Li <longli@microsoft.com>
---
 fs/cifs/cifsglob.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index cb950a5..a51855c 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -176,6 +176,7 @@ struct smb_rqst {
 	struct kvec	*rq_iov;	/* array of kvecs */
 	unsigned int	rq_nvec;	/* number of kvecs in array */
 	struct page	**rq_pages;	/* pointer to array of page ptrs */
+	unsigned int	rq_offset;	/* the offset to the 1st page */
 	unsigned int	rq_npages;	/* number pages in array */
 	unsigned int	rq_pagesz;	/* page size to use */
 	unsigned int	rq_tailsz;	/* length of last page */
@@ -1167,8 +1168,10 @@ struct cifs_readdata {
 	struct kvec			iov[2];
 #ifdef CONFIG_CIFS_SMB_DIRECT
 	struct smbd_mr			*mr;
+	struct page			**direct_pages;
 #endif
 	unsigned int			pagesz;
+	unsigned int			page_offset;
 	unsigned int			tailsz;
 	unsigned int			credits;
 	unsigned int			nr_pages;
@@ -1192,8 +1195,10 @@ struct cifs_writedata {
 	int				result;
 #ifdef CONFIG_CIFS_SMB_DIRECT
 	struct smbd_mr			*mr;
+	struct page			**direct_pages;
 #endif
 	unsigned int			pagesz;
+	unsigned int			page_offset;
 	unsigned int			tailsz;
 	unsigned int			credits;
 	unsigned int			nr_pages;
-- 
2.7.4


* [RFC PATCH 02/09] Change wdata alloc to support direct pages
  2018-05-18  0:22 [RFC PATCH 00/09] Implement direct user I/O interfaces for RDMA Long Li
  2018-05-17 23:10 ` Tom Talpey
  2018-05-18  0:22 ` [RFC PATCH 01/09] Introduce offset for the 1st page in data transfer structures Long Li
@ 2018-05-18  0:22 ` Long Li
  2018-05-19  1:05   ` Tom Talpey
  2018-05-18  0:22 ` [RFC PATCH 03/09] Change rdata " Long Li
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: Long Li @ 2018-05-18  0:22 UTC (permalink / raw)
  To: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma
  Cc: Long Li

From: Long Li <longli@microsoft.com>

When using direct pages from user space, there is no need to allocate pages.

Just pin those user pages for RDMA.

Signed-off-by: Long Li <longli@microsoft.com>
---
 fs/cifs/cifsproto.h |  2 +-
 fs/cifs/cifssmb.c   | 10 +++++++---
 fs/cifs/file.c      |  4 ++--
 3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h
index 365a414..94106b9 100644
--- a/fs/cifs/cifsproto.h
+++ b/fs/cifs/cifsproto.h
@@ -523,7 +523,7 @@ int cifs_readv_receive(struct TCP_Server_Info *server, struct mid_q_entry *mid);
 int cifs_async_writev(struct cifs_writedata *wdata,
 		      void (*release)(struct kref *kref));
 void cifs_writev_complete(struct work_struct *work);
-struct cifs_writedata *cifs_writedata_alloc(unsigned int nr_pages,
+struct cifs_writedata *cifs_writedata_alloc(unsigned int nr_pages, struct page **direct_pages,
 						work_func_t complete);
 void cifs_writedata_release(struct kref *refcount);
 int cifs_query_mf_symlink(unsigned int xid, struct cifs_tcon *tcon,
diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
index 1529a08..3b1731d 100644
--- a/fs/cifs/cifssmb.c
+++ b/fs/cifs/cifssmb.c
@@ -1983,7 +1983,7 @@ cifs_writev_requeue(struct cifs_writedata *wdata)
 			tailsz = rest_len - (nr_pages - 1) * PAGE_SIZE;
 		}
 
-		wdata2 = cifs_writedata_alloc(nr_pages, cifs_writev_complete);
+		wdata2 = cifs_writedata_alloc(nr_pages, NULL, cifs_writev_complete);
 		if (!wdata2) {
 			rc = -ENOMEM;
 			break;
@@ -2067,12 +2067,16 @@ cifs_writev_complete(struct work_struct *work)
 }
 
 struct cifs_writedata *
-cifs_writedata_alloc(unsigned int nr_pages, work_func_t complete)
+cifs_writedata_alloc(unsigned int nr_pages, struct page **direct_pages, work_func_t complete)
 {
 	struct cifs_writedata *wdata;
 
 	/* writedata + number of page pointers */
-	wdata = kzalloc(sizeof(*wdata) +
+	if (direct_pages) {
+		wdata = kzalloc(sizeof(*wdata), GFP_NOFS);
+		wdata->direct_pages = direct_pages;
+	} else
+		wdata = kzalloc(sizeof(*wdata) +
 			sizeof(struct page *) * nr_pages, GFP_NOFS);
 	if (wdata != NULL) {
 		kref_init(&wdata->refcount);
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 23fd430..a6ec896 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1965,7 +1965,7 @@ wdata_alloc_and_fillpages(pgoff_t tofind, struct address_space *mapping,
 {
 	struct cifs_writedata *wdata;
 
-	wdata = cifs_writedata_alloc((unsigned int)tofind,
+	wdata = cifs_writedata_alloc((unsigned int)tofind, NULL,
 				     cifs_writev_complete);
 	if (!wdata)
 		return NULL;
@@ -2554,7 +2554,7 @@ cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from,
 			break;
 
 		nr_pages = get_numpages(wsize, len, &cur_len);
-		wdata = cifs_writedata_alloc(nr_pages,
+		wdata = cifs_writedata_alloc(nr_pages, NULL,
 					     cifs_uncached_writev_complete);
 		if (!wdata) {
 			rc = -ENOMEM;
-- 
2.7.4


* [RFC PATCH 03/09] Change rdata alloc to support direct pages
  2018-05-18  0:22 [RFC PATCH 00/09] Implement direct user I/O interfaces for RDMA Long Li
                   ` (2 preceding siblings ...)
  2018-05-18  0:22 ` [RFC PATCH 02/09] Change wdata alloc to support direct pages Long Li
@ 2018-05-18  0:22 ` Long Li
  2018-05-18  0:22 ` [RFC PATCH 04/09] Change function to support offset when reading pages Long Li
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Long Li @ 2018-05-18  0:22 UTC (permalink / raw)
  To: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma
  Cc: Long Li

From: Long Li <longli@microsoft.com>

There is no need to allocate pages when using pages directly from the user buffer.

Signed-off-by: Long Li <longli@microsoft.com>
---
 fs/cifs/file.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index a6ec896..ed25e04 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2880,11 +2880,15 @@ cifs_strict_writev(struct kiocb *iocb, struct iov_iter *from)
 }
 
 static struct cifs_readdata *
-cifs_readdata_alloc(unsigned int nr_pages, work_func_t complete)
+cifs_readdata_alloc(unsigned int nr_pages, struct page **direct_pages, work_func_t complete)
 {
 	struct cifs_readdata *rdata;
 
-	rdata = kzalloc(sizeof(*rdata) + (sizeof(struct page *) * nr_pages),
+	if (direct_pages) {
+		rdata = kzalloc(sizeof(*rdata), GFP_KERNEL);
+		rdata->direct_pages = direct_pages;
+	} else
+		rdata = kzalloc(sizeof(*rdata) + (sizeof(struct page *) * nr_pages),
 			GFP_KERNEL);
 	if (rdata != NULL) {
 		kref_init(&rdata->refcount);
@@ -3095,14 +3099,13 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
 		npages = DIV_ROUND_UP(cur_len, PAGE_SIZE);
 
 		/* allocate a readdata struct */
-		rdata = cifs_readdata_alloc(npages,
+		rdata = cifs_readdata_alloc(npages, NULL,
 					    cifs_uncached_readv_complete);
 		if (!rdata) {
 			add_credits_and_wake_if(server, credits, 0);
 			rc = -ENOMEM;
 			break;
 		}
-
 		rc = cifs_read_allocate_pages(rdata, npages);
 		if (rc)
 			goto error;
@@ -3770,7 +3773,7 @@ static int cifs_readpages(struct file *file, struct address_space *mapping,
 			break;
 		}
 
-		rdata = cifs_readdata_alloc(nr_pages, cifs_readv_complete);
+		rdata = cifs_readdata_alloc(nr_pages, NULL, cifs_readv_complete);
 		if (!rdata) {
 			/* best to give up if we're out of mem */
 			list_for_each_entry_safe(page, tpage, &tmplist, lru) {
-- 
2.7.4


* [RFC PATCH 04/09] Change function to support offset when reading pages
  2018-05-18  0:22 [RFC PATCH 00/09] Implement direct user I/O interfaces for RDMA Long Li
                   ` (3 preceding siblings ...)
  2018-05-18  0:22 ` [RFC PATCH 03/09] Change rdata " Long Li
@ 2018-05-18  0:22 ` Long Li
  2018-05-18  0:22 ` [RFC PATCH 05/09] Change RDMA send to recognize page offset in the 1st page Long Li
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Long Li @ 2018-05-18  0:22 UTC (permalink / raw)
  To: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma
  Cc: Long Li

From: Long Li <longli@microsoft.com>

It's possible that we may want to read data at an offset into the 1st page;
change the functions to pass the offset to the transport.

Signed-off-by: Long Li <longli@microsoft.com>
---
 fs/cifs/cifsproto.h | 2 +-
 fs/cifs/connect.c   | 4 ++--
 fs/cifs/file.c      | 4 ++--
 fs/cifs/smb2ops.c   | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h
index 94106b9..cc034e2 100644
--- a/fs/cifs/cifsproto.h
+++ b/fs/cifs/cifsproto.h
@@ -197,7 +197,7 @@ extern void dequeue_mid(struct mid_q_entry *mid, bool malformed);
 extern int cifs_read_from_socket(struct TCP_Server_Info *server, char *buf,
 			         unsigned int to_read);
 extern int cifs_read_page_from_socket(struct TCP_Server_Info *server,
-				      struct page *page, unsigned int to_read);
+				      struct page *page, unsigned int page_offset, unsigned int to_read);
 extern int cifs_setup_cifs_sb(struct smb_vol *pvolume_info,
 			       struct cifs_sb_info *cifs_sb);
 extern int cifs_match_super(struct super_block *, void *);
diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index 58c2083..46d0cf4 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -591,11 +591,11 @@ cifs_read_from_socket(struct TCP_Server_Info *server, char *buf,
 }
 
 int
-cifs_read_page_from_socket(struct TCP_Server_Info *server, struct page *page,
+cifs_read_page_from_socket(struct TCP_Server_Info *server, struct page *page, unsigned int page_offset,
 		      unsigned int to_read)
 {
 	struct msghdr smb_msg;
-	struct bio_vec bv = {.bv_page = page, .bv_len = to_read};
+	struct bio_vec bv = {.bv_page = page, .bv_len = to_read, .bv_offset = page_offset};
 	iov_iter_bvec(&smb_msg.msg_iter, READ | ITER_BVEC, &bv, 1, to_read);
 	return cifs_readv_from_socket(server, &smb_msg);
 }
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index ed25e04..e240c7c 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -3044,7 +3044,7 @@ uncached_fill_pages(struct TCP_Server_Info *server,
 			result = n;
 #endif
 		else
-			result = cifs_read_page_from_socket(server, page, n);
+			result = cifs_read_page_from_socket(server, page, page_offset, n);
 		if (result < 0)
 			break;
 
@@ -3614,7 +3614,7 @@ readpages_fill_pages(struct TCP_Server_Info *server,
 			result = n;
 #endif
 		else
-			result = cifs_read_page_from_socket(server, page, n);
+			result = cifs_read_page_from_socket(server, page, 0, n);
 		if (result < 0)
 			break;
 
diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
index b76b858..f890dd7 100644
--- a/fs/cifs/smb2ops.c
+++ b/fs/cifs/smb2ops.c
@@ -2387,7 +2387,7 @@ read_data_into_pages(struct TCP_Server_Info *server, struct page **pages,
 			zero_user(page, len, PAGE_SIZE - len);
 			len = 0;
 		}
-		length = cifs_read_page_from_socket(server, page, n);
+		length = cifs_read_page_from_socket(server, page, 0, n);
 		if (length < 0)
 			return length;
 		server->total_read += length;
-- 
2.7.4


* [RFC PATCH 05/09] Change RDMA send to recognize page offset in the 1st page
  2018-05-18  0:22 [RFC PATCH 00/09] Implement direct user I/O interfaces for RDMA Long Li
                   ` (4 preceding siblings ...)
  2018-05-18  0:22 ` [RFC PATCH 04/09] Change function to support offset when reading pages Long Li
@ 2018-05-18  0:22 ` Long Li
  2018-05-19  1:09   ` Tom Talpey
  2018-05-18  0:22 ` [RFC PATCH 06/09] Change RDMA recv to support " Long Li
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: Long Li @ 2018-05-18  0:22 UTC (permalink / raw)
  To: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma
  Cc: Long Li

From: Long Li <longli@microsoft.com>

When doing an RDMA send, the offset needs to be checked, as data may start at
an offset in the 1st page.

Signed-off-by: Long Li <longli@microsoft.com>
---
 fs/cifs/smb2pdu.c   |  3 ++-
 fs/cifs/smbdirect.c | 25 +++++++++++++++++++------
 2 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
index 5097f28..fdcf97e 100644
--- a/fs/cifs/smb2pdu.c
+++ b/fs/cifs/smb2pdu.c
@@ -3015,7 +3015,8 @@ smb2_async_writev(struct cifs_writedata *wdata,
 
 	rqst.rq_iov = iov;
 	rqst.rq_nvec = 2;
-	rqst.rq_pages = wdata->pages;
+	rqst.rq_pages = wdata->direct_pages ? wdata->direct_pages : wdata->pages;
+	rqst.rq_offset = wdata->page_offset;
 	rqst.rq_npages = wdata->nr_pages;
 	rqst.rq_pagesz = wdata->pagesz;
 	rqst.rq_tailsz = wdata->tailsz;
diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c
index b0a1955..b46586d 100644
--- a/fs/cifs/smbdirect.c
+++ b/fs/cifs/smbdirect.c
@@ -2084,8 +2084,10 @@ int smbd_send(struct smbd_connection *info, struct smb_rqst *rqst)
 
 	/* add in the page array if there is one */
 	if (rqst->rq_npages) {
-		buflen += rqst->rq_pagesz * (rqst->rq_npages - 1);
-		buflen += rqst->rq_tailsz;
+		if (rqst->rq_npages == 1)
+			buflen += rqst->rq_tailsz;
+		else
+			buflen += rqst->rq_pagesz * (rqst->rq_npages - 1) - rqst->rq_offset + rqst->rq_tailsz;
 	}
 
 	if (buflen + sizeof(struct smbd_data_transfer) >
@@ -2182,8 +2184,19 @@ int smbd_send(struct smbd_connection *info, struct smb_rqst *rqst)
 
 	/* now sending pages if there are any */
 	for (i = 0; i < rqst->rq_npages; i++) {
-		buflen = (i == rqst->rq_npages-1) ?
-			rqst->rq_tailsz : rqst->rq_pagesz;
+		unsigned int offset = 0;
+		if (i == 0)
+			offset = rqst->rq_offset;
+		if (rqst->rq_npages == 1 || i == rqst->rq_npages-1)
+			buflen = rqst->rq_tailsz;
+		else {
+			/* We have at least two pages, and this is not the last page */
+			if (i == 0)
+				buflen = rqst->rq_pagesz - rqst->rq_offset;
+			else
+				buflen = rqst->rq_pagesz;
+		}
+
 		nvecs = (buflen + max_iov_size - 1) / max_iov_size;
 		log_write(INFO, "sending pages buflen=%d nvecs=%d\n",
 			buflen, nvecs);
@@ -2194,9 +2207,9 @@ int smbd_send(struct smbd_connection *info, struct smb_rqst *rqst)
 			remaining_data_length -= size;
 			log_write(INFO, "sending pages i=%d offset=%d size=%d"
 				" remaining_data_length=%d\n",
-				i, j*max_iov_size, size, remaining_data_length);
+				i, j*max_iov_size+offset, size, remaining_data_length);
 			rc = smbd_post_send_page(
-				info, rqst->rq_pages[i], j*max_iov_size,
+				info, rqst->rq_pages[i], j*max_iov_size + offset,
 				size, remaining_data_length);
 			if (rc)
 				goto done;
-- 
2.7.4


* [RFC PATCH 06/09] Change RDMA recv to support offset in the 1st page
  2018-05-18  0:22 [RFC PATCH 00/09] Implement direct user I/O interfaces for RDMA Long Li
                   ` (5 preceding siblings ...)
  2018-05-18  0:22 ` [RFC PATCH 05/09] Change RDMA send to recognize page offset in the 1st page Long Li
@ 2018-05-18  0:22 ` Long Li
  2018-05-18  0:22 ` [RFC PATCH 07/09] Support page offset in memory registrations Long Li
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Long Li @ 2018-05-18  0:22 UTC (permalink / raw)
  To: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma
  Cc: Long Li

From: Long Li <longli@microsoft.com>

The actual data buffer may start at an offset in the 1st page; modify the RDMA
recv function to read the data into the correct buffer.

Signed-off-by: Long Li <longli@microsoft.com>
---
 fs/cifs/smbdirect.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c
index b46586d..939c289 100644
--- a/fs/cifs/smbdirect.c
+++ b/fs/cifs/smbdirect.c
@@ -1978,7 +1978,7 @@ static int smbd_recv_buf(struct smbd_connection *info, char *buf,
  * return value: actual data read
  */
 static int smbd_recv_page(struct smbd_connection *info,
-		struct page *page, unsigned int to_read)
+		struct page *page, unsigned int page_offset, unsigned int to_read)
 {
 	int ret;
 	char *to_address;
@@ -1989,10 +1989,10 @@ static int smbd_recv_page(struct smbd_connection *info,
 		info->reassembly_data_length >= to_read ||
 			info->transport_status != SMBD_CONNECTED);
 	if (ret)
-		return 0;
+		return ret;
 
 	/* now we can read from reassembly queue and not sleep */
-	to_address = kmap_atomic(page);
+	to_address = (char *) kmap_atomic(page) + page_offset;
 
 	log_read(INFO, "reading from page=%p address=%p to_read=%d\n",
 		page, to_address, to_read);
@@ -2012,7 +2012,7 @@ int smbd_recv(struct smbd_connection *info, struct msghdr *msg)
 {
 	char *buf;
 	struct page *page;
-	unsigned int to_read;
+	unsigned int to_read, page_offset;
 	int rc;
 
 	switch (msg->msg_iter.type) {
@@ -2024,15 +2024,16 @@ int smbd_recv(struct smbd_connection *info, struct msghdr *msg)
 
 	case READ | ITER_BVEC:
 		page = msg->msg_iter.bvec->bv_page;
+		page_offset = msg->msg_iter.bvec->bv_offset;
 		to_read = msg->msg_iter.bvec->bv_len;
-		rc = smbd_recv_page(info, page, to_read);
+		rc = smbd_recv_page(info, page, page_offset, to_read);
 		break;
 
 	default:
 		/* It's a bug in upper layer to get there */
 		cifs_dbg(VFS, "CIFS: invalid msg type %d\n",
 			msg->msg_iter.type);
-		rc = -EIO;
+		rc = -EINVAL;
 	}
 
 	/* SMBDirect will read it all or nothing */
-- 
2.7.4


* [RFC PATCH 07/09] Support page offset in memory registrations
  2018-05-18  0:22 [RFC PATCH 00/09] Implement direct user I/O interfaces for RDMA Long Li
                   ` (6 preceding siblings ...)
  2018-05-18  0:22 ` [RFC PATCH 06/09] Change RDMA recv to support " Long Li
@ 2018-05-18  0:22 ` Long Li
  2018-05-18  0:22 ` [RFC PATCH 08/09] Implement direct file I/O interfaces Long Li
  2018-05-18  0:22 ` [RFC PATCH 09/09] Introduce cache=rdma mounting option Long Li
  9 siblings, 0 replies; 24+ messages in thread
From: Long Li @ 2018-05-18  0:22 UTC (permalink / raw)
  To: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma
  Cc: Long Li

From: Long Li <longli@microsoft.com>

Memory registration now needs to recognize the offset in the 1st page when
direct transfer is used.

Signed-off-by: Long Li <longli@microsoft.com>
---
 fs/cifs/smb2pdu.c   | 19 +++++++++-----
 fs/cifs/smbdirect.c | 74 +++++++++++++++++++++++++++++++----------------------
 fs/cifs/smbdirect.h |  2 +-
 3 files changed, 57 insertions(+), 38 deletions(-)

diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
index fdcf97e..a23c7cb 100644
--- a/fs/cifs/smb2pdu.c
+++ b/fs/cifs/smb2pdu.c
@@ -2586,6 +2586,8 @@ smb2_new_read_req(void **buf, unsigned int *total_len,
 	req->MinimumCount = 0;
 	req->Length = cpu_to_le32(io_parms->length);
 	req->Offset = cpu_to_le64(io_parms->offset);
+	if (!rdata->tailsz)
+		rdata->tailsz = rdata->pagesz;
 #ifdef CONFIG_CIFS_SMB_DIRECT
 	/*
 	 * If we want to do a RDMA write, fill in and append
@@ -2601,8 +2603,8 @@ smb2_new_read_req(void **buf, unsigned int *total_len,
 		rcu_read_lock();
 		rcu_dereference(server->smbd_conn);
 		rdata->mr = smbd_register_mr(
-				server->smbd_conn, rdata->pages,
-				rdata->nr_pages, rdata->tailsz,
+				server->smbd_conn, rdata->direct_pages ? rdata->direct_pages : rdata->pages,
+				rdata->nr_pages, rdata->page_offset, rdata->tailsz,
 				true, need_invalidate);
 		rcu_read_unlock();
 		if (!rdata->mr)
@@ -2967,6 +2969,8 @@ smb2_async_writev(struct cifs_writedata *wdata,
 	req->DataOffset = cpu_to_le16(
 				offsetof(struct smb2_write_req, Buffer));
 	req->RemainingBytes = 0;
+	if (!wdata->tailsz)
+		wdata->tailsz = wdata->pagesz;
 #ifdef CONFIG_CIFS_SMB_DIRECT
 	/*
 	 * If we want to do a server RDMA read, fill in and append
@@ -2981,8 +2985,8 @@ smb2_async_writev(struct cifs_writedata *wdata,
 		rcu_read_lock();
 		rcu_dereference(server->smbd_conn);
 		wdata->mr = smbd_register_mr(
-				server->smbd_conn, wdata->pages,
-				wdata->nr_pages, wdata->tailsz,
+				server->smbd_conn, wdata->direct_pages ? wdata->direct_pages : wdata->pages,
+				wdata->nr_pages, wdata->page_offset, wdata->tailsz,
 				false, need_invalidate);
 		rcu_read_unlock();
 		if (!wdata->mr) {
@@ -2991,8 +2995,11 @@ smb2_async_writev(struct cifs_writedata *wdata,
 		}
 		req->Length = 0;
 		req->DataOffset = 0;
-		req->RemainingBytes =
-			cpu_to_le32((wdata->nr_pages-1)*PAGE_SIZE + wdata->tailsz);
+		if (wdata->nr_pages > 1)
+			req->RemainingBytes =
+				cpu_to_le32((wdata->nr_pages-1)*PAGE_SIZE - wdata->page_offset + wdata->tailsz);
+		else
+			req->RemainingBytes = cpu_to_le32(wdata->tailsz);
 		req->Channel = SMB2_CHANNEL_RDMA_V1_INVALIDATE;
 		if (need_invalidate)
 			req->Channel = SMB2_CHANNEL_RDMA_V1;
diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c
index 939c289..bce1e7a 100644
--- a/fs/cifs/smbdirect.c
+++ b/fs/cifs/smbdirect.c
@@ -2267,37 +2267,37 @@ static void smbd_mr_recovery_work(struct work_struct *work)
 		if (smbdirect_mr->state == MR_INVALIDATED ||
 			smbdirect_mr->state == MR_ERROR) {
 
-			if (smbdirect_mr->state == MR_INVALIDATED) {
+			/* recover this MR entry */
+			rc = ib_dereg_mr(smbdirect_mr->mr);
+			if (rc) {
+				log_rdma_mr(ERR,
+					"ib_dereg_mr failed rc=%x\n",
+					rc);
+				smbd_disconnect_rdma_connection(info);
+				continue;
+			}
+
+			smbdirect_mr->mr = ib_alloc_mr(
+				info->pd, info->mr_type,
+				info->max_frmr_depth);
+			if (IS_ERR(smbdirect_mr->mr)) {
+				log_rdma_mr(ERR,
+					"ib_alloc_mr failed mr_type=%x "
+					"max_frmr_depth=%x\n",
+					info->mr_type,
+					info->max_frmr_depth);
+				smbd_disconnect_rdma_connection(info);
+				continue;
+			}
+
+			if (smbdirect_mr->state == MR_INVALIDATED)
 				ib_dma_unmap_sg(
 					info->id->device, smbdirect_mr->sgl,
 					smbdirect_mr->sgl_count,
 					smbdirect_mr->dir);
-				smbdirect_mr->state = MR_READY;
-			} else if (smbdirect_mr->state == MR_ERROR) {
-
-				/* recover this MR entry */
-				rc = ib_dereg_mr(smbdirect_mr->mr);
-				if (rc) {
-					log_rdma_mr(ERR,
-						"ib_dereg_mr failed rc=%x\n",
-						rc);
-					smbd_disconnect_rdma_connection(info);
-				}
 
-				smbdirect_mr->mr = ib_alloc_mr(
-					info->pd, info->mr_type,
-					info->max_frmr_depth);
-				if (IS_ERR(smbdirect_mr->mr)) {
-					log_rdma_mr(ERR,
-						"ib_alloc_mr failed mr_type=%x "
-						"max_frmr_depth=%x\n",
-						info->mr_type,
-						info->max_frmr_depth);
-					smbd_disconnect_rdma_connection(info);
-				}
+			smbdirect_mr->state = MR_READY;
 
-				smbdirect_mr->state = MR_READY;
-			}
 			/* smbdirect_mr->state is updated by this function
 			 * and is read and updated by I/O issuing CPUs trying
 			 * to get a MR, the call to atomic_inc_return
@@ -2443,7 +2443,7 @@ static struct smbd_mr *get_mr(struct smbd_connection *info)
  */
 struct smbd_mr *smbd_register_mr(
 	struct smbd_connection *info, struct page *pages[], int num_pages,
-	int tailsz, bool writing, bool need_invalidate)
+	int offset, int tailsz, bool writing, bool need_invalidate)
 {
 	struct smbd_mr *smbdirect_mr;
 	int rc, i;
@@ -2466,17 +2466,29 @@ struct smbd_mr *smbd_register_mr(
 	smbdirect_mr->sgl_count = num_pages;
 	sg_init_table(smbdirect_mr->sgl, num_pages);
 
-	for (i = 0; i < num_pages - 1; i++)
-		sg_set_page(&smbdirect_mr->sgl[i], pages[i], PAGE_SIZE, 0);
+	log_rdma_mr(INFO, "num_pages=0x%x offset=0x%x tailsz=0x%x\n", num_pages, offset, tailsz);
 
+	if (num_pages==1) {
+		sg_set_page(&smbdirect_mr->sgl[0], pages[0], tailsz, offset);
+		goto next;
+	}
+
+	/* We have at least two pages to register */
+	sg_set_page(&smbdirect_mr->sgl[0], pages[0], PAGE_SIZE - offset, offset);
+	i = 1;
+	while (i < num_pages - 1) {
+		sg_set_page(&smbdirect_mr->sgl[i], pages[i], PAGE_SIZE, 0);
+		i++;
+	}
 	sg_set_page(&smbdirect_mr->sgl[i], pages[i],
 		tailsz ? tailsz : PAGE_SIZE, 0);
 
+next:
 	dir = writing ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
 	smbdirect_mr->dir = dir;
 	rc = ib_dma_map_sg(info->id->device, smbdirect_mr->sgl, num_pages, dir);
 	if (!rc) {
-		log_rdma_mr(INFO, "ib_dma_map_sg num_pages=%x dir=%x rc=%x\n",
+		log_rdma_mr(ERR, "ib_dma_map_sg num_pages=%x dir=%x rc=%x\n",
 			num_pages, dir, rc);
 		goto dma_map_error;
 	}
@@ -2484,8 +2496,8 @@ struct smbd_mr *smbd_register_mr(
 	rc = ib_map_mr_sg(smbdirect_mr->mr, smbdirect_mr->sgl, num_pages,
 		NULL, PAGE_SIZE);
 	if (rc != num_pages) {
-		log_rdma_mr(INFO,
-			"ib_map_mr_sg failed rc = %x num_pages = %x\n",
+		log_rdma_mr(ERR,
+			"ib_map_mr_sg failed rc = %d num_pages = %x\n",
 			rc, num_pages);
 		goto map_mr_error;
 	}
diff --git a/fs/cifs/smbdirect.h b/fs/cifs/smbdirect.h
index f568577..bb1e0c4 100644
--- a/fs/cifs/smbdirect.h
+++ b/fs/cifs/smbdirect.h
@@ -314,7 +314,7 @@ struct smbd_mr {
 /* Interfaces to register and deregister MR for RDMA read/write */
 struct smbd_mr *smbd_register_mr(
 	struct smbd_connection *info, struct page *pages[], int num_pages,
-	int tailsz, bool writing, bool need_invalidate);
+	int offset, int tailsz, bool writing, bool need_invalidate);
 int smbd_deregister_mr(struct smbd_mr *mr);
 
 #else
-- 
2.7.4


* [RFC PATCH 08/09] Implement direct file I/O interfaces
  2018-05-18  0:22 [RFC PATCH 00/09] Implement direct user I/O interfaces for RDMA Long Li
                   ` (7 preceding siblings ...)
  2018-05-18  0:22 ` [RFC PATCH 07/09] Support page offset in memory registrations Long Li
@ 2018-05-18  0:22 ` Long Li
  2018-05-18  0:22 ` [RFC PATCH 09/09] Introduce cache=rdma mounting option Long Li
  9 siblings, 0 replies; 24+ messages in thread
From: Long Li @ 2018-05-18  0:22 UTC (permalink / raw)
  To: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma
  Cc: Long Li

From: Long Li <longli@microsoft.com>

Implement the main filesystem interfaces for doing read and write. These
functions don't copy the user data into a kernel buffer for data transfer.
Pages are directly pinned and passed to the RDMA transport.

Signed-off-by: Long Li <longli@microsoft.com>
---
 fs/cifs/cifsfs.c |  19 ++++
 fs/cifs/cifsfs.h |   3 +
 fs/cifs/file.c   | 322 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 329 insertions(+), 15 deletions(-)

diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index f715609..ba19fed 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -1118,6 +1118,25 @@ const struct file_operations cifs_file_direct_ops = {
 	.fallocate = cifs_fallocate,
 };
 
+const struct file_operations cifs_file_direct_rdma_ops = {
+	.read_iter = cifs_direct_readv,
+	.write_iter = cifs_direct_writev,
+	.open = cifs_open,
+	.release = cifs_close,
+	.lock = cifs_lock,
+	.fsync = cifs_fsync,
+	.flush = cifs_flush,
+	.mmap = cifs_file_mmap,
+	.splice_read = generic_file_splice_read,
+	.splice_write = iter_file_splice_write,
+	.unlocked_ioctl  = cifs_ioctl,
+	.copy_file_range = cifs_copy_file_range,
+	.clone_file_range = cifs_clone_file_range,
+	.llseek = cifs_llseek,
+	.setlease = cifs_setlease,
+	.fallocate = cifs_fallocate,
+};
+
 const struct file_operations cifs_file_nobrl_ops = {
 	.read_iter = cifs_loose_read_iter,
 	.write_iter = cifs_file_write_iter,
diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
index 013ba2a..223cca8 100644
--- a/fs/cifs/cifsfs.h
+++ b/fs/cifs/cifsfs.h
@@ -94,6 +94,7 @@ extern const struct inode_operations cifs_dfs_referral_inode_operations;
 /* Functions related to files and directories */
 extern const struct file_operations cifs_file_ops;
 extern const struct file_operations cifs_file_direct_ops; /* if directio mnt */
+extern const struct file_operations cifs_file_direct_rdma_ops; /* if directio mnt */
 extern const struct file_operations cifs_file_strict_ops; /* if strictio mnt */
 extern const struct file_operations cifs_file_nobrl_ops; /* no brlocks */
 extern const struct file_operations cifs_file_direct_nobrl_ops;
@@ -102,8 +103,10 @@ extern int cifs_open(struct inode *inode, struct file *file);
 extern int cifs_close(struct inode *inode, struct file *file);
 extern int cifs_closedir(struct inode *inode, struct file *file);
 extern ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to);
+extern ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to);
 extern ssize_t cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to);
 extern ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from);
+extern ssize_t cifs_direct_writev(struct kiocb *iocb, struct iov_iter *from);
 extern ssize_t cifs_strict_writev(struct kiocb *iocb, struct iov_iter *from);
 extern int cifs_lock(struct file *, int, struct file_lock *);
 extern int cifs_fsync(struct file *, loff_t, loff_t, int);
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index e240c7c..0b394db 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2452,15 +2452,46 @@ cifs_uncached_writedata_release(struct kref *refcount)
 	int i;
 	struct cifs_writedata *wdata = container_of(refcount,
 					struct cifs_writedata, refcount);
+	struct page **pages = wdata->direct_pages ? wdata->direct_pages : wdata->pages;
 
 	kref_put(&wdata->ctx->refcount, cifs_aio_ctx_release);
 	for (i = 0; i < wdata->nr_pages; i++)
-		put_page(wdata->pages[i]);
+		put_page(pages[i]);
 	cifs_writedata_release(refcount);
 }
 
 static void collect_uncached_write_data(struct cifs_aio_ctx *ctx);
 
+static void cifs_direct_writedata_release(struct kref *refcount)
+{
+	int i;
+	struct cifs_writedata *wdata = container_of(refcount,
+					struct cifs_writedata, refcount);
+
+	for (i = 0; i < wdata->nr_pages; i++)
+		put_page(wdata->direct_pages[i]);
+	kvfree(wdata->direct_pages);
+
+	cifs_writedata_release(refcount);
+}
+
+static void cifs_direct_writev_complete(struct work_struct *work)
+{
+	struct cifs_writedata *wdata = container_of(work,
+					struct cifs_writedata, work);
+	struct inode *inode = d_inode(wdata->cfile->dentry);
+	struct cifsInodeInfo *cifsi = CIFS_I(inode);
+
+	spin_lock(&inode->i_lock);
+	cifs_update_eof(cifsi, wdata->offset, wdata->bytes);
+	if (cifsi->server_eof > inode->i_size)
+		i_size_write(inode, cifsi->server_eof);
+	spin_unlock(&inode->i_lock);
+
+	complete(&wdata->done);
+	kref_put(&wdata->refcount, cifs_direct_writedata_release);
+}
+
 static void
 cifs_uncached_writev_complete(struct work_struct *work)
 {
@@ -2703,6 +2734,125 @@ static void collect_uncached_write_data(struct cifs_aio_ctx *ctx)
 		complete(&ctx->done);
 }
 
+ssize_t cifs_direct_writev(struct kiocb *iocb, struct iov_iter *from)
+{
+	struct file *file = iocb->ki_filp;
+	ssize_t total_written = 0;
+	struct cifsFileInfo *cfile;
+	struct cifs_tcon *tcon;
+	struct cifs_sb_info *cifs_sb;
+	struct TCP_Server_Info *server;
+	pid_t pid;
+	unsigned long nr_pages;
+	loff_t offset = iocb->ki_pos;
+	size_t len = iov_iter_count(from);
+	int rc;
+	struct cifs_writedata *wdata;
+
+	rc = generic_write_checks(iocb, from);
+	if (rc <= 0)
+		return rc;
+
+	cifs_sb = CIFS_FILE_SB(file);
+	cfile = file->private_data;
+	tcon = tlink_tcon(cfile->tlink);
+	server = tcon->ses->server;
+
+	if (!server->ops->async_writev)
+		return -ENOSYS;
+
+	if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
+		pid = cfile->pid;
+	else
+		pid = current->tgid;
+
+	do {
+		unsigned int wsize, credits;
+		struct page **pagevec;
+		size_t start;
+		ssize_t cur_len;
+
+		rc = server->ops->wait_mtu_credits(server, cifs_sb->wsize,
+						   &wsize, &credits);
+		if (rc)
+			break;
+
+		cur_len = iov_iter_get_pages_alloc(from, &pagevec, wsize, &start);
+		if (cur_len < 0) {
+			cifs_dbg(VFS, "direct_writev couldn't get user pages (rc=%zd) iter type %d iov_offset %lu count %lu\n", cur_len, from->type, from->iov_offset, from->count);
+			dump_stack();
+			break;
+		}
+		if (cur_len < 0)
+			break;
+
+		nr_pages = (cur_len + start + PAGE_SIZE -1) / PAGE_SIZE;
+
+		wdata = cifs_writedata_alloc(nr_pages, pagevec,
+					     cifs_direct_writev_complete);
+		if (!wdata) {
+			rc = -ENOMEM;
+			add_credits_and_wake_if(server, credits, 0);
+			break;
+		}
+
+		wdata->nr_pages = nr_pages;
+		wdata->page_offset = start;
+		wdata->pagesz = PAGE_SIZE;
+		wdata->tailsz =
+			nr_pages > 1 ?
+			cur_len - (PAGE_SIZE-start) - (nr_pages - 2)*PAGE_SIZE :
+			cur_len;
+
+		wdata->sync_mode = WB_SYNC_ALL;
+		wdata->offset = (__u64)offset;
+		wdata->cfile = cifsFileInfo_get(cfile);
+		wdata->pid = pid;
+		wdata->bytes = cur_len;
+		wdata->credits = credits;
+
+		kref_get(&wdata->refcount);
+
+		if (!wdata->cfile->invalidHandle ||
+		    !(rc = cifs_reopen_file(wdata->cfile, false)))
+			rc = server->ops->async_writev(wdata,
+					cifs_direct_writedata_release);
+		if (rc) {
+			add_credits_and_wake_if(server, wdata->credits, 0);
+			kref_put(&wdata->refcount,
+				 cifs_writedata_release);
+			if (rc == -EAGAIN)
+				continue;
+			break;
+		} else
+			wait_for_completion(&wdata->done);
+
+		if (wdata->result) {
+			rc = wdata->result;
+			kref_put(&wdata->refcount, cifs_direct_writedata_release);
+			if (rc == -EAGAIN)
+				continue;
+			break;
+		}
+
+		kref_put(&wdata->refcount, cifs_direct_writedata_release);
+
+		iov_iter_advance(from, cur_len);
+		total_written += cur_len;
+		offset += cur_len;
+		len -= cur_len;
+	} while (len);
+
+	if (unlikely(!total_written)) {
+		printk(KERN_ERR "%s: total_written=%ld rc=%d\n", __func__, total_written, rc);
+		return rc;
+	}
+
+	iocb->ki_pos += total_written;
+	return total_written;
+
+}
+
 ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from)
 {
 	struct file *file = iocb->ki_filp;
@@ -2942,18 +3092,30 @@ cifs_read_allocate_pages(struct cifs_readdata *rdata, unsigned int nr_pages)
 	return rc;
 }
 
+static void cifs_direct_readdata_release(struct kref *refcount)
+{
+	struct cifs_readdata *rdata = container_of(refcount,
+					struct cifs_readdata, refcount);
+	unsigned int i;
+	for (i = 0; i < rdata->nr_pages; i++) {
+		put_page(rdata->direct_pages[i]);
+	}
+	kvfree(rdata->direct_pages);
+
+	cifs_readdata_release(refcount);
+}
+
 static void
 cifs_uncached_readdata_release(struct kref *refcount)
 {
 	struct cifs_readdata *rdata = container_of(refcount,
 					struct cifs_readdata, refcount);
 	unsigned int i;
+	struct page **pages = rdata->direct_pages ? rdata->direct_pages : rdata->pages;
 
 	kref_put(&rdata->ctx->refcount, cifs_aio_ctx_release);
-	for (i = 0; i < rdata->nr_pages; i++) {
-		put_page(rdata->pages[i]);
-		rdata->pages[i] = NULL;
-	}
+	for (i = 0; i < rdata->nr_pages; i++)
+		put_page(pages[i]);
 	cifs_readdata_release(refcount);
 }
 
@@ -3013,30 +3175,32 @@ uncached_fill_pages(struct TCP_Server_Info *server,
 	int result = 0;
 	unsigned int i;
 	unsigned int nr_pages = rdata->nr_pages;
+	unsigned int page_offset = rdata->page_offset;
 
 	rdata->got_bytes = 0;
 	rdata->tailsz = PAGE_SIZE;
 	for (i = 0; i < nr_pages; i++) {
-		struct page *page = rdata->pages[i];
+		struct page *page = rdata->direct_pages ? rdata->direct_pages[i] : rdata->pages[i];
 		size_t n;
+		unsigned int segment_size = rdata->pagesz;
+
+		if (i == 0)
+			segment_size -= page_offset;
+		else
+			page_offset = 0;
+
 
 		if (len <= 0) {
 			/* no need to hold page hostage */
-			rdata->pages[i] = NULL;
 			rdata->nr_pages--;
 			put_page(page);
 			continue;
 		}
 		n = len;
-		if (len >= PAGE_SIZE) {
+		if (len >= segment_size)
 			/* enough data to fill the page */
-			n = PAGE_SIZE;
-			len -= n;
-		} else {
-			zero_user(page, len, PAGE_SIZE - len);
-			rdata->tailsz = len;
-			len = 0;
-		}
+			n = segment_size;
+		len -= n;
 		if (iter)
 			result = copy_page_from_iter(page, 0, n, iter);
 #ifdef CONFIG_CIFS_SMB_DIRECT
@@ -3243,6 +3407,134 @@ collect_uncached_read_data(struct cifs_aio_ctx *ctx)
 		complete(&ctx->done);
 }
 
+static void cifs_direct_readv_complete(struct work_struct *work)
+{
+	struct cifs_readdata *rdata = container_of(work, struct cifs_readdata, work);
+	int i = 0;
+	unsigned int bytes = 0;
+
+	// Set them dirty?
+	while (bytes < rdata->got_bytes + rdata->page_offset) {
+		set_page_dirty(rdata->direct_pages[i++]);
+		bytes += rdata->pagesz;
+	}
+	
+	complete(&rdata->done);
+	kref_put(&rdata->refcount, cifs_direct_readdata_release);
+}
+
+ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to)
+{
+	size_t len, cur_len, start;
+	unsigned int npages, rsize, credits;
+	struct file *file;
+	struct cifs_sb_info *cifs_sb;
+	struct cifsFileInfo *cfile;
+	struct cifs_tcon *tcon;
+	struct page **pagevec;
+	ssize_t rc, total_read = 0;
+	struct TCP_Server_Info *server;
+	loff_t offset = iocb->ki_pos;
+	pid_t pid;
+	struct cifs_readdata *rdata;
+	char *buf = to->iov->iov_base;
+
+	len = iov_iter_count(to);
+	if (!len)
+		return 0;
+
+	file = iocb->ki_filp;
+	cifs_sb = CIFS_FILE_SB(file);
+	cfile = file->private_data;
+	tcon = tlink_tcon(cfile->tlink);
+	server = tcon->ses->server;
+
+	if (!server->ops->async_readv)
+		return -ENOSYS;
+
+	if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
+		pid = cfile->pid;
+	else
+		pid = current->tgid;
+
+	if ((file->f_flags & O_ACCMODE) == O_WRONLY)
+		cifs_dbg(FYI, "attempting read on write only file instance\n");
+
+	do {
+		rc = server->ops->wait_mtu_credits(server, cifs_sb->rsize,
+					&rsize, &credits);
+		if (rc)
+			break;
+
+		cur_len = min_t(const size_t, len, rsize);
+
+		rc = iov_iter_get_pages_alloc(to, &pagevec, cur_len, &start);
+		if (rc < 0) {
+			cifs_dbg(VFS, "couldn't get user pages (rc=%zd) iter type %d iov_offset %lu count %lu\n", rc, to->type, to->iov_offset, to->count);
+			dump_stack();
+			break;
+		}
+
+		rdata = cifs_readdata_alloc(0, pagevec, cifs_direct_readv_complete);
+		if (!rdata) {
+			add_credits_and_wake_if(server, credits, 0);
+			rc = -ENOMEM;
+			break;
+		}
+
+		npages = (rc + start + PAGE_SIZE-1) / PAGE_SIZE;
+		rdata->nr_pages = npages;
+		rdata->page_offset = start;
+		rdata->pagesz = PAGE_SIZE;
+		rdata->tailsz = npages > 1 ?
+				rc-(PAGE_SIZE-start)-(npages-2)*PAGE_SIZE :
+				rc;
+		cur_len = rc;
+
+		rdata->cfile = cfile;
+		rdata->offset = offset;
+		rdata->bytes = rc;
+		rdata->pid = pid;
+		rdata->read_into_pages = cifs_uncached_read_into_pages;
+		rdata->copy_into_pages = cifs_uncached_copy_into_pages;
+		rdata->credits = credits;
+
+		kref_get(&rdata->refcount);
+
+		if (!rdata->cfile->invalidHandle ||
+		    !(rc = cifs_reopen_file(rdata->cfile, true)))
+			rc = server->ops->async_readv(rdata);
+
+		if (rc) {
+			add_credits_and_wake_if(server, rdata->credits, 0);
+			kref_put(&rdata->refcount,
+				 cifs_direct_readdata_release);
+			if (rc == -EAGAIN)
+				continue;
+		} else
+			wait_for_completion(&rdata->done);
+
+		rc = rdata->result;
+		if (rc) {
+			kref_put(&rdata->refcount, cifs_direct_readdata_release);
+			if (rc == -EAGAIN)
+				continue;
+			break;
+		}
+
+		total_read += rdata->got_bytes;
+		kref_put(&rdata->refcount, cifs_direct_readdata_release);
+
+		iov_iter_advance(to, cur_len);
+		len -= cur_len;
+		offset += cur_len;
+	} while (len);
+
+	iocb->ki_pos += total_read;
+
+	return total_read;
+}
+
 ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to)
 {
 	struct file *file = iocb->ki_filp;
-- 
2.7.4

* [RFC PATCH 09/09] Introduce cache=rdma mounting option
  2018-05-18  0:22 [RFC PATCH 00/09] Implement direct user I/O interfaces for RDMA Long Li
                   ` (8 preceding siblings ...)
  2018-05-18  0:22 ` [RFC PATCH 08/09] Implement direct file I/O interfaces Long Li
@ 2018-05-18  0:22 ` Long Li
  2018-05-18  7:26   ` Christoph Hellwig
  9 siblings, 1 reply; 24+ messages in thread
From: Long Li @ 2018-05-18  0:22 UTC (permalink / raw)
  To: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma
  Cc: Long Li

From: Long Li <longli@microsoft.com>

When cache=rdma is enabled in the mount options, CIFS does not allocate internal
data buffer pages for I/O; data is read/written directly to and from user memory
via RDMA.

Signed-off-by: Long Li <longli@microsoft.com>
---
 fs/cifs/cifs_fs_sb.h | 2 ++
 fs/cifs/cifsglob.h   | 1 +
 fs/cifs/connect.c    | 9 +++++++++
 fs/cifs/dir.c        | 5 +++++
 fs/cifs/file.c       | 4 ++++
 fs/cifs/inode.c      | 4 +++-
 6 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/fs/cifs/cifs_fs_sb.h b/fs/cifs/cifs_fs_sb.h
index 350fa55..7c28dc3 100644
--- a/fs/cifs/cifs_fs_sb.h
+++ b/fs/cifs/cifs_fs_sb.h
@@ -51,6 +51,8 @@
 					      */
 #define CIFS_MOUNT_UID_FROM_ACL 0x2000000 /* try to get UID via special SID */
 
+#define CIFS_MOUNT_DIRECT_RDMA	0x4000000
+
 struct cifs_sb_info {
 	struct rb_root tlink_tree;
 	spinlock_t tlink_tree_lock;
diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index a51855c..3bec63f 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -518,6 +518,7 @@ struct smb_vol {
 	bool server_ino:1; /* use inode numbers from server ie UniqueId */
 	bool direct_io:1;
 	bool strict_io:1; /* strict cache behavior */
+	bool rdma_io:1;
 	bool remap:1;      /* set to remap seven reserved chars in filenames */
 	bool sfu_remap:1;  /* remap seven reserved chars ala SFU */
 	bool posix_paths:1; /* unset to not ask for posix pathnames. */
diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index 46d0cf4..c92b3d8 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -282,6 +282,7 @@ enum {
 	Opt_cache_loose,
 	Opt_cache_strict,
 	Opt_cache_none,
+	Opt_cache_rdma,
 	Opt_cache_err
 };
 
@@ -289,6 +290,7 @@ static const match_table_t cifs_cacheflavor_tokens = {
 	{ Opt_cache_loose, "loose" },
 	{ Opt_cache_strict, "strict" },
 	{ Opt_cache_none, "none" },
+	{ Opt_cache_rdma, "rdma" },
 	{ Opt_cache_err, NULL }
 };
 
@@ -1128,6 +1130,9 @@ cifs_parse_cache_flavor(char *value, struct smb_vol *vol)
 		vol->direct_io = true;
 		vol->strict_io = false;
 		break;
+	case Opt_cache_rdma:
+		vol->rdma_io = true;
+		break;
 	default:
 		cifs_dbg(VFS, "bad cache= option: %s\n", value);
 		return 1;
@@ -3612,6 +3617,10 @@ int cifs_setup_cifs_sb(struct smb_vol *pvolume_info,
 		cifs_dbg(FYI, "mounting share using direct i/o\n");
 		cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_DIRECT_IO;
 	}
+	if (pvolume_info->rdma_io) {
+		cifs_dbg(VFS, "mounting share using rdma i/o\n");
+		cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_DIRECT_RDMA;
+	}
 	if (pvolume_info->mfsymlinks) {
 		if (pvolume_info->sfu_emul) {
 			/*
diff --git a/fs/cifs/dir.c b/fs/cifs/dir.c
index 81ba6e0..ce69b1c 100644
--- a/fs/cifs/dir.c
+++ b/fs/cifs/dir.c
@@ -557,6 +557,11 @@ cifs_atomic_open(struct inode *inode, struct dentry *direntry,
 			file->f_op = &cifs_file_direct_ops;
 		}
 
+	if (file->f_flags & O_DIRECT &&
+		CIFS_SB(inode->i_sb)->mnt_cifs_flags & CIFS_MOUNT_DIRECT_RDMA)
+			file->f_op = &cifs_file_direct_rdma_ops;
+
+
 	file_info = cifs_new_fileinfo(&fid, file, tlink, oplock);
 	if (file_info == NULL) {
 		if (server->ops->close)
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 0b394db..30ccf6a 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -492,6 +492,10 @@ int cifs_open(struct inode *inode, struct file *file)
 			file->f_op = &cifs_file_direct_ops;
 	}
 
+	if (file->f_flags & O_DIRECT &&
+            cifs_sb->mnt_cifs_flags & CIFS_MOUNT_DIRECT_RDMA)
+		file->f_op = &cifs_file_direct_rdma_ops;
+
 	if (server->oplocks)
 		oplock = REQ_OPLOCK;
 	else
diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c
index 3c371f7..7991298 100644
--- a/fs/cifs/inode.c
+++ b/fs/cifs/inode.c
@@ -44,7 +44,9 @@ static void cifs_set_ops(struct inode *inode)
 	switch (inode->i_mode & S_IFMT) {
 	case S_IFREG:
 		inode->i_op = &cifs_file_inode_ops;
-		if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_DIRECT_IO) {
+		if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_DIRECT_RDMA)
+			inode->i_fop = &cifs_file_direct_rdma_ops;
+		else if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_DIRECT_IO) {
 			if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NO_BRL)
 				inode->i_fop = &cifs_file_direct_nobrl_ops;
 			else
-- 
2.7.4

* RE: [RFC PATCH 00/09] Implement direct user I/O interfaces for RDMA
  2018-05-17 23:10 ` Tom Talpey
@ 2018-05-18  6:03   ` Long Li
  2018-05-18  6:44     ` Christoph Hellwig
  2018-05-19  0:58     ` Tom Talpey
  2018-05-18  6:42   ` Christoph Hellwig
  1 sibling, 2 replies; 24+ messages in thread
From: Long Li @ 2018-05-18  6:03 UTC (permalink / raw)
  To: Tom Talpey, Steve French, linux-cifs, samba-technical,
	linux-kernel, linux-rdma

> Subject: Re: [RFC PATCH 00/09] Implement direct user I/O interfaces for
> RDMA
> 
> On 5/17/2018 8:22 PM, Long Li wrote:
> > From: Long Li <longli@microsoft.com>
> >
> > This patchset implements direct user I/O through RDMA.
> >
> > In normal code path (even with cache=none), CIFS copies I/O data from
> > user-space to kernel-space for security reasons.
> >
> > With this patchset, a new mounting option is introduced to have CIFS
> > pin the user-space buffer into memory and performs I/O through RDMA.
> > This avoids memory copy, at the cost of added security risk.
> 
> What's the security risk? This type of direct i/o behavior is not uncommon,
> and can certainly be made safe, using the appropriate memory registration
> and protection domains. Any risk needs to be stated explicitly, and mitigation
> provided, or at least described.

I think the assumption is that user-mode buffers can't be trusted, so CIFS always copies them into internal buffers, and calculates the signature and performs encryption based on the protocol used.

With the direct buffer, the user can potentially modify the buffer while signing or encryption is in progress, or after they are done.

I also want to point out that I chose to implement .read_iter and .write_iter from file_operations to implement direct I/O (CIFS is already doing this for O_DIRECT, so following this code path avoids a big mess).  The ideal choice is to implement .direct_IO from address_space_operations, which I think we eventually want to move to.

> 
> Tom.
> 
> >
> > This patchset is RFC. The work is in progress, do not merge.
> >
> >
> > Long Li (9):
> >    Introduce offset for the 1st page in data transfer structures
> >    Change wdata alloc to support direct pages
> >    Change rdata alloc to support direct pages
> >    Change function to support offset when reading pages
> >    Change RDMA send to recognize page offset in the 1st page
> >    Change RDMA recv to support offset in the 1st page
> >    Support page offset in memory registrations
> >    Implement no-copy file I/O interfaces
> >    Introduce cache=rdma mounting option
> >
> >
> >   fs/cifs/cifs_fs_sb.h      |   2 +
> >   fs/cifs/cifsfs.c          |  19 +++
> >   fs/cifs/cifsfs.h          |   3 +
> >   fs/cifs/cifsglob.h        |   6 +
> >   fs/cifs/cifsproto.h       |   4 +-
> >   fs/cifs/cifssmb.c         |  10 +-
> >   fs/cifs/connect.c         |  13 +-
> >   fs/cifs/dir.c             |   5 +
> >   fs/cifs/file.c            | 351
> ++++++++++++++++++++++++++++++++++++++++++----
> >   fs/cifs/inode.c           |   4 +-
> >   fs/cifs/smb2ops.c         |   2 +-
> >   fs/cifs/smb2pdu.c         |  22 ++-
> >   fs/cifs/smbdirect.c       | 132 ++++++++++-------
> >   fs/cifs/smbdirect.h       |   2 +-
> >   fs/read_write.c           |   7 +
> >   include/linux/ratelimit.h |   2 +-
> >   16 files changed, 489 insertions(+), 95 deletions(-)
> >

* Re: [RFC PATCH 01/09] Introduce offset for the 1st page in data transfer structures
  2018-05-18  0:22 ` [RFC PATCH 01/09] Introduce offset for the 1st page in data transfer structures Long Li
@ 2018-05-18  6:37   ` Steve French
  0 siblings, 0 replies; 24+ messages in thread
From: Steve French @ 2018-05-18  6:37 UTC (permalink / raw)
  To: Long Li; +Cc: Steve French, CIFS, samba-technical, LKML, linux-rdma

merged into cifs-2.6.git for-next

On Thu, May 17, 2018 at 7:22 PM, Long Li <longli@linuxonhyperv.com> wrote:
> From: Long Li <longli@microsoft.com>
>
> Currently CIFS allocates its own pages for data transfer; they don't need an
> offset since it's always 0 in the 1st page.
>
> Direct data transfer needs to define an offset because user data may not start
> on a page boundary.
>
> Signed-off-by: Long Li <longli@microsoft.com>
> ---
>  fs/cifs/cifsglob.h | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
> index cb950a5..a51855c 100644
> --- a/fs/cifs/cifsglob.h
> +++ b/fs/cifs/cifsglob.h
> @@ -176,6 +176,7 @@ struct smb_rqst {
>         struct kvec     *rq_iov;        /* array of kvecs */
>         unsigned int    rq_nvec;        /* number of kvecs in array */
>         struct page     **rq_pages;     /* pointer to array of page ptrs */
> +       unsigned int    rq_offset;      /* the offset to the 1st page */
>         unsigned int    rq_npages;      /* number pages in array */
>         unsigned int    rq_pagesz;      /* page size to use */
>         unsigned int    rq_tailsz;      /* length of last page */
> @@ -1167,8 +1168,10 @@ struct cifs_readdata {
>         struct kvec                     iov[2];
>  #ifdef CONFIG_CIFS_SMB_DIRECT
>         struct smbd_mr                  *mr;
> +       struct page                     **direct_pages;
>  #endif
>         unsigned int                    pagesz;
> +       unsigned int                    page_offset;
>         unsigned int                    tailsz;
>         unsigned int                    credits;
>         unsigned int                    nr_pages;
> @@ -1192,8 +1195,10 @@ struct cifs_writedata {
>         int                             result;
>  #ifdef CONFIG_CIFS_SMB_DIRECT
>         struct smbd_mr                  *mr;
> +       struct page                     **direct_pages;
>  #endif
>         unsigned int                    pagesz;
> +       unsigned int                    page_offset;
>         unsigned int                    tailsz;
>         unsigned int                    credits;
>         unsigned int                    nr_pages;
> --
> 2.7.4
>



-- 
Thanks,

Steve

* Re: [RFC PATCH 00/09] Implement direct user I/O interfaces for RDMA
  2018-05-17 23:10 ` Tom Talpey
  2018-05-18  6:03   ` Long Li
@ 2018-05-18  6:42   ` Christoph Hellwig
  1 sibling, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2018-05-18  6:42 UTC (permalink / raw)
  To: Tom Talpey
  Cc: longli, Steve French, linux-cifs, samba-technical, linux-kernel,
	linux-rdma

On Thu, May 17, 2018 at 07:10:04PM -0400, Tom Talpey wrote:
> What's the security risk? This type of direct i/o behavior is not
> uncommon, and can certainly be made safe, using the appropriate
> memory registration and protection domains. Any risk needs to be
> stated explicitly, and mitigation provided, or at least described.

And in fact it is the same behavior you'll see on NFS over RDMA, or
a block device or any local fs over SRP/iSER/NVMe over Fabrics..

* Re: [RFC PATCH 00/09] Implement direct user I/O interfaces for RDMA
  2018-05-18  6:03   ` Long Li
@ 2018-05-18  6:44     ` Christoph Hellwig
  2018-05-19  0:58     ` Tom Talpey
  1 sibling, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2018-05-18  6:44 UTC (permalink / raw)
  To: Long Li
  Cc: Tom Talpey, Steve French, linux-cifs, samba-technical,
	linux-kernel, linux-rdma

On Fri, May 18, 2018 at 06:03:09AM +0000, Long Li wrote:
> I also want to point out that I chose to implement .read_iter and .write_iter from file_operations to implement direct I/O (CIFS is already doing this for O_DIRECT, so following this code path avoids a big mess).  The ideal choice is to implement .direct_IO from address_space_operations, which I think we eventually want to move to.

No, the direct_IO address space operation is the mess.  We're moving
away from it.

* Re: [RFC PATCH 09/09] Introduce cache=rdma mounting option
  2018-05-18  0:22 ` [RFC PATCH 09/09] Introduce cache=rdma mounting option Long Li
@ 2018-05-18  7:26   ` Christoph Hellwig
  2018-05-18 19:00     ` Long Li
  0 siblings, 1 reply; 24+ messages in thread
From: Christoph Hellwig @ 2018-05-18  7:26 UTC (permalink / raw)
  To: longli
  Cc: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma

On Thu, May 17, 2018 at 05:22:14PM -0700, Long Li wrote:
> From: Long Li <longli@microsoft.com>
> 
> When cache=rdma is enabled in the mount options, CIFS does not allocate internal
> data buffer pages for I/O; data is read/written directly to and from user memory
> via RDMA.

I don't think this should be an option.  For direct I/O without signing
or encryption CIFS should always use get_user_pages, with or without
RDMA.

* RE: [RFC PATCH 09/09] Introduce cache=rdma mounting option
  2018-05-18  7:26   ` Christoph Hellwig
@ 2018-05-18 19:00     ` Long Li
  2018-05-18 20:44       ` Steve French
  0 siblings, 1 reply; 24+ messages in thread
From: Long Li @ 2018-05-18 19:00 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma

> Subject: Re: [RFC PATCH 09/09] Introduce cache=rdma mounting option
> 
> On Thu, May 17, 2018 at 05:22:14PM -0700, Long Li wrote:
> > From: Long Li <longli@microsoft.com>
> >
> > When cache=rdma is enabled in the mount options, CIFS does not allocate
> > internal data buffer pages for I/O; data is read/written directly to and
> > from user memory via RDMA.
> 
> I don't think this should be an option.  For direct I/O without signing or
> encryption CIFS should always use get_user_pages, with or without RDMA.

Yes, this should be done for all transports. If there are no objections, I'll send patches to change this.

* Re: [RFC PATCH 09/09] Introduce cache=rdma mounting option
  2018-05-18 19:00     ` Long Li
@ 2018-05-18 20:44       ` Steve French
  2018-05-18 20:58         ` Long Li
  0 siblings, 1 reply; 24+ messages in thread
From: Steve French @ 2018-05-18 20:44 UTC (permalink / raw)
  To: Long Li
  Cc: Christoph Hellwig, Steve French, linux-cifs, samba-technical,
	linux-kernel, linux-rdma

On Fri, May 18, 2018 at 12:00 PM, Long Li via samba-technical
<samba-technical@lists.samba.org> wrote:
>> Subject: Re: [RFC PATCH 09/09] Introduce cache=rdma mounting option
>>
>> On Thu, May 17, 2018 at 05:22:14PM -0700, Long Li wrote:
>> > From: Long Li <longli@microsoft.com>
>> >
>> > When cache=rdma is enabled in the mount options, CIFS does not allocate
>> > internal data buffer pages for I/O; data is read/written directly to and
>> > from user memory via RDMA.
>>
>> I don't think this should be an option.  For direct I/O without signing or
>> encryption CIFS should always use get_user_pages, with or without RDMA.
>
> Yes this should be done for all transport. If there are no objections, I'll send patches to change this.

Would this help/change performance much?



-- 
Thanks,

Steve

* RE: [RFC PATCH 09/09] Introduce cache=rdma mounting option
  2018-05-18 20:44       ` Steve French
@ 2018-05-18 20:58         ` Long Li
  2018-05-19  1:20           ` Tom Talpey
  0 siblings, 1 reply; 24+ messages in thread
From: Long Li @ 2018-05-18 20:58 UTC (permalink / raw)
  To: Steve French
  Cc: Christoph Hellwig, Steve French, linux-cifs, samba-technical,
	linux-kernel, linux-rdma

> Subject: Re: [RFC PATCH 09/09] Introduce cache=rdma mounting option
> 
> On Fri, May 18, 2018 at 12:00 PM, Long Li via samba-technical <samba-
> technical@lists.samba.org> wrote:
> >> Subject: Re: [RFC PATCH 09/09] Introduce cache=rdma mounting option
> >>
> >> On Thu, May 17, 2018 at 05:22:14PM -0700, Long Li wrote:
> >> > From: Long Li <longli@microsoft.com>
> >> >
> >> > When cache=rdma is enabled in the mount options, CIFS does not allocate
> >> > internal data buffer pages for I/O; data is read/written directly to
> >> > and from user memory via RDMA.
> >>
> >> I don't think this should be an option.  For direct I/O without
> >> signing or encryption CIFS should always use get_user_pages, with or
> without RDMA.
> >
> > Yes this should be done for all transport. If there are no objections, I'll send
> patches to change this.
> 
> Would this help/change performance much?

On RDMA, it helps with I/O latency and reduces CPU usage on certain I/O patterns.

But I haven't tested on TCP. Maybe it will help a little bit.

> 
> 
> 
> --
> Thanks,
> 
> Steve

* Re: [RFC PATCH 00/09] Implement direct user I/O interfaces for RDMA
  2018-05-18  6:03   ` Long Li
  2018-05-18  6:44     ` Christoph Hellwig
@ 2018-05-19  0:58     ` Tom Talpey
  1 sibling, 0 replies; 24+ messages in thread
From: Tom Talpey @ 2018-05-19  0:58 UTC (permalink / raw)
  To: Long Li, Steve French, linux-cifs, samba-technical, linux-kernel,
	linux-rdma

On 5/17/2018 11:03 PM, Long Li wrote:
>> Subject: Re: [RFC PATCH 00/09] Implement direct user I/O interfaces for
>> RDMA
>>
>> On 5/17/2018 8:22 PM, Long Li wrote:
>>> From: Long Li <longli@microsoft.com>
>>>
>>> This patchset implements direct user I/O through RDMA.
>>>
>>> In normal code path (even with cache=none), CIFS copies I/O data from
>>> user-space to kernel-space for security reasons.
>>>
>>> With this patchset, a new mounting option is introduced to have CIFS
>>> pin the user-space buffer into memory and performs I/O through RDMA.
>>> This avoids memory copy, at the cost of added security risk.
>>
>> What's the security risk? This type of direct i/o behavior is not uncommon,
>> and can certainly be made safe, using the appropriate memory registration
>> and protection domains. Any risk needs to be stated explicitly, and mitigation
>> provided, or at least described.
> 
> I think the assumption is that user-mode buffers can't be trusted, so CIFS always copies them into internal buffers, and calculates the signature and performs encryption based on the protocol used.
> 
> With the direct buffer, the user can potentially modify the buffer while signing or encryption is in progress, or after they are done.

I don't agree that the legacy copying behavior is because the buffer is
"untrusted". The buffer is the user's data, there's no trust issue here.
If the user application modifies the buffer while it's being sent, it's
a violation of the API contract, and the only victim is the application
itself. Same applies for receiving data. And as pointed out, almost
all storage layers, file and block both, use this strategy for direct i/o.

Regarding signing, if the application alters the data then the integrity
hash will simply do its job and catch the application in the act. Again,
nothing suffers but the application.

Regarding encryption, I assume you're proposing to encrypt and decrypt
the data in a kernel buffer, effectively a copy. So in fact, in the
encryption case there's no need to pin and map the user buffer at all.

I'll mention however that Windows takes the path of not performing
RDMA placement when encrypting data. It saves nothing, and even adds
some overhead, because of the need to touch the buffer anyway to
manage the encryption/decryption.

Bottom line - no security implication for using user buffers directly.

Tom.


> I also want to point out that, I choose to implement .read_iter and .write_iter from file_operations to implement direct I/O (CIFS is already doing this for O_DIRECT, so following this code path will avoid a big mess up).  The ideal choice is to implement .direct_IO from address_space_operations that I think eventually we want to move to.
> 
>>
>> Tom.
>>
>>>
>>> This patchset is RFC. The work is in progress, do not merge.
>>>
>>>
>>> Long Li (9):
>>>     Introduce offset for the 1st page in data transfer structures
>>>     Change wdata alloc to support direct pages
>>>     Change rdata alloc to support direct pages
>>>     Change function to support offset when reading pages
>>>     Change RDMA send to regonize page offset in the 1st page
>>>     Change RDMA recv to support offset in the 1st page
>>>     Support page offset in memory regsitrations
>>>     Implement no-copy file I/O interfaces
>>>     Introduce cache=rdma moutning option
>>>
>>>
>>>    fs/cifs/cifs_fs_sb.h      |   2 +
>>>    fs/cifs/cifsfs.c          |  19 +++
>>>    fs/cifs/cifsfs.h          |   3 +
>>>    fs/cifs/cifsglob.h        |   6 +
>>>    fs/cifs/cifsproto.h       |   4 +-
>>>    fs/cifs/cifssmb.c         |  10 +-
>>>    fs/cifs/connect.c         |  13 +-
>>>    fs/cifs/dir.c             |   5 +
>>>    fs/cifs/file.c            | 351
>> ++++++++++++++++++++++++++++++++++++++++++----
>>>    fs/cifs/inode.c           |   4 +-
>>>    fs/cifs/smb2ops.c         |   2 +-
>>>    fs/cifs/smb2pdu.c         |  22 ++-
>>>    fs/cifs/smbdirect.c       | 132 ++++++++++-------
>>>    fs/cifs/smbdirect.h       |   2 +-
>>>    fs/read_write.c           |   7 +
>>>    include/linux/ratelimit.h |   2 +-
>>>    16 files changed, 489 insertions(+), 95 deletions(-)
>>>
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 02/09] Change wdata alloc to support direct pages
  2018-05-18  0:22 ` [RFC PATCH 02/09] Change wdata alloc to support direct pages Long Li
@ 2018-05-19  1:05   ` Tom Talpey
  0 siblings, 0 replies; 24+ messages in thread
From: Tom Talpey @ 2018-05-19  1:05 UTC (permalink / raw)
  To: longli, Steve French, linux-cifs, samba-technical, linux-kernel,
	linux-rdma

On 5/17/2018 5:22 PM, Long Li wrote:
> From: Long Li <longli@microsoft.com>
> 
> When using direct pages from user space, there is no need to allocate pages.
> 
> Just ping those user pages for RDMA.

Did you mean "pin" those user pages? If so, where does that pinning
occur, it's not in this patch.

Perhaps this should just say "point to" those user pages.

I also don't think this is necessarily only "for RDMA". Perhaps
there are other transport scenarios where this is advantageous.


> 
> Signed-off-by: Long Li <longli@microsoft.com>
> ---
>   fs/cifs/cifsproto.h |  2 +-
>   fs/cifs/cifssmb.c   | 10 +++++++---
>   fs/cifs/file.c      |  4 ++--
>   3 files changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h
> index 365a414..94106b9 100644
> --- a/fs/cifs/cifsproto.h
> +++ b/fs/cifs/cifsproto.h
> @@ -523,7 +523,7 @@ int cifs_readv_receive(struct TCP_Server_Info *server, struct mid_q_entry *mid);
>   int cifs_async_writev(struct cifs_writedata *wdata,
>   		      void (*release)(struct kref *kref));
>   void cifs_writev_complete(struct work_struct *work);
> -struct cifs_writedata *cifs_writedata_alloc(unsigned int nr_pages,
> +struct cifs_writedata *cifs_writedata_alloc(unsigned int nr_pages, struct page **direct_pages,
>   						work_func_t complete);
>   void cifs_writedata_release(struct kref *refcount);
>   int cifs_query_mf_symlink(unsigned int xid, struct cifs_tcon *tcon,
> diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
> index 1529a08..3b1731d 100644
> --- a/fs/cifs/cifssmb.c
> +++ b/fs/cifs/cifssmb.c
> @@ -1983,7 +1983,7 @@ cifs_writev_requeue(struct cifs_writedata *wdata)
>   			tailsz = rest_len - (nr_pages - 1) * PAGE_SIZE;
>   		}
>   
> -		wdata2 = cifs_writedata_alloc(nr_pages, cifs_writev_complete);
> +		wdata2 = cifs_writedata_alloc(nr_pages, NULL, cifs_writev_complete);
>   		if (!wdata2) {
>   			rc = -ENOMEM;
>   			break;
> @@ -2067,12 +2067,16 @@ cifs_writev_complete(struct work_struct *work)
>   }
>   
>   struct cifs_writedata *
> -cifs_writedata_alloc(unsigned int nr_pages, work_func_t complete)
> +cifs_writedata_alloc(unsigned int nr_pages, struct page **direct_pages, work_func_t complete)
>   {
>   	struct cifs_writedata *wdata;
>   
>   	/* writedata + number of page pointers */
> -	wdata = kzalloc(sizeof(*wdata) +
> +	if (direct_pages) {
> +		wdata = kzalloc(sizeof(*wdata), GFP_NOFS);
> +		wdata->direct_pages = direct_pages;
> +	} else
> +		wdata = kzalloc(sizeof(*wdata) +
>   			sizeof(struct page *) * nr_pages, GFP_NOFS);
>   	if (wdata != NULL) {
>   		kref_init(&wdata->refcount);
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index 23fd430..a6ec896 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -1965,7 +1965,7 @@ wdata_alloc_and_fillpages(pgoff_t tofind, struct address_space *mapping,
>   {
>   	struct cifs_writedata *wdata;
>   
> -	wdata = cifs_writedata_alloc((unsigned int)tofind,
> +	wdata = cifs_writedata_alloc((unsigned int)tofind, NULL,
>   				     cifs_writev_complete);
>   	if (!wdata)
>   		return NULL;
> @@ -2554,7 +2554,7 @@ cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from,
>   			break;
>   
>   		nr_pages = get_numpages(wsize, len, &cur_len);
> -		wdata = cifs_writedata_alloc(nr_pages,
> +		wdata = cifs_writedata_alloc(nr_pages, NULL,
>   					     cifs_uncached_writev_complete);
>   		if (!wdata) {
>   			rc = -ENOMEM;
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 05/09] Change RDMA send to regonize page offset in the 1st page
  2018-05-18  0:22 ` [RFC PATCH 05/09] Change RDMA send to regonize page offset in the 1st page Long Li
@ 2018-05-19  1:09   ` Tom Talpey
  2018-05-19  5:54     ` Long Li
  0 siblings, 1 reply; 24+ messages in thread
From: Tom Talpey @ 2018-05-19  1:09 UTC (permalink / raw)
  To: longli, Steve French, linux-cifs, samba-technical, linux-kernel,
	linux-rdma

On 5/17/2018 5:22 PM, Long Li wrote:
> From: Long Li <longli@microsoft.com>

There's a typo "recognize" in the patch title


> When doing RDMA send, the offset needs to be checked as data may start in an offset
> in the 1st page.

Doesn't this patch alter the generic smb2pdu.c code too? I think this
should note "any" send, not just RDMA?

Tom.

> 
> Signed-off-by: Long Li <longli@microsoft.com>
> ---
>   fs/cifs/smb2pdu.c   |  3 ++-
>   fs/cifs/smbdirect.c | 25 +++++++++++++++++++------
>   2 files changed, 21 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
> index 5097f28..fdcf97e 100644
> --- a/fs/cifs/smb2pdu.c
> +++ b/fs/cifs/smb2pdu.c
> @@ -3015,7 +3015,8 @@ smb2_async_writev(struct cifs_writedata *wdata,
>   
>   	rqst.rq_iov = iov;
>   	rqst.rq_nvec = 2;
> -	rqst.rq_pages = wdata->pages;
> +	rqst.rq_pages = wdata->direct_pages ? wdata->direct_pages : wdata->pages;
> +	rqst.rq_offset = wdata->page_offset;
>   	rqst.rq_npages = wdata->nr_pages;
>   	rqst.rq_pagesz = wdata->pagesz;
>   	rqst.rq_tailsz = wdata->tailsz;
> diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c
> index b0a1955..b46586d 100644
> --- a/fs/cifs/smbdirect.c
> +++ b/fs/cifs/smbdirect.c
> @@ -2084,8 +2084,10 @@ int smbd_send(struct smbd_connection *info, struct smb_rqst *rqst)
>   
>   	/* add in the page array if there is one */
>   	if (rqst->rq_npages) {
> -		buflen += rqst->rq_pagesz * (rqst->rq_npages - 1);
> -		buflen += rqst->rq_tailsz;
> +		if (rqst->rq_npages == 1)
> +			buflen += rqst->rq_tailsz;
> +		else
> +			buflen += rqst->rq_pagesz * (rqst->rq_npages - 1) - rqst->rq_offset + rqst->rq_tailsz;
>   	}
>   
>   	if (buflen + sizeof(struct smbd_data_transfer) >
> @@ -2182,8 +2184,19 @@ int smbd_send(struct smbd_connection *info, struct smb_rqst *rqst)
>   
>   	/* now sending pages if there are any */
>   	for (i = 0; i < rqst->rq_npages; i++) {
> -		buflen = (i == rqst->rq_npages-1) ?
> -			rqst->rq_tailsz : rqst->rq_pagesz;
> +		unsigned int offset = 0;
> +		if (i == 0)
> +			offset = rqst->rq_offset;
> +		if (rqst->rq_npages == 1 || i == rqst->rq_npages-1)
> +			buflen = rqst->rq_tailsz;
> +		else {
> +			/* We have at least two pages, and this is not the last page */
> +			if (i == 0)
> +				buflen = rqst->rq_pagesz - rqst->rq_offset;
> +			else
> +				buflen = rqst->rq_pagesz;
> +		}
> +
>   		nvecs = (buflen + max_iov_size - 1) / max_iov_size;
>   		log_write(INFO, "sending pages buflen=%d nvecs=%d\n",
>   			buflen, nvecs);
> @@ -2194,9 +2207,9 @@ int smbd_send(struct smbd_connection *info, struct smb_rqst *rqst)
>   			remaining_data_length -= size;
>   			log_write(INFO, "sending pages i=%d offset=%d size=%d"
>   				" remaining_data_length=%d\n",
> -				i, j*max_iov_size, size, remaining_data_length);
> +				i, j*max_iov_size+offset, size, remaining_data_length);
>   			rc = smbd_post_send_page(
> -				info, rqst->rq_pages[i], j*max_iov_size,
> +				info, rqst->rq_pages[i], j*max_iov_size + offset,
>   				size, remaining_data_length);
>   			if (rc)
>   				goto done;
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 09/09] Introduce cache=rdma moutning option
  2018-05-18 20:58         ` Long Li
@ 2018-05-19  1:20           ` Tom Talpey
  0 siblings, 0 replies; 24+ messages in thread
From: Tom Talpey @ 2018-05-19  1:20 UTC (permalink / raw)
  To: Long Li, Steve French
  Cc: Christoph Hellwig, Steve French, linux-cifs, samba-technical,
	linux-kernel, linux-rdma

On 5/18/2018 1:58 PM, Long Li wrote:
>> Subject: Re: [RFC PATCH 09/09] Introduce cache=rdma moutning option
>>
>> On Fri, May 18, 2018 at 12:00 PM, Long Li via samba-technical <samba-
>> technical@lists.samba.org> wrote:
>>>> Subject: Re: [RFC PATCH 09/09] Introduce cache=rdma moutning option
>>>>
>>>> On Thu, May 17, 2018 at 05:22:14PM -0700, Long Li wrote:
>>>>> From: Long Li <longli@microsoft.com>
>>>>>
>>>>> When cache=rdma is enabled on mount options, CIFS do not allocate
>>>>> internal data buffer pages for I/O, data is read/writen directly to
>>>>> user
>>>> memory via RDMA.
>>>>
>>>> I don't think this should be an option.  For direct I/O without
>>>> signing or encryption CIFS should always use get_user_pages, with or
>> without RDMA.
>>>
>>> Yes this should be done for all transport. If there are no objections, I'll send
>> patches to change this.
>>
>> Would this help/change performance much?
> 
> On RDMA, it helps with I/O latency and reduces CPU usage on certain I/O patterns.
> 
> But I haven't tested on TCP. Maybe it will help a little bit.

Well, when the application requests direct i/o on a TCP connection,
you definitely don't want to cache it! So even if the performance
is different, correctness would dictate doing this.

You probably don't need to pin the buffer in the TCP case, which
might be worth avoiding.

Tom.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [RFC PATCH 05/09] Change RDMA send to regonize page offset in the 1st page
  2018-05-19  1:09   ` Tom Talpey
@ 2018-05-19  5:54     ` Long Li
  0 siblings, 0 replies; 24+ messages in thread
From: Long Li @ 2018-05-19  5:54 UTC (permalink / raw)
  To: Tom Talpey, Steve French, linux-cifs, samba-technical,
	linux-kernel, linux-rdma

> Subject: Re: [RFC PATCH 05/09] Change RDMA send to regonize page offset
> in the 1st page
> 
> On 5/17/2018 5:22 PM, Long Li wrote:
> > From: Long Li <longli@microsoft.com>
> 
> There's a typo "recognize" in the patch title
> 
> 
> > When doing RDMA send, the offset needs to be checked as data may start
> > in an offset in the 1st page.
> 
> Doesn't this patch alter the generic smb2pdu.c code too? I think this should
> note "any" send, not just RDMA?

Yes, but for TCP the direct_pages and page_offset are always NULL and 0 even when cache=rdma is in the mount options, so it doesn't really affect the TCP code path in this patch set.

This behavior can be changed, and it will work with TCP too.

> 
> Tom.
> 
> >
> > Signed-off-by: Long Li <longli@microsoft.com>
> > ---
> >   fs/cifs/smb2pdu.c   |  3 ++-
> >   fs/cifs/smbdirect.c | 25 +++++++++++++++++++------
> >   2 files changed, 21 insertions(+), 7 deletions(-)
> >
> > diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c index
> > 5097f28..fdcf97e 100644
> > --- a/fs/cifs/smb2pdu.c
> > +++ b/fs/cifs/smb2pdu.c
> > @@ -3015,7 +3015,8 @@ smb2_async_writev(struct cifs_writedata *wdata,
> >
> >   	rqst.rq_iov = iov;
> >   	rqst.rq_nvec = 2;
> > -	rqst.rq_pages = wdata->pages;
> > +	rqst.rq_pages = wdata->direct_pages ? wdata->direct_pages :
> wdata->pages;
> > +	rqst.rq_offset = wdata->page_offset;
> >   	rqst.rq_npages = wdata->nr_pages;
> >   	rqst.rq_pagesz = wdata->pagesz;
> >   	rqst.rq_tailsz = wdata->tailsz;
> > diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c index
> > b0a1955..b46586d 100644
> > --- a/fs/cifs/smbdirect.c
> > +++ b/fs/cifs/smbdirect.c
> > @@ -2084,8 +2084,10 @@ int smbd_send(struct smbd_connection *info,
> > struct smb_rqst *rqst)
> >
> >   	/* add in the page array if there is one */
> >   	if (rqst->rq_npages) {
> > -		buflen += rqst->rq_pagesz * (rqst->rq_npages - 1);
> > -		buflen += rqst->rq_tailsz;
> > +		if (rqst->rq_npages == 1)
> > +			buflen += rqst->rq_tailsz;
> > +		else
> > +			buflen += rqst->rq_pagesz * (rqst->rq_npages - 1) -
> > +rqst->rq_offset + rqst->rq_tailsz;
> >   	}
> >
> >   	if (buflen + sizeof(struct smbd_data_transfer) > @@ -2182,8
> > +2184,19 @@ int smbd_send(struct smbd_connection *info, struct
> > smb_rqst *rqst)
> >
> >   	/* now sending pages if there are any */
> >   	for (i = 0; i < rqst->rq_npages; i++) {
> > -		buflen = (i == rqst->rq_npages-1) ?
> > -			rqst->rq_tailsz : rqst->rq_pagesz;
> > +		unsigned int offset = 0;
> > +		if (i == 0)
> > +			offset = rqst->rq_offset;
> > +		if (rqst->rq_npages == 1 || i == rqst->rq_npages-1)
> > +			buflen = rqst->rq_tailsz;
> > +		else {
> > +			/* We have at least two pages, and this is not the last
> page */
> > +			if (i == 0)
> > +				buflen = rqst->rq_pagesz - rqst->rq_offset;
> > +			else
> > +				buflen = rqst->rq_pagesz;
> > +		}
> > +
> >   		nvecs = (buflen + max_iov_size - 1) / max_iov_size;
> >   		log_write(INFO, "sending pages buflen=%d nvecs=%d\n",
> >   			buflen, nvecs);
> > @@ -2194,9 +2207,9 @@ int smbd_send(struct smbd_connection *info,
> struct smb_rqst *rqst)
> >   			remaining_data_length -= size;
> >   			log_write(INFO, "sending pages i=%d offset=%d
> size=%d"
> >   				" remaining_data_length=%d\n",
> > -				i, j*max_iov_size, size,
> remaining_data_length);
> > +				i, j*max_iov_size+offset, size,
> remaining_data_length);
> >   			rc = smbd_post_send_page(
> > -				info, rqst->rq_pages[i], j*max_iov_size,
> > +				info, rqst->rq_pages[i], j*max_iov_size +
> offset,
> >   				size, remaining_data_length);
> >   			if (rc)
> >   				goto done;
> >

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2018-05-19  5:54 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-18  0:22 [RFC PATCH 00/09] Implement direct user I/O interfaces for RDMA Long Li
2018-05-17 23:10 ` Tom Talpey
2018-05-18  6:03   ` Long Li
2018-05-18  6:44     ` Christoph Hellwig
2018-05-19  0:58     ` Tom Talpey
2018-05-18  6:42   ` Christoph Hellwig
2018-05-18  0:22 ` [RFC PATCH 01/09] Introduce offset for the 1st page in data transfer structures Long Li
2018-05-18  6:37   ` Steve French
2018-05-18  0:22 ` [RFC PATCH 02/09] Change wdata alloc to support direct pages Long Li
2018-05-19  1:05   ` Tom Talpey
2018-05-18  0:22 ` [RFC PATCH 03/09] Change rdata " Long Li
2018-05-18  0:22 ` [RFC PATCH 04/09] Change function to support offset when reading pages Long Li
2018-05-18  0:22 ` [RFC PATCH 05/09] Change RDMA send to regonize page offset in the 1st page Long Li
2018-05-19  1:09   ` Tom Talpey
2018-05-19  5:54     ` Long Li
2018-05-18  0:22 ` [RFC PATCH 06/09] Change RDMA recv to support " Long Li
2018-05-18  0:22 ` [RFC PATCH 07/09] Support page offset in memory regsitrations Long Li
2018-05-18  0:22 ` [RFC PATCH 08/09] Implement direct file I/O interfaces Long Li
2018-05-18  0:22 ` [RFC PATCH 09/09] Introduce cache=rdma moutning option Long Li
2018-05-18  7:26   ` Christoph Hellwig
2018-05-18 19:00     ` Long Li
2018-05-18 20:44       ` Steve French
2018-05-18 20:58         ` Long Li
2018-05-19  1:20           ` Tom Talpey

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).