LKML Archive on lore.kernel.org
From: "Nath, Arindam" <Arindam.Nath@amd.com>
To: Logan Gunthorpe <logang@deltatee.com>,
	"Mehta, Sanju" <Sanju.Mehta@amd.com>,
	"jdmason@kudzu.us" <jdmason@kudzu.us>,
	"dave.jiang@intel.com" <dave.jiang@intel.com>,
	"allenbh@gmail.com" <allenbh@gmail.com>,
	"S-k, Shyam-sundar" <Shyam-sundar.S-k@amd.com>
Cc: "linux-ntb@googlegroups.com" <linux-ntb@googlegroups.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: RE: [PATCH v2 1/5] ntb_perf: refactor code for CPU and DMA transfers
Date: Wed, 11 Mar 2020 17:44:37 +0000	[thread overview]
Message-ID: <MN2PR12MB3232C36B88F3667D89FFEA929CFC0@MN2PR12MB3232.namprd12.prod.outlook.com> (raw)
In-Reply-To: <e700a5f6-1929-0d65-b204-c5bfde58f5f7@deltatee.com>

> -----Original Message-----
> From: Logan Gunthorpe <logang@deltatee.com>
> Sent: Wednesday, March 11, 2020 02:51
> To: Mehta, Sanju <Sanju.Mehta@amd.com>; jdmason@kudzu.us;
> dave.jiang@intel.com; allenbh@gmail.com; Nath, Arindam
> <Arindam.Nath@amd.com>; S-k, Shyam-sundar <Shyam-sundar.S-k@amd.com>
> Cc: linux-ntb@googlegroups.com; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v2 1/5] ntb_perf: refactor code for CPU and DMA
> transfers
> 
> 
> 
> On 2020-03-10 2:54 p.m., Sanjay R Mehta wrote:
> > From: Arindam Nath <arindam.nath@amd.com>
> >
> > This patch creates separate functions to handle CPU
> > and DMA transfers. Since CPU transfers use memcpy
> > and DMA transfers use dmaengine APIs, these changes
> > not only provide a logical separation between the
> > two, but also make the difference in how they are
> > handled clearly visible.
> >
> > In the case of DMA, we DMA from system memory to
> > the memory window (MW) of the NTB, which is an MMIO
> > region, so we should not use dma_map_page() to map
> > the MW. The correct way to map an MMIO region is
> > dma_map_resource(), so the code is modified
> > accordingly.
> >
> > dma_map_resource() expects the physical address of
> > the region to be mapped for DMA, so we add a new
> > field, outbuf_phys_addr, to struct perf_peer, and
> > another field, outbuf_dma_addr, to store the
> > corresponding mapped address returned by the API.
> >
> > Since the MW is contiguous, rather than mapping it
> > chunk-by-chunk, we map the entire MW before the
> > actual DMA transfers happen. Then for each chunk we
> > simply pass an offset into the mapped region and
> > DMA to it. The MW is later unmapped in
> > perf_clear_test().
> >
> > The above means we now need different function
> > parameters for CPU and DMA transfers. In the case
> > of CPU transfers we simply need CPU virtual
> > addresses for memcpy, but in the case of DMA we
> > need a dma_addr_t, which may differ from the CPU
> > physical address depending on whether an IOMMU is
> > enabled. Thus we now have two separate functions,
> > perf_copy_chunk_cpu() and perf_copy_chunk_dma(), to
> > handle this.
> >
> > Signed-off-by: Arindam Nath <arindam.nath@amd.com>
> > Signed-off-by: Sanjay R Mehta <sanju.mehta@amd.com>
> > ---
> >  drivers/ntb/test/ntb_perf.c | 141 +++++++++++++++++++++++++++++++++-----------
> >  1 file changed, 105 insertions(+), 36 deletions(-)
> >
> > diff --git a/drivers/ntb/test/ntb_perf.c b/drivers/ntb/test/ntb_perf.c
> > index e9b7c2d..6d16628 100644
> > --- a/drivers/ntb/test/ntb_perf.c
> > +++ b/drivers/ntb/test/ntb_perf.c
> > @@ -149,6 +149,8 @@ struct perf_peer {
> >  	u64 outbuf_xlat;
> >  	resource_size_t outbuf_size;
> >  	void __iomem *outbuf;
> > +	phys_addr_t outbuf_phys_addr;
> > +	dma_addr_t outbuf_dma_addr;
> >
> >  	/* Inbound MW params */
> >  	dma_addr_t inbuf_xlat;
> > @@ -775,26 +777,24 @@ static void perf_dma_copy_callback(void *data)
> >  	wake_up(&pthr->dma_wait);
> >  }
> >
> > -static int perf_copy_chunk(struct perf_thread *pthr,
> > -			   void __iomem *dst, void *src, size_t len)
> > +static int perf_copy_chunk_cpu(struct perf_thread *pthr,
> > +			       void __iomem *dst, void *src, size_t len)
> > +{
> > +	memcpy_toio(dst, src, len);
> > +
> > +	return likely(atomic_read(&pthr->perf->tsync) > 0) ? 0 : -EINTR;
> > +}
> > +
> > +static int perf_copy_chunk_dma(struct perf_thread *pthr,
> > +			       dma_addr_t dst, void *src, size_t len)
> >  {
> >  	struct dma_async_tx_descriptor *tx;
> >  	struct dmaengine_unmap_data *unmap;
> >  	struct device *dma_dev;
> >  	int try = 0, ret = 0;
> >
> > -	if (!use_dma) {
> > -		memcpy_toio(dst, src, len);
> > -		goto ret_check_tsync;
> > -	}
> > -
> >  	dma_dev = pthr->dma_chan->device->dev;
> > -
> > -	if (!is_dma_copy_aligned(pthr->dma_chan->device, offset_in_page(src),
> > -				 offset_in_page(dst), len))
> > -		return -EIO;
> 
> Can you please split this patch into multiple patches? It is hard to
> review and part of the reason this code is such a mess is because we
> merged large patches with a bunch of different changes rolled into one,
> many of which didn't get sufficient reviewer attention.
> 
> Patches that refactor things shouldn't be making functional changes
> (like adding dma_map_resource()).

We will split the patch into multiple patches in the next version.

> 
> 
> > -static int perf_run_test(struct perf_thread *pthr)
> > +static int perf_run_test_cpu(struct perf_thread *pthr)
> >  {
> >  	struct perf_peer *peer = pthr->perf->test_peer;
> >  	struct perf_ctx *perf = pthr->perf;
> > @@ -914,7 +903,7 @@ static int perf_run_test(struct perf_thread *pthr)
> >
> >  	/* Copied field is cleared on test launch stage */
> >  	while (pthr->copied < total_size) {
> > -		ret = perf_copy_chunk(pthr, flt_dst, flt_src, chunk_size);
> > +		ret = perf_copy_chunk_cpu(pthr, flt_dst, flt_src, chunk_size);
> >  		if (ret) {
> >  			dev_err(&perf->ntb->dev, "%d: Got error %d on test\n",
> >  				pthr->tidx, ret);
> > @@ -937,6 +926,74 @@ static int perf_run_test(struct perf_thread *pthr)
> >  	return 0;
> >  }
> >
> > +static int perf_run_test_dma(struct perf_thread *pthr)
> > +{
> > +	struct perf_peer *peer = pthr->perf->test_peer;
> > +	struct perf_ctx *perf = pthr->perf;
> > +	struct device *dma_dev;
> > +	dma_addr_t flt_dst, bnd_dst;
> > +	u64 total_size, chunk_size;
> > +	void *flt_src;
> > +	int ret = 0;
> > +
> > +	total_size = 1ULL << total_order;
> > +	chunk_size = 1ULL << chunk_order;
> > +	chunk_size = min_t(u64, peer->outbuf_size, chunk_size);
> > +
> > +	/* Map MW for DMA */
> > +	dma_dev = pthr->dma_chan->device->dev;
> > +	peer->outbuf_dma_addr = dma_map_resource(dma_dev,
> > +						 peer->outbuf_phys_addr,
> > +						 peer->outbuf_size,
> > +						 DMA_FROM_DEVICE, 0);
> > +	if (dma_mapping_error(dma_dev, peer->outbuf_dma_addr)) {
> > +		dma_unmap_resource(dma_dev, peer->outbuf_dma_addr,
> > +				   peer->outbuf_size, DMA_FROM_DEVICE, 0);
> > +		return -EIO;
> > +	}
> > +
> > +	flt_src = pthr->src;
> > +	bnd_dst = peer->outbuf_dma_addr + peer->outbuf_size;
> > +	flt_dst = peer->outbuf_dma_addr;
> > +
> > +	pthr->duration = ktime_get();
> > +	/* Copied field is cleared on test launch stage */
> > +	while (pthr->copied < total_size) {
> > +		ret = perf_copy_chunk_dma(pthr, flt_dst, flt_src, chunk_size);
> > +		if (ret) {
> > +			dev_err(&perf->ntb->dev, "%d: Got error %d on test\n",
> > +				pthr->tidx, ret);
> > +			return ret;
> > +		}
> > +
> 
> Honestly, this doesn't seem like a good approach to me. Duplicating the
> majority of the perf_run_test() function is making the code more
> complicated and harder to maintain.
> 
> You should be able to just selectively call dma_map_resource() in
> perf_run_test(), or even in perf_setup_peer_mw() without needing to add
> so much extra duplicate code.

Will be taken care of in the next version. Thanks for the suggestions.

Arindam

> 
> Logan


Thread overview: 12+ messages
2020-03-10 20:54 [PATCH v2 0/5] ntb perf, ntb tool and ntb-hw improvements Sanjay R Mehta
2020-03-10 20:54 ` [PATCH v2 1/5] ntb_perf: refactor code for CPU and DMA transfers Sanjay R Mehta
2020-03-10 21:21   ` Logan Gunthorpe
2020-03-11 17:44     ` Nath, Arindam [this message]
2020-03-10 20:54 ` [PATCH v2 2/5] ntb_perf: send command in response to EAGAIN Sanjay R Mehta
2020-03-10 21:31   ` Logan Gunthorpe
2020-03-11 18:11     ` Nath, Arindam
2020-03-11 18:47       ` Logan Gunthorpe
2020-03-11 18:58         ` Nath, Arindam
2020-03-10 20:54 ` [PATCH v2 3/5] ntb_perf: pass correct struct device to dma_alloc_coherent Sanjay R Mehta
2020-03-10 20:54 ` [PATCH v2 4/5] ntb_tool: " Sanjay R Mehta
2020-03-10 20:54 ` [PATCH v2 5/5] ntb: hw: remove the code that sets the DMA mask Sanjay R Mehta
