LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: "Nath, Arindam" <Arindam.Nath@amd.com>
To: Logan Gunthorpe <logang@deltatee.com>,
	"Mehta, Sanju" <Sanju.Mehta@amd.com>,
	"jdmason@kudzu.us" <jdmason@kudzu.us>,
	"dave.jiang@intel.com" <dave.jiang@intel.com>,
	"allenbh@gmail.com" <allenbh@gmail.com>,
	"S-k, Shyam-sundar" <Shyam-sundar.S-k@amd.com>
Cc: "linux-ntb@googlegroups.com" <linux-ntb@googlegroups.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: RE: [PATCH v2 2/5] ntb_perf: send command in response to EAGAIN
Date: Wed, 11 Mar 2020 18:58:46 +0000	[thread overview]
Message-ID: <MN2PR12MB323290836319B3E7032D91369CFC0@MN2PR12MB3232.namprd12.prod.outlook.com> (raw)
In-Reply-To: <214f3ef3-853d-6b0d-0fed-5bb6c1f1de1f@deltatee.com>

> -----Original Message-----
> From: Logan Gunthorpe <logang@deltatee.com>
> Sent: Thursday, March 12, 2020 00:17
> To: Nath, Arindam <Arindam.Nath@amd.com>; Mehta, Sanju
> <Sanju.Mehta@amd.com>; jdmason@kudzu.us; dave.jiang@intel.com;
> allenbh@gmail.com; S-k, Shyam-sundar <Shyam-sundar.S-k@amd.com>
> Cc: linux-ntb@googlegroups.com; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v2 2/5] ntb_perf: send command in response to EAGAIN
> 
> 
> 
> On 2020-03-11 12:11 p.m., Nath, Arindam wrote:
> >> -----Original Message-----
> >> From: Logan Gunthorpe <logang@deltatee.com>
> >> Sent: Wednesday, March 11, 2020 03:01
> >> To: Mehta, Sanju <Sanju.Mehta@amd.com>; jdmason@kudzu.us;
> >> dave.jiang@intel.com; allenbh@gmail.com; Nath, Arindam
> >> <Arindam.Nath@amd.com>; S-k, Shyam-sundar <Shyam-sundar.S-
> >> k@amd.com>
> >> Cc: linux-ntb@googlegroups.com; linux-kernel@vger.kernel.org
> >> Subject: Re: [PATCH v2 2/5] ntb_perf: send command in response to
> EAGAIN
> >>
> >>
> >>
> >> On 2020-03-10 2:54 p.m., Sanjay R Mehta wrote:
> >>> From: Arindam Nath <arindam.nath@amd.com>
> >>>
> >>> perf_spad_cmd_send() and perf_msg_cmd_send() return
> >>> -EAGAIN after trying to send commands for a maximum
> >>> of MSG_TRIES re-tries. But currently there is no
> >>> handling for this error. These functions are invoked
> >>> from perf_service_work() through function pointers,
> >>> so rather than simply call these functions is not
> >>> enough. We need to make sure to invoke them again in
> >>> case of -EAGAIN. Since peer status bits were cleared
> >>> before calling these functions, we set the same status
> >>> bits before queueing the work again for later invocation.
> >>> This way we simply won't go ahead and initialize the
> >>> XLAT registers wrongfully in case sending the very first
> >>> command itself fails.
> >>
> >> So what happens if there's an actual non-recoverable error that causes
> >> perf_msg_cmd_send() to fail? Are you proposing it just requeues high
> >> priority work forever?
> >
> > The intent of the patch is to handle -EAGAIN, since the error code is
> > an indication that we need to try again later. Currently there is a very
> > small time frame during which ntb_perf should be loaded on both sides
> > (primary and secondary) to have XLAT registers configured correctly.
> > Failing that the code will still fall through without properly initializing the
> > XLAT registers and there is no indication of that either until we have
> > actually tried to perform 'echo 0 > /sys/kernel/debug/.../run'.
> >
> > With the changes proposed in this patch, we do not have to depend
> > on whether the drivers at both ends are loaded within a fixed time
> > duration. So we can simply load the driver at one side, and at a later
> > time load the driver on the other, and still the XLAT registers would
> > be set up correctly.
> >
> > Looking at perf_spad_cmd_send() and perf_msg_cmd_send(), if the
> > concern is that ntb_peer_spad_read()/ntb_msg_read_sts() fail because
> > of some non-recoverable error and we still schedule a high priority
> > work, that is a valid concern. But isn't it still better than simply falling
> > through and initializing XLAT register with incorrect values?
> 
> I don't think it's ever acceptable to get into an infinite loop.
> Especially when you're running on the system's high priority work queue...
> 
> At the very least schedule a delayed work item to try again in some
> number of seconds or something. Essentially just have more retires,
> perhaps with longer delays in between.

Sounds like a good idea. Thanks for the suggestion.

Arindam

> 
> Falling through and continuing with the wrong values is certainly wrong.
> I didn't notice that. If an error occurs, it shouldn't continue, it
> should print an error to dmesg and stop.
> 
> >
> >>
> >> I never really reviewed this stuff properly but it looks like it has a
> >> bunch of problems. Using the high priority work queue for some low
> >> priority setup work seems wrong, at the very least. The spad and msg
> >> send loops also look like they have a bunch of inter-host race condition
> >> problems as well. Yikes.
> >
> > I am not very sure what the design considerations were when having
> > a high priority work queue, but perhaps we can all have a discussion
> > on this.
> 
> I'd change it. Seems completely wrong to me.
> 
> Logan

  reply	other threads:[~2020-03-11 18:58 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-10 20:54 [PATCH v2 0/5] ntb perf, ntb tool and ntb-hw improvements Sanjay R Mehta
2020-03-10 20:54 ` [PATCH v2 1/5] ntb_perf: refactor code for CPU and DMA transfers Sanjay R Mehta
2020-03-10 21:21   ` Logan Gunthorpe
2020-03-11 17:44     ` Nath, Arindam
2020-03-10 20:54 ` [PATCH v2 2/5] ntb_perf: send command in response to EAGAIN Sanjay R Mehta
2020-03-10 21:31   ` Logan Gunthorpe
2020-03-11 18:11     ` Nath, Arindam
2020-03-11 18:47       ` Logan Gunthorpe
2020-03-11 18:58         ` Nath, Arindam [this message]
2020-03-10 20:54 ` [PATCH v2 3/5] ntb_perf: pass correct struct device to dma_alloc_coherent Sanjay R Mehta
2020-03-10 20:54 ` [PATCH v2 4/5] ntb_tool: " Sanjay R Mehta
2020-03-10 20:54 ` [PATCH v2 5/5] ntb: hw: remove the code that sets the DMA mask Sanjay R Mehta

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=MN2PR12MB323290836319B3E7032D91369CFC0@MN2PR12MB3232.namprd12.prod.outlook.com \
    --to=arindam.nath@amd.com \
    --cc=Sanju.Mehta@amd.com \
    --cc=Shyam-sundar.S-k@amd.com \
    --cc=allenbh@gmail.com \
    --cc=dave.jiang@intel.com \
    --cc=jdmason@kudzu.us \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-ntb@googlegroups.com \
    --cc=logang@deltatee.com \
    --subject='RE: [PATCH v2 2/5] ntb_perf: send command in response to EAGAIN' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).