LKML Archive on lore.kernel.org
From: Ulf Hansson <ulf.hansson@linaro.org>
To: Renius Chen <reniuschengl@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>,
	linux-mmc <linux-mmc@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Ben Chuang <Ben.Chuang@genesyslogic.com.tw>
Subject: Re: [PATCH] [v2] mmc: sdhci-pci-gli: Improve Random 4K Read Performance of GL9763E
Date: Tue, 6 Jul 2021 11:16:17 +0200
Message-ID: <CAPDyKFq0yHxX7wb4XGeiMiSGGiOf8RKJ5ahhFQ+_vodqnyPV9Q@mail.gmail.com>
In-Reply-To: <CAJU4x8srB7skGFVcj1SPrzEZSnVkwKiW3OPN0GQxvgtRG7GAAQ@mail.gmail.com>

On Mon, 5 Jul 2021 at 17:09, Renius Chen <reniuschengl@gmail.com> wrote:
>
> On Mon, 5 Jul 2021 at 20:51, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> >
> > On Mon, 5 Jul 2021 at 12:59, Renius Chen <reniuschengl@gmail.com> wrote:
> > >
> > > On Mon, 5 Jul 2021 at 18:03, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > >
> > > > On Mon, 5 Jul 2021 at 11:00, Renius Chen <reniuschengl@gmail.com> wrote:
> > > > >
> > > > > During a sequence of random 4K read operations, the performance will be
> > > > > reduced due to spending much time on entering/exiting the low power state
> > > > > between requests. We disable the low power state negotiation of GL9763E
> > > > > during a sequence of random 4K read operations to improve the performance
> > > > > and enable it again after the operations have finished.
> > > > >
> > > > > Signed-off-by: Renius Chen <reniuschengl@gmail.com>
> > > > > ---
> > > > >  drivers/mmc/host/sdhci-pci-gli.c | 68 ++++++++++++++++++++++++++++++++
> > > > >  1 file changed, 68 insertions(+)
> > > > >
> > > > > diff --git a/drivers/mmc/host/sdhci-pci-gli.c b/drivers/mmc/host/sdhci-pci-gli.c
> > > > > index 302a7579a9b3..5f1f332b4241 100644
> > > > > --- a/drivers/mmc/host/sdhci-pci-gli.c
> > > > > +++ b/drivers/mmc/host/sdhci-pci-gli.c
> > > > > @@ -88,6 +88,9 @@
> > > > >  #define PCIE_GLI_9763E_SCR      0x8E0
> > > > >  #define   GLI_9763E_SCR_AXI_REQ           BIT(9)
> > > > >
> > > > > +#define PCIE_GLI_9763E_CFG       0x8A0
> > > > > +#define   GLI_9763E_CFG_LPSN_DIS   BIT(12)
> > > > > +
> > > > >  #define PCIE_GLI_9763E_CFG2      0x8A4
> > > > >  #define   GLI_9763E_CFG2_L1DLY     GENMASK(28, 19)
> > > > >  #define   GLI_9763E_CFG2_L1DLY_MID 0x54
> > > > > @@ -128,6 +131,11 @@
> > > > >
> > > > >  #define GLI_MAX_TUNING_LOOP 40
> > > > >
> > > > > +struct gli_host {
> > > > > +       bool start_4k_r;
> > > > > +       int continuous_4k_r;
> > > > > +};
> > > > > +
> > > > >  /* Genesys Logic chipset */
> > > > >  static inline void gl9750_wt_on(struct sdhci_host *host)
> > > > >  {
> > > > > @@ -691,6 +699,62 @@ static void sdhci_gl9763e_dumpregs(struct mmc_host *mmc)
> > > > >         sdhci_dumpregs(mmc_priv(mmc));
> > > > >  }
> > > > >
> > > > > +static void gl9763e_set_low_power_negotiation(struct sdhci_pci_slot *slot, bool enable)
> > > > > +{
> > > > > +       struct pci_dev *pdev = slot->chip->pdev;
> > > > > +       u32 value;
> > > > > +
> > > > > +       pci_read_config_dword(pdev, PCIE_GLI_9763E_VHS, &value);
> > > > > +       value &= ~GLI_9763E_VHS_REV;
> > > > > +       value |= FIELD_PREP(GLI_9763E_VHS_REV, GLI_9763E_VHS_REV_W);
> > > > > +       pci_write_config_dword(pdev, PCIE_GLI_9763E_VHS, value);
> > > > > +
> > > > > +       pci_read_config_dword(pdev, PCIE_GLI_9763E_CFG, &value);
> > > > > +
> > > > > +       if (enable)
> > > > > +               value &= ~GLI_9763E_CFG_LPSN_DIS;
> > > > > +       else
> > > > > +               value |= GLI_9763E_CFG_LPSN_DIS;
> > > > > +
> > > > > +       pci_write_config_dword(pdev, PCIE_GLI_9763E_CFG, value);
> > > > > +
> > > > > +       pci_read_config_dword(pdev, PCIE_GLI_9763E_VHS, &value);
> > > > > +       value &= ~GLI_9763E_VHS_REV;
> > > > > +       value |= FIELD_PREP(GLI_9763E_VHS_REV, GLI_9763E_VHS_REV_R);
> > > > > +       pci_write_config_dword(pdev, PCIE_GLI_9763E_VHS, value);
> > > > > +}
> > > > > +
> > > > > +static void gl9763e_request(struct mmc_host *mmc, struct mmc_request *mrq)
> > > > > +{
> > > > > +       struct sdhci_host *host = mmc_priv(mmc);
> > > > > +       struct mmc_command *cmd;
> > > > > +       struct sdhci_pci_slot *slot = sdhci_priv(host);
> > > > > +       struct gli_host *gli_host = sdhci_pci_priv(slot);
> > > > > +
> > > > > +       cmd = mrq->cmd;
> > > > > +
> > > > > +       if (cmd && (cmd->opcode == MMC_READ_MULTIPLE_BLOCK) && (cmd->data->blocks == 8)) {
> > > > > +               gli_host->continuous_4k_r++;
> > > > > +
> > > > > +               if ((!gli_host->start_4k_r) && (gli_host->continuous_4k_r >= 3)) {
> > > > > +                       gl9763e_set_low_power_negotiation(slot, false);
> > > > > +
> > > > > +                       gli_host->start_4k_r = true;
> > > > > +               }
> > > > > +       } else {
> > > > > +               gli_host->continuous_4k_r = 0;
> > > > > +
> > > > > +               if (gli_host->start_4k_r)       {
> > > > > +                       gl9763e_set_low_power_negotiation(slot, true);
> > > > > +
> > > > > +                       gli_host->start_4k_r = false;
> > > > > +               }
> > > > > +       }
> > > >
> > > > The above code is trying to figure out what kind of storage use case
> > > > is running, based on information about the buffers. This does not
> > > > work, simply because the buffers don't give you all the information
> > > > you need to make the right decisions.
> > > >
> > > > Moreover, I am sure you would try to follow up with additional changes
> > > > on top, trying to tweak the behaviour to fit another use case - and so
> > > > on. My point is, this code doesn't belong in the lowest layer drivers.
> > > >
> > > > To move forward, I suggest you explore using runtime PM in combination
> > > > with dev PM QoS. In this way, the driver could implement a default
> > > > behaviour, which can be tweaked from upper layer governors, for
> > > > example, but also from user space (via sysfs), allowing more
> > > > flexibility and potentially support for more use cases.
> > > >
> > >
> > > Hi Ulf,
> > >
> > > Thanks for the advice.
> > >
> > > But we only hit the performance issue during a sequence of read
> > > requests with a 4K data length.
> > >
> > > So what we have to do is look into the requests to detect such
> > > behavior and disable the low power state negotiation of GL9763e. The
> > > information from the request buffer is sufficient for this
> > > purpose.
> > >
> > > We don't even mind if we disable the low power state negotiation
> > > by a wrong decision, because any request that is not a read command,
> > > or whose data length is not 4K, will enable it again. Disabling
> > > the low power state negotiation of GL9763e not only has no side
> > > effects but also helps its performance.
> > >
> > > The behavior concerns only the low power state negotiation of GL9763e
> > > and 4K reads, and is not related to runtime PM, so we monitor the
> > > requests and implement it in the GL9763e driver.
> >
> > I don't agree, sorry.
> >
> > The request doesn't tell you about the behavior/performance of the
> > eMMC/SD card. You can have some average idea, but things vary
> > depending on which eMMC/SD card is being used - and over time as
> > the card gets used, for example.
> >
> > But, let's not discuss use cases and exactly how to tune the behavior,
> > that's a separate discussion.
> >
> > To repeat what I said, my main point is that this kind of code doesn't
> > belong in the driver. Instead, please try using runtime PM and dev PM
> > QoS.
> >
> > A rather simple attempt would be to deploy runtime PM support and play
> > with a default autosuspend timeout instead. Would that work for you?
> >
>
> Hi Ulf,
>
>
> Thanks for your explanation.
>
> I think there may be some misunderstandings here.

I fully understand what you want to do.

>
> Our purpose is to prevent our GL9763e from entering the ASPM L1 state
> during a sequence of 4K read requests. So we don't have to consider the
> behavior/performance of the eMMC/SD card or which eMMC/SD card is
> being used. We just need to know what kind of requests we are
> receiving now from the PCIe root port.
>
> Besides, ASPM L1 is purely hardware behavior in GL9763e and has no
> corresponding relationship with runtime PM. It's not activated by
> the driver and its behavior is not handled by software. I think runtime
> PM is used to handle the D0/D3 states of the device, but not the
> ASPM link states such as L0s and L1.

Maybe runtime PM isn't the perfect fit for this type of use case.

That still doesn't matter to me. I will not accept this kind of
governor/policy based code for use cases in drivers. It doesn't
belong there.

>
> I agree that the policy of balancing performance vs the energy cost is
> a generic problem that all mmc drivers share. But our GL9763e driver
> is a host driver, and the setting in this patch applies only to
> GL9763e; it cannot be used by other devices. It depends on our
> specific hardware design, so it is not a generic solution or
> policy. So I think it is reasonable to implement such a patch in our
> GL9763e driver, to execute actions that are specific to our hardware
> design.

From the use case point of view, the GL9763e hardware design isn't at
all specific.

In many cases, controllers/platforms have support for low power states
that one wants to enter to avoid wasting energy. The difficult part is
to know *when* it makes sense to enter a low power state, as it also
introduces a latency when the power needs to be restored for the
device, to allow it to serve a new request.

To me, it sounds like you may have been too aggressive in trying to
avoid wasting energy. If I understand correctly, the idle period you use
is 20/21 us, while most other drivers use 50-100 ms as the idle period.

[...]

Kind regards
Uffe
