LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH v3 0/3] Allow custom PCI resource alignment on pseries
@ 2019-05-28  4:03 Shawn Anastasio
  2019-05-28  4:03 ` [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request Shawn Anastasio
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Shawn Anastasio @ 2019-05-28  4:03 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev
  Cc: bhelgaas, benh, paulus, mpe, sbobroff, xyjxie, rppt, linux-kernel

Changes from v2 to v3:
  - Fix wrong return type of ppc pcibios_ignore_alignment_request
    (Not sure how my local compile didn't catch that!)

Hello all,

This patch set implements support for user-specified PCI resource
alignment on the pseries platform for hotplugged PCI devices.
Currently on pseries, PCI resource alignments specified with the
pci=resource_alignment commandline argument are ignored, since
the firmware is in charge of managing the PCI resources. In the
case of hotplugged devices, though, the kernel is in charge of 
configuring the resources and should obey alignment requirements.

The current behavior of ignoring the alignment for hotplugged devices
results in sub-page BARs landing between page boundaries and
becoming un-mappable from userspace via the VFIO framework.
This issue was observed on a pseries KVM guest with hotplugged
ivshmem devices.
 
With these changes, users can specify an appropriate
pci=resource_alignment argument on boot for devices they wish to use 
with VFIO.

In the future, this could be extended to provide page-aligned
resources by default for hotplugged devices, similar to what is done
on powernv by commit 382746376993 ("powerpc/powernv: Override
pcibios_default_alignment() to force PCI devices to be page aligned").

Feedback is appreciated.

Thanks,
Shawn

Shawn Anastasio (3):
  PCI: Introduce pcibios_ignore_alignment_request
  powerpc/64: Enable pcibios_after_init hook on ppc64
  powerpc/pseries: Allow user-specified PCI resource alignment after
    init

 arch/powerpc/include/asm/machdep.h     |  6 ++++--
 arch/powerpc/kernel/pci-common.c       |  9 +++++++++
 arch/powerpc/kernel/pci_64.c           |  4 ++++
 arch/powerpc/platforms/pseries/setup.c | 22 ++++++++++++++++++++++
 drivers/pci/pci.c                      |  9 +++++++--
 include/linux/pci.h                    |  1 +
 6 files changed, 47 insertions(+), 4 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request
  2019-05-28  4:03 [PATCH v3 0/3] Allow custom PCI resource alignment on pseries Shawn Anastasio
@ 2019-05-28  4:03 ` Shawn Anastasio
  2019-05-28  5:36   ` Oliver
  2019-05-28  4:03 ` [PATCH v3 2/3] powerpc/64: Enable pcibios_after_init hook on ppc64 Shawn Anastasio
  2019-05-28  4:03 ` [PATCH v3 3/3] powerpc/pseries: Allow user-specified PCI resource alignment after init Shawn Anastasio
  2 siblings, 1 reply; 19+ messages in thread
From: Shawn Anastasio @ 2019-05-28  4:03 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev
  Cc: bhelgaas, benh, paulus, mpe, sbobroff, xyjxie, rppt, linux-kernel

Introduce a new pcibios function pcibios_ignore_alignment_request
which allows the PCI core to defer to platform-specific code to
determine whether or not to ignore alignment requests for PCI resources.

The existing behavior is to simply ignore alignment requests when
PCI_PROBE_ONLY is set. This is behavior is maintained by the
default implementation of pcibios_ignore_alignment_request.

Signed-off-by: Shawn Anastasio <shawn@anastas.io>
---
 drivers/pci/pci.c   | 9 +++++++--
 include/linux/pci.h | 1 +
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 8abc843b1615..8207a09085d1 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void)
 	return 0;
 }
 
+int __weak pcibios_ignore_alignment_request(void)
+{
+	return pci_has_flag(PCI_PROBE_ONLY);
+}
+
 #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE
 static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0};
 static DEFINE_SPINLOCK(resource_alignment_lock);
@@ -5906,9 +5911,9 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
 	p = resource_alignment_param;
 	if (!*p && !align)
 		goto out;
-	if (pci_has_flag(PCI_PROBE_ONLY)) {
+	if (pcibios_ignore_alignment_request()) {
 		align = 0;
-		pr_info_once("PCI: Ignoring requested alignments (PCI_PROBE_ONLY)\n");
+		pr_info_once("PCI: Ignoring requested alignments\n");
 		goto out;
 	}
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 4a5a84d7bdd4..47471dcdbaf9 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1990,6 +1990,7 @@ static inline void pcibios_penalize_isa_irq(int irq, int active) {}
 int pcibios_alloc_irq(struct pci_dev *dev);
 void pcibios_free_irq(struct pci_dev *dev);
 resource_size_t pcibios_default_alignment(void);
+int pcibios_ignore_alignment_request(void);
 
 #ifdef CONFIG_HIBERNATE_CALLBACKS
 extern struct dev_pm_ops pcibios_pm_ops;
-- 
2.20.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v3 2/3] powerpc/64: Enable pcibios_after_init hook on ppc64
  2019-05-28  4:03 [PATCH v3 0/3] Allow custom PCI resource alignment on pseries Shawn Anastasio
  2019-05-28  4:03 ` [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request Shawn Anastasio
@ 2019-05-28  4:03 ` Shawn Anastasio
  2019-05-28  4:03 ` [PATCH v3 3/3] powerpc/pseries: Allow user-specified PCI resource alignment after init Shawn Anastasio
  2 siblings, 0 replies; 19+ messages in thread
From: Shawn Anastasio @ 2019-05-28  4:03 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev
  Cc: bhelgaas, benh, paulus, mpe, sbobroff, xyjxie, rppt, linux-kernel

Enable the pcibios_after_init hook on all powerpc platforms.
This hook is executed at the end of pcibios_init and was previously
only available on CONFIG_PPC32.

Since it is useful and not inherently limited to 32-bit mode,
remove the limitation and allow it on all powerpc platforms.

Signed-off-by: Shawn Anastasio <shawn@anastas.io>
---
 arch/powerpc/include/asm/machdep.h | 3 +--
 arch/powerpc/kernel/pci_64.c       | 4 ++++
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 2f0ca6560e47..2fbfaa9176ed 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -150,6 +150,7 @@ struct machdep_calls {
 	void		(*init)(void);
 
 	void		(*kgdb_map_scc)(void);
+#endif /* CONFIG_PPC32 */
 
 	/*
 	 * optional PCI "hooks"
@@ -157,8 +158,6 @@ struct machdep_calls {
 	/* Called at then very end of pcibios_init() */
 	void (*pcibios_after_init)(void);
 
-#endif /* CONFIG_PPC32 */
-
 	/* Called in indirect_* to avoid touching devices */
 	int (*pci_exclude_device)(struct pci_controller *, unsigned char, unsigned char);
 
diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
index 9d8c10d55407..fba7fe6e4a50 100644
--- a/arch/powerpc/kernel/pci_64.c
+++ b/arch/powerpc/kernel/pci_64.c
@@ -68,6 +68,10 @@ static int __init pcibios_init(void)
 
 	printk(KERN_DEBUG "PCI: Probing PCI hardware done\n");
 
+	/* Call machine dependent post-init code */
+	if (ppc_md.pcibios_after_init)
+		ppc_md.pcibios_after_init();
+
 	return 0;
 }
 
-- 
2.20.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v3 3/3] powerpc/pseries: Allow user-specified PCI resource alignment after init
  2019-05-28  4:03 [PATCH v3 0/3] Allow custom PCI resource alignment on pseries Shawn Anastasio
  2019-05-28  4:03 ` [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request Shawn Anastasio
  2019-05-28  4:03 ` [PATCH v3 2/3] powerpc/64: Enable pcibios_after_init hook on ppc64 Shawn Anastasio
@ 2019-05-28  4:03 ` Shawn Anastasio
  2019-05-29 14:02   ` Bjorn Helgaas
  2 siblings, 1 reply; 19+ messages in thread
From: Shawn Anastasio @ 2019-05-28  4:03 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev
  Cc: bhelgaas, benh, paulus, mpe, sbobroff, xyjxie, rppt, linux-kernel

On pseries, custom PCI resource alignment specified with the commandline
argument pci=resource_alignment is disabled due to PCI resources being
managed by the firmware. However, in the case of PCI hotplug the
resources are managed by the kernel, so custom alignments should be
honored in these cases. This is done by only honoring custom
alignments after initial PCI initialization is done, to ensure that
all devices managed by the firmware are excluded.

Without this ability, sub-page BARs sometimes get mapped in between
page boundaries for hotplugged devices and are therefore unusable
with the VFIO framework. This change allows users to request
page alignment for devices they wish to access via VFIO using
the pci=resource_alignment commandline argument.

In the future, this could be extended to provide page-aligned
resources by default for hotplugged devices, similar to what is
done on powernv by commit 382746376993 ("powerpc/powernv: Override
pcibios_default_alignment() to force PCI devices to be page aligned")

Signed-off-by: Shawn Anastasio <shawn@anastas.io>
---
 arch/powerpc/include/asm/machdep.h     |  3 +++
 arch/powerpc/kernel/pci-common.c       |  9 +++++++++
 arch/powerpc/platforms/pseries/setup.c | 22 ++++++++++++++++++++++
 3 files changed, 34 insertions(+)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 2fbfaa9176ed..46eb62c0954e 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -179,6 +179,9 @@ struct machdep_calls {
 
 	resource_size_t (*pcibios_default_alignment)(void);
 
+	/* Called when determining PCI resource alignment */
+	int (*pcibios_ignore_alignment_request)(void);
+
 #ifdef CONFIG_PCI_IOV
 	void (*pcibios_fixup_sriov)(struct pci_dev *pdev);
 	resource_size_t (*pcibios_iov_resource_alignment)(struct pci_dev *, int resno);
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index ff4b7539cbdf..8e0d73b4c188 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -238,6 +238,15 @@ resource_size_t pcibios_default_alignment(void)
 	return 0;
 }
 
+int pcibios_ignore_alignment_request(void)
+{
+	if (ppc_md.pcibios_ignore_alignment_request)
+		return ppc_md.pcibios_ignore_alignment_request();
+
+	/* Fall back to default method of checking PCI_PROBE_ONLY */
+	return pci_has_flag(PCI_PROBE_ONLY);
+}
+
 #ifdef CONFIG_PCI_IOV
 resource_size_t pcibios_iov_resource_alignment(struct pci_dev *pdev, int resno)
 {
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index e4f0dfd4ae33..07f03be02afe 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -82,6 +82,8 @@ EXPORT_SYMBOL(CMO_PageSize);
 
 int fwnmi_active;  /* TRUE if an FWNMI handler is present */
 
+static int initial_pci_init_done; /* TRUE if initial pcibios init has completed */
+
 static void pSeries_show_cpuinfo(struct seq_file *m)
 {
 	struct device_node *root;
@@ -749,6 +751,23 @@ static resource_size_t pseries_pci_iov_resource_alignment(struct pci_dev *pdev,
 }
 #endif
 
+static void pseries_after_init(void)
+{
+	initial_pci_init_done = 1;
+}
+
+static int pseries_ignore_alignment_request(void)
+{
+	if (initial_pci_init_done)
+		/*
+		 * Allow custom alignments after init for things
+		 * like PCI hotplugging.
+		 */
+		return 0;
+
+	return pci_has_flag(PCI_PROBE_ONLY);
+}
+
 static void __init pSeries_setup_arch(void)
 {
 	set_arch_panic_timeout(10, ARCH_PANIC_TIMEOUT);
@@ -797,6 +816,9 @@ static void __init pSeries_setup_arch(void)
 	}
 
 	ppc_md.pcibios_root_bridge_prepare = pseries_root_bridge_prepare;
+	ppc_md.pcibios_after_init = pseries_after_init;
+	ppc_md.pcibios_ignore_alignment_request =
+		pseries_ignore_alignment_request;
 }
 
 static void pseries_panic(char *str)
-- 
2.20.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request
  2019-05-28  4:03 ` [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request Shawn Anastasio
@ 2019-05-28  5:36   ` Oliver
  2019-05-28  5:50     ` Shawn Anastasio
                       ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Oliver @ 2019-05-28  5:36 UTC (permalink / raw)
  To: Shawn Anastasio
  Cc: linux-pci, linuxppc-dev, Bjorn Helgaas, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Sam Bobroff, xyjxie, rppt,
	Linux Kernel Mailing List

On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio <shawn@anastas.io> wrote:
>
> Introduce a new pcibios function pcibios_ignore_alignment_request
> which allows the PCI core to defer to platform-specific code to
> determine whether or not to ignore alignment requests for PCI resources.
>
> The existing behavior is to simply ignore alignment requests when
> PCI_PROBE_ONLY is set. This is behavior is maintained by the
> default implementation of pcibios_ignore_alignment_request.
>
> Signed-off-by: Shawn Anastasio <shawn@anastas.io>
> ---
>  drivers/pci/pci.c   | 9 +++++++--
>  include/linux/pci.h | 1 +
>  2 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 8abc843b1615..8207a09085d1 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void)
>         return 0;
>  }
>
> +int __weak pcibios_ignore_alignment_request(void)
> +{
> +       return pci_has_flag(PCI_PROBE_ONLY);
> +}
> +
>  #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE
>  static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0};
>  static DEFINE_SPINLOCK(resource_alignment_lock);
> @@ -5906,9 +5911,9 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
>         p = resource_alignment_param;
>         if (!*p && !align)
>                 goto out;
> -       if (pci_has_flag(PCI_PROBE_ONLY)) {
> +       if (pcibios_ignore_alignment_request()) {
>                 align = 0;
> -               pr_info_once("PCI: Ignoring requested alignments (PCI_PROBE_ONLY)\n");
> +               pr_info_once("PCI: Ignoring requested alignments\n");
>                 goto out;
>         }

I think the logic here is questionable to begin with. If the user has
explicitly requested re-aligning a resource via the command line then
we should probably do it even if PCI_PROBE_ONLY is set. When it breaks
they get to keep the pieces.

That said, the real issue here is that PCI_PROBE_ONLY probably
shouldn't be set under qemu/kvm. Under the other hypervisor (PowerVM)
hotplugged devices are configured by firmware before it's passed to
the guest and we need to keep the FW assignments otherwise things
break. QEMU however doesn't do any BAR assignments and relies on that
being handled by the guest. At boot time this is done by SLOF, but
Linux only keeps SLOF around until it's extracted the device-tree.
Once that's done SLOF gets blown away and the kernel needs to do it's
own BAR assignments. I'm guessing there's a hack in there to make it
work today, but it's a little surprising that it works at all...

IIRC Sam Bobroff was looking at hotplug under pseries recently so he
might have something to add. He's sick at the moment, but I'll ask him
to take a look at this once he's back among the living

> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 4a5a84d7bdd4..47471dcdbaf9 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1990,6 +1990,7 @@ static inline void pcibios_penalize_isa_irq(int irq, int active) {}
>  int pcibios_alloc_irq(struct pci_dev *dev);
>  void pcibios_free_irq(struct pci_dev *dev);
>  resource_size_t pcibios_default_alignment(void);
> +int pcibios_ignore_alignment_request(void);
>
>  #ifdef CONFIG_HIBERNATE_CALLBACKS
>  extern struct dev_pm_ops pcibios_pm_ops;
> --
> 2.20.1
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request
  2019-05-28  5:36   ` Oliver
@ 2019-05-28  5:50     ` Shawn Anastasio
  2019-05-28  6:27     ` Alexey Kardashevskiy
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 19+ messages in thread
From: Shawn Anastasio @ 2019-05-28  5:50 UTC (permalink / raw)
  To: Oliver
  Cc: Sam Bobroff, linux-pci, Linux Kernel Mailing List, rppt,
	Paul Mackerras, Bjorn Helgaas, xyjxie, linuxppc-dev



On 5/28/19 12:36 AM, Oliver wrote:
> On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio <shawn@anastas.io> wrote:
>>
>> Introduce a new pcibios function pcibios_ignore_alignment_request
>> which allows the PCI core to defer to platform-specific code to
>> determine whether or not to ignore alignment requests for PCI resources.
>>
>> The existing behavior is to simply ignore alignment requests when
>> PCI_PROBE_ONLY is set. This is behavior is maintained by the
>> default implementation of pcibios_ignore_alignment_request.
>>
>> Signed-off-by: Shawn Anastasio <shawn@anastas.io>
>> ---
>>   drivers/pci/pci.c   | 9 +++++++--
>>   include/linux/pci.h | 1 +
>>   2 files changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index 8abc843b1615..8207a09085d1 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void)
>>          return 0;
>>   }
>>
>> +int __weak pcibios_ignore_alignment_request(void)
>> +{
>> +       return pci_has_flag(PCI_PROBE_ONLY);
>> +}
>> +
>>   #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE
>>   static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0};
>>   static DEFINE_SPINLOCK(resource_alignment_lock);
>> @@ -5906,9 +5911,9 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
>>          p = resource_alignment_param;
>>          if (!*p && !align)
>>                  goto out;
>> -       if (pci_has_flag(PCI_PROBE_ONLY)) {
>> +       if (pcibios_ignore_alignment_request()) {
>>                  align = 0;
>> -               pr_info_once("PCI: Ignoring requested alignments (PCI_PROBE_ONLY)\n");
>> +               pr_info_once("PCI: Ignoring requested alignments\n");
>>                  goto out;
>>          }
> 
> I think the logic here is questionable to begin with. If the user has
> explicitly requested re-aligning a resource via the command line then
> we should probably do it even if PCI_PROBE_ONLY is set. When it breaks
> they get to keep the pieces.
> 
> That said, the real issue here is that PCI_PROBE_ONLY probably
> shouldn't be set under qemu/kvm. Under the other hypervisor (PowerVM)
> hotplugged devices are configured by firmware before it's passed to
> the guest and we need to keep the FW assignments otherwise things
> break. QEMU however doesn't do any BAR assignments and relies on that
> being handled by the guest. At boot time this is done by SLOF, but
> Linux only keeps SLOF around until it's extracted the device-tree.
> Once that's done SLOF gets blown away and the kernel needs to do it's
> own BAR assignments. I'm guessing there's a hack in there to make it
> work today, but it's a little surprising that it works at all...
Interesting, I wasn't aware that hotplugged devices are configured
by the hypervisor on PowerVM. That at least means that this patch is
wrong as-is since it won't handle that properly. Definitely seems like
there will need to be different behavior here depending on the hypervisor.

That being said, wouldn't PCI_PROBE_ONLY still be set on pseries/kvm
(at least for initial boot) to observe SLOF's original BAR assignments?
Perhaps it should be un-set after initial PCI init?

> 
> IIRC Sam Bobroff was looking at hotplug under pseries recently so he
> might have something to add. He's sick at the moment, but I'll ask him
> to take a look at this once he's back among the living

Good to know. I'll await his comments before continuing here.

>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index 4a5a84d7bdd4..47471dcdbaf9 100644
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -1990,6 +1990,7 @@ static inline void pcibios_penalize_isa_irq(int irq, int active) {}
>>   int pcibios_alloc_irq(struct pci_dev *dev);
>>   void pcibios_free_irq(struct pci_dev *dev);
>>   resource_size_t pcibios_default_alignment(void);
>> +int pcibios_ignore_alignment_request(void);
>>
>>   #ifdef CONFIG_HIBERNATE_CALLBACKS
>>   extern struct dev_pm_ops pcibios_pm_ops;
>> --
>> 2.20.1
>>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request
  2019-05-28  5:36   ` Oliver
  2019-05-28  5:50     ` Shawn Anastasio
@ 2019-05-28  6:27     ` Alexey Kardashevskiy
  2019-05-28  7:39       ` Shawn Anastasio
  2019-05-29 14:00     ` Bjorn Helgaas
  2019-05-30  6:55     ` [EXTERNAL] " Sam Bobroff
  3 siblings, 1 reply; 19+ messages in thread
From: Alexey Kardashevskiy @ 2019-05-28  6:27 UTC (permalink / raw)
  To: Oliver, Shawn Anastasio
  Cc: linux-pci, linuxppc-dev, Bjorn Helgaas, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Sam Bobroff, xyjxie, rppt,
	Linux Kernel Mailing List



On 28/05/2019 15:36, Oliver wrote:
> On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio <shawn@anastas.io> wrote:
>>
>> Introduce a new pcibios function pcibios_ignore_alignment_request
>> which allows the PCI core to defer to platform-specific code to
>> determine whether or not to ignore alignment requests for PCI resources.
>>
>> The existing behavior is to simply ignore alignment requests when
>> PCI_PROBE_ONLY is set. This is behavior is maintained by the
>> default implementation of pcibios_ignore_alignment_request.
>>
>> Signed-off-by: Shawn Anastasio <shawn@anastas.io>
>> ---
>>  drivers/pci/pci.c   | 9 +++++++--
>>  include/linux/pci.h | 1 +
>>  2 files changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index 8abc843b1615..8207a09085d1 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void)
>>         return 0;
>>  }
>>
>> +int __weak pcibios_ignore_alignment_request(void)
>> +{
>> +       return pci_has_flag(PCI_PROBE_ONLY);
>> +}
>> +
>>  #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE
>>  static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0};
>>  static DEFINE_SPINLOCK(resource_alignment_lock);
>> @@ -5906,9 +5911,9 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
>>         p = resource_alignment_param;
>>         if (!*p && !align)
>>                 goto out;
>> -       if (pci_has_flag(PCI_PROBE_ONLY)) {
>> +       if (pcibios_ignore_alignment_request()) {
>>                 align = 0;
>> -               pr_info_once("PCI: Ignoring requested alignments (PCI_PROBE_ONLY)\n");
>> +               pr_info_once("PCI: Ignoring requested alignments\n");
>>                 goto out;
>>         }
> 
> I think the logic here is questionable to begin with. If the user has
> explicitly requested re-aligning a resource via the command line then
> we should probably do it even if PCI_PROBE_ONLY is set. When it breaks
> they get to keep the pieces.
> 
> That said, the real issue here is that PCI_PROBE_ONLY probably
> shouldn't be set under qemu/kvm. Under the other hypervisor (PowerVM)
> hotplugged devices are configured by firmware before it's passed to
> the guest and we need to keep the FW assignments otherwise things
> break. QEMU however doesn't do any BAR assignments and relies on that
> being handled by the guest. At boot time this is done by SLOF, but
> Linux only keeps SLOF around until it's extracted the device-tree.
> Once that's done SLOF gets blown away and the kernel needs to do it's
> own BAR assignments. I'm guessing there's a hack in there to make it
> work today, but it's a little surprising that it works at all...


The hack is to run a modified qemu-aware "/usr/sbin/rtas_errd" in the
guest which receives an event from qemu (RAS_EPOW from
/proc/interrupts), fetches device tree chunks (and as I understand it -
they come with BARs from phyp but without from qemu) and writes "1" to
"/sys/bus/pci/rescan" which calls pci_assign_resource() eventually:

[c000000006e6f960] [c0000000005f62d4] pci_assign_resource+0x44/0x360

[c000000006e6fa10] [c0000000005f8b54]
assign_requested_resources_sorted+0x84/0x110
[c000000006e6fa60] [c0000000005f9540] __assign_resources_sorted+0xd0/0x750
[c000000006e6fb40] [c0000000005fb2e0]
__pci_bus_assign_resources+0x80/0x280
[c000000006e6fc00] [c0000000005fb95c]
pci_assign_unassigned_bus_resources+0xbc/0x100
[c000000006e6fc60] [c0000000005e3d74] pci_rescan_bus+0x34/0x60

[c000000006e6fc90] [c0000000005f1ef4] rescan_store+0x84/0xc0

[c000000006e6fcd0] [c00000000068060c] bus_attr_store+0x3c/0x60

[c000000006e6fcf0] [c00000000037853c] sysfs_kf_write+0x5c/0x80





> 
> IIRC Sam Bobroff was looking at hotplug under pseries recently so he
> might have something to add. He's sick at the moment, but I'll ask him
> to take a look at this once he's back among the living
> 
>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index 4a5a84d7bdd4..47471dcdbaf9 100644
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -1990,6 +1990,7 @@ static inline void pcibios_penalize_isa_irq(int irq, int active) {}
>>  int pcibios_alloc_irq(struct pci_dev *dev);
>>  void pcibios_free_irq(struct pci_dev *dev);
>>  resource_size_t pcibios_default_alignment(void);
>> +int pcibios_ignore_alignment_request(void);
>>
>>  #ifdef CONFIG_HIBERNATE_CALLBACKS
>>  extern struct dev_pm_ops pcibios_pm_ops;
>> --
>> 2.20.1
>>

-- 
Alexey

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request
  2019-05-28  6:27     ` Alexey Kardashevskiy
@ 2019-05-28  7:39       ` Shawn Anastasio
  2019-05-30  3:39         ` Alexey Kardashevskiy
  0 siblings, 1 reply; 19+ messages in thread
From: Shawn Anastasio @ 2019-05-28  7:39 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Oliver
  Cc: Sam Bobroff, linux-pci, Linux Kernel Mailing List, rppt,
	Paul Mackerras, Bjorn Helgaas, xyjxie, linuxppc-dev



On 5/28/19 1:27 AM, Alexey Kardashevskiy wrote:
> 
> 
> On 28/05/2019 15:36, Oliver wrote:
>> On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio <shawn@anastas.io> wrote:
>>>
>>> Introduce a new pcibios function pcibios_ignore_alignment_request
>>> which allows the PCI core to defer to platform-specific code to
>>> determine whether or not to ignore alignment requests for PCI resources.
>>>
>>> The existing behavior is to simply ignore alignment requests when
>>> PCI_PROBE_ONLY is set. This is behavior is maintained by the
>>> default implementation of pcibios_ignore_alignment_request.
>>>
>>> Signed-off-by: Shawn Anastasio <shawn@anastas.io>
>>> ---
>>>   drivers/pci/pci.c   | 9 +++++++--
>>>   include/linux/pci.h | 1 +
>>>   2 files changed, 8 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>>> index 8abc843b1615..8207a09085d1 100644
>>> --- a/drivers/pci/pci.c
>>> +++ b/drivers/pci/pci.c
>>> @@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void)
>>>          return 0;
>>>   }
>>>
>>> +int __weak pcibios_ignore_alignment_request(void)
>>> +{
>>> +       return pci_has_flag(PCI_PROBE_ONLY);
>>> +}
>>> +
>>>   #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE
>>>   static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0};
>>>   static DEFINE_SPINLOCK(resource_alignment_lock);
>>> @@ -5906,9 +5911,9 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
>>>          p = resource_alignment_param;
>>>          if (!*p && !align)
>>>                  goto out;
>>> -       if (pci_has_flag(PCI_PROBE_ONLY)) {
>>> +       if (pcibios_ignore_alignment_request()) {
>>>                  align = 0;
>>> -               pr_info_once("PCI: Ignoring requested alignments (PCI_PROBE_ONLY)\n");
>>> +               pr_info_once("PCI: Ignoring requested alignments\n");
>>>                  goto out;
>>>          }
>>
>> I think the logic here is questionable to begin with. If the user has
>> explicitly requested re-aligning a resource via the command line then
>> we should probably do it even if PCI_PROBE_ONLY is set. When it breaks
>> they get to keep the pieces.
>>
>> That said, the real issue here is that PCI_PROBE_ONLY probably
>> shouldn't be set under qemu/kvm. Under the other hypervisor (PowerVM)
>> hotplugged devices are configured by firmware before it's passed to
>> the guest and we need to keep the FW assignments otherwise things
>> break. QEMU however doesn't do any BAR assignments and relies on that
>> being handled by the guest. At boot time this is done by SLOF, but
>> Linux only keeps SLOF around until it's extracted the device-tree.
>> Once that's done SLOF gets blown away and the kernel needs to do it's
>> own BAR assignments. I'm guessing there's a hack in there to make it
>> work today, but it's a little surprising that it works at all...
> 
> 
> The hack is to run a modified qemu-aware "/usr/sbin/rtas_errd" in the
> guest which receives an event from qemu (RAS_EPOW from
> /proc/interrupts), fetches device tree chunks (and as I understand it -
> they come with BARs from phyp but without from qemu) and writes "1" to
> "/sys/bus/pci/rescan" which calls pci_assign_resource() eventually:

Interesting. Does this mean that the PHYP hotplug path doesn't
call pci_assign_resource? If so it means the patch may not
break that platform after all, though it still may not be
the correct way of doing things.

> 
> [c000000006e6f960] [c0000000005f62d4] pci_assign_resource+0x44/0x360
> 
> [c000000006e6fa10] [c0000000005f8b54]
> assign_requested_resources_sorted+0x84/0x110
> [c000000006e6fa60] [c0000000005f9540] __assign_resources_sorted+0xd0/0x750
> [c000000006e6fb40] [c0000000005fb2e0]
> __pci_bus_assign_resources+0x80/0x280
> [c000000006e6fc00] [c0000000005fb95c]
> pci_assign_unassigned_bus_resources+0xbc/0x100
> [c000000006e6fc60] [c0000000005e3d74] pci_rescan_bus+0x34/0x60
> 
> [c000000006e6fc90] [c0000000005f1ef4] rescan_store+0x84/0xc0
> 
> [c000000006e6fcd0] [c00000000068060c] bus_attr_store+0x3c/0x60
> 
> [c000000006e6fcf0] [c00000000037853c] sysfs_kf_write+0x5c/0x80
> 
> 
> 
> 
> 
>>
>> IIRC Sam Bobroff was looking at hotplug under pseries recently so he
>> might have something to add. He's sick at the moment, but I'll ask him
>> to take a look at this once he's back among the living
>>
>>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>>> index 4a5a84d7bdd4..47471dcdbaf9 100644
>>> --- a/include/linux/pci.h
>>> +++ b/include/linux/pci.h
>>> @@ -1990,6 +1990,7 @@ static inline void pcibios_penalize_isa_irq(int irq, int active) {}
>>>   int pcibios_alloc_irq(struct pci_dev *dev);
>>>   void pcibios_free_irq(struct pci_dev *dev);
>>>   resource_size_t pcibios_default_alignment(void);
>>> +int pcibios_ignore_alignment_request(void);
>>>
>>>   #ifdef CONFIG_HIBERNATE_CALLBACKS
>>>   extern struct dev_pm_ops pcibios_pm_ops;
>>> --
>>> 2.20.1
>>>
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request
  2019-05-28  5:36   ` Oliver
  2019-05-28  5:50     ` Shawn Anastasio
  2019-05-28  6:27     ` Alexey Kardashevskiy
@ 2019-05-29 14:00     ` Bjorn Helgaas
  2019-05-30  6:55     ` [EXTERNAL] " Sam Bobroff
  3 siblings, 0 replies; 19+ messages in thread
From: Bjorn Helgaas @ 2019-05-29 14:00 UTC (permalink / raw)
  To: Oliver
  Cc: Shawn Anastasio, linux-pci, linuxppc-dev, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Sam Bobroff, xyjxie, rppt,
	Linux Kernel Mailing List

On Tue, May 28, 2019 at 03:36:34PM +1000, Oliver wrote:
> On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio <shawn@anastas.io> wrote:
> >
> > Introduce a new pcibios function pcibios_ignore_alignment_request
> > which allows the PCI core to defer to platform-specific code to
> > determine whether or not to ignore alignment requests for PCI resources.
> >
> > The existing behavior is to simply ignore alignment requests when
> > PCI_PROBE_ONLY is set. This is behavior is maintained by the
> > default implementation of pcibios_ignore_alignment_request.
> >
> > Signed-off-by: Shawn Anastasio <shawn@anastas.io>
> > ---
> >  drivers/pci/pci.c   | 9 +++++++--
> >  include/linux/pci.h | 1 +
> >  2 files changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 8abc843b1615..8207a09085d1 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void)
> >         return 0;
> >  }
> >
> > +int __weak pcibios_ignore_alignment_request(void)
> > +{
> > +       return pci_has_flag(PCI_PROBE_ONLY);
> > +}
> > +
> >  #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE
> >  static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0};
> >  static DEFINE_SPINLOCK(resource_alignment_lock);
> > @@ -5906,9 +5911,9 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
> >         p = resource_alignment_param;
> >         if (!*p && !align)
> >                 goto out;
> > -       if (pci_has_flag(PCI_PROBE_ONLY)) {
> > +       if (pcibios_ignore_alignment_request()) {
> >                 align = 0;
> > -               pr_info_once("PCI: Ignoring requested alignments (PCI_PROBE_ONLY)\n");
> > +               pr_info_once("PCI: Ignoring requested alignments\n");
> >                 goto out;
> >         }
> 
> I think the logic here is questionable to begin with. If the user has
> explicitly requested re-aligning a resource via the command line then
> we should probably do it even if PCI_PROBE_ONLY is set. When it breaks
> they get to keep the pieces.

I agree.  I don't like PCI_PROBE_ONLY in the first place.  It's a
sledgehammer approach that doesn't tell us which resource assignments
need to be preserved or why.  I'd rather use IORESOURCE_PCI_FIXED and
set it for the BARs where there's actually some sort of
hypervisor/firmware/OS dependency.

If there's a way to avoid another pciobios_*() weak function, that
would also be better.

Bjorn

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 3/3] powerpc/pseries: Allow user-specified PCI resource alignment after init
  2019-05-28  4:03 ` [PATCH v3 3/3] powerpc/pseries: Allow user-specified PCI resource alignment after init Shawn Anastasio
@ 2019-05-29 14:02   ` Bjorn Helgaas
  0 siblings, 0 replies; 19+ messages in thread
From: Bjorn Helgaas @ 2019-05-29 14:02 UTC (permalink / raw)
  To: Shawn Anastasio
  Cc: linux-pci, linuxppc-dev, benh, paulus, mpe, sbobroff, xyjxie,
	rppt, linux-kernel

On Mon, May 27, 2019 at 11:03:13PM -0500, Shawn Anastasio wrote:
> On pseries, custom PCI resource alignment specified with the commandline
> argument pci=resource_alignment is disabled due to PCI resources being
> managed by the firmware. However, in the case of PCI hotplug the
> resources are managed by the kernel, so custom alignments should be
> honored in these cases. This is done by only honoring custom
> alignments after initial PCI initialization is done, to ensure that
> all devices managed by the firmware are excluded.
> 
> Without this ability, sub-page BARs sometimes get mapped in between
> page boundaries for hotplugged devices and are therefore unusable
> with the VFIO framework. This change allows users to request
> page alignment for devices they wish to access via VFIO using
> the pci=resource_alignment commandline argument.
> 
> In the future, this could be extended to provide page-aligned
> resources by default for hotplugged devices, similar to what is
> done on powernv by commit 382746376993 ("powerpc/powernv: Override
> pcibios_default_alignment() to force PCI devices to be page aligned")
> 
> Signed-off-by: Shawn Anastasio <shawn@anastas.io>
> ---
>  arch/powerpc/include/asm/machdep.h     |  3 +++
>  arch/powerpc/kernel/pci-common.c       |  9 +++++++++
>  arch/powerpc/platforms/pseries/setup.c | 22 ++++++++++++++++++++++
>  3 files changed, 34 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
> index 2fbfaa9176ed..46eb62c0954e 100644
> --- a/arch/powerpc/include/asm/machdep.h
> +++ b/arch/powerpc/include/asm/machdep.h
> @@ -179,6 +179,9 @@ struct machdep_calls {
>  
>  	resource_size_t (*pcibios_default_alignment)(void);
>  
> +	/* Called when determining PCI resource alignment */
> +	int (*pcibios_ignore_alignment_request)(void);
> +
>  #ifdef CONFIG_PCI_IOV
>  	void (*pcibios_fixup_sriov)(struct pci_dev *pdev);
>  	resource_size_t (*pcibios_iov_resource_alignment)(struct pci_dev *, int resno);
> diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
> index ff4b7539cbdf..8e0d73b4c188 100644
> --- a/arch/powerpc/kernel/pci-common.c
> +++ b/arch/powerpc/kernel/pci-common.c
> @@ -238,6 +238,15 @@ resource_size_t pcibios_default_alignment(void)
>  	return 0;
>  }
>  
> +int pcibios_ignore_alignment_request(void)
> +{
> +	if (ppc_md.pcibios_ignore_alignment_request)
> +		return ppc_md.pcibios_ignore_alignment_request();
> +
> +	/* Fall back to default method of checking PCI_PROBE_ONLY */
> +	return pci_has_flag(PCI_PROBE_ONLY);
> +}
> +
>  #ifdef CONFIG_PCI_IOV
>  resource_size_t pcibios_iov_resource_alignment(struct pci_dev *pdev, int resno)
>  {
> diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
> index e4f0dfd4ae33..07f03be02afe 100644
> --- a/arch/powerpc/platforms/pseries/setup.c
> +++ b/arch/powerpc/platforms/pseries/setup.c
> @@ -82,6 +82,8 @@ EXPORT_SYMBOL(CMO_PageSize);
>  
>  int fwnmi_active;  /* TRUE if an FWNMI handler is present */
>  
> +static int initial_pci_init_done; /* TRUE if initial pcibios init has completed */
> +
>  static void pSeries_show_cpuinfo(struct seq_file *m)
>  {
>  	struct device_node *root;
> @@ -749,6 +751,23 @@ static resource_size_t pseries_pci_iov_resource_alignment(struct pci_dev *pdev,
>  }
>  #endif
>  
> +static void pseries_after_init(void)
> +{
> +	initial_pci_init_done = 1;
> +}
> +
> +static int pseries_ignore_alignment_request(void)
> +{
> +	if (initial_pci_init_done)
> +		/*
> +		 * Allow custom alignments after init for things
> +		 * like PCI hotplugging.
> +		 */
> +		return 0;

Hmm, if there's any way to avoid this sort of early/late flag, that
would be nicer.

> +
> +	return pci_has_flag(PCI_PROBE_ONLY);
> +}
> +
>  static void __init pSeries_setup_arch(void)
>  {
>  	set_arch_panic_timeout(10, ARCH_PANIC_TIMEOUT);
> @@ -797,6 +816,9 @@ static void __init pSeries_setup_arch(void)
>  	}
>  
>  	ppc_md.pcibios_root_bridge_prepare = pseries_root_bridge_prepare;
> +	ppc_md.pcibios_after_init = pseries_after_init;
> +	ppc_md.pcibios_ignore_alignment_request =
> +		pseries_ignore_alignment_request;
>  }
>  
>  static void pseries_panic(char *str)
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request
  2019-05-28  7:39       ` Shawn Anastasio
@ 2019-05-30  3:39         ` Alexey Kardashevskiy
  2019-05-30 22:49           ` Shawn Anastasio
  0 siblings, 1 reply; 19+ messages in thread
From: Alexey Kardashevskiy @ 2019-05-30  3:39 UTC (permalink / raw)
  To: Shawn Anastasio, Oliver
  Cc: Sam Bobroff, linux-pci, Linux Kernel Mailing List, rppt,
	Paul Mackerras, Bjorn Helgaas, xyjxie, linuxppc-dev



On 28/05/2019 17:39, Shawn Anastasio wrote:
> 
> 
> On 5/28/19 1:27 AM, Alexey Kardashevskiy wrote:
>>
>>
>> On 28/05/2019 15:36, Oliver wrote:
>>> On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio <shawn@anastas.io>
>>> wrote:
>>>>
>>>> Introduce a new pcibios function pcibios_ignore_alignment_request
>>>> which allows the PCI core to defer to platform-specific code to
>>>> determine whether or not to ignore alignment requests for PCI
>>>> resources.
>>>>
>>>> The existing behavior is to simply ignore alignment requests when
>>>> PCI_PROBE_ONLY is set. This is behavior is maintained by the
>>>> default implementation of pcibios_ignore_alignment_request.
>>>>
>>>> Signed-off-by: Shawn Anastasio <shawn@anastas.io>
>>>> ---
>>>>   drivers/pci/pci.c   | 9 +++++++--
>>>>   include/linux/pci.h | 1 +
>>>>   2 files changed, 8 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>>>> index 8abc843b1615..8207a09085d1 100644
>>>> --- a/drivers/pci/pci.c
>>>> +++ b/drivers/pci/pci.c
>>>> @@ -5882,6 +5882,11 @@ resource_size_t __weak
>>>> pcibios_default_alignment(void)
>>>>          return 0;
>>>>   }
>>>>
>>>> +int __weak pcibios_ignore_alignment_request(void)
>>>> +{
>>>> +       return pci_has_flag(PCI_PROBE_ONLY);
>>>> +}
>>>> +
>>>>   #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE
>>>>   static char
>>>> resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0};
>>>>   static DEFINE_SPINLOCK(resource_alignment_lock);
>>>> @@ -5906,9 +5911,9 @@ static resource_size_t
>>>> pci_specified_resource_alignment(struct pci_dev *dev,
>>>>          p = resource_alignment_param;
>>>>          if (!*p && !align)
>>>>                  goto out;
>>>> -       if (pci_has_flag(PCI_PROBE_ONLY)) {
>>>> +       if (pcibios_ignore_alignment_request()) {
>>>>                  align = 0;
>>>> -               pr_info_once("PCI: Ignoring requested alignments
>>>> (PCI_PROBE_ONLY)\n");
>>>> +               pr_info_once("PCI: Ignoring requested alignments\n");
>>>>                  goto out;
>>>>          }
>>>
>>> I think the logic here is questionable to begin with. If the user has
>>> explicitly requested re-aligning a resource via the command line then
>>> we should probably do it even if PCI_PROBE_ONLY is set. When it breaks
>>> they get to keep the pieces.
>>>
>>> That said, the real issue here is that PCI_PROBE_ONLY probably
>>> shouldn't be set under qemu/kvm. Under the other hypervisor (PowerVM)
>>> hotplugged devices are configured by firmware before it's passed to
>>> the guest and we need to keep the FW assignments otherwise things
>>> break. QEMU however doesn't do any BAR assignments and relies on that
>>> being handled by the guest. At boot time this is done by SLOF, but
>>> Linux only keeps SLOF around until it's extracted the device-tree.
>>> Once that's done SLOF gets blown away and the kernel needs to do it's
>>> own BAR assignments. I'm guessing there's a hack in there to make it
>>> work today, but it's a little surprising that it works at all...
>>
>>
>> The hack is to run a modified qemu-aware "/usr/sbin/rtas_errd" in the
>> guest which receives an event from qemu (RAS_EPOW from
>> /proc/interrupts), fetches device tree chunks (and as I understand it -
>> they come with BARs from phyp but without from qemu) and writes "1" to
>> "/sys/bus/pci/rescan" which calls pci_assign_resource() eventually:
> 
> Interesting. Does this mean that the PHYP hotplug path doesn't
> call pci_assign_resource?


I'd expect dlpar_add_slot() to be called under phyp and eventually
pci_device_add() which (I think) may or may not trigger later reassignment.


> If so it means the patch may not
> break that platform after all, though it still may not be
> the correct way of doing things.


We should probably stop enforcing the PCI_PROBE_ONLY flag - it seems
that (unless resource_alignment= is used) the pseries guest should just
walk through all allocated resources and leave them unchanged.



>> [c000000006e6f960] [c0000000005f62d4] pci_assign_resource+0x44/0x360
>>
>> [c000000006e6fa10] [c0000000005f8b54]
>> assign_requested_resources_sorted+0x84/0x110
>> [c000000006e6fa60] [c0000000005f9540]
>> __assign_resources_sorted+0xd0/0x750
>> [c000000006e6fb40] [c0000000005fb2e0]
>> __pci_bus_assign_resources+0x80/0x280
>> [c000000006e6fc00] [c0000000005fb95c]
>> pci_assign_unassigned_bus_resources+0xbc/0x100
>> [c000000006e6fc60] [c0000000005e3d74] pci_rescan_bus+0x34/0x60
>>
>> [c000000006e6fc90] [c0000000005f1ef4] rescan_store+0x84/0xc0
>>
>> [c000000006e6fcd0] [c00000000068060c] bus_attr_store+0x3c/0x60
>>
>> [c000000006e6fcf0] [c00000000037853c] sysfs_kf_write+0x5c/0x80
>>
>>
>>
>>
>>
>>>
>>> IIRC Sam Bobroff was looking at hotplug under pseries recently so he
>>> might have something to add. He's sick at the moment, but I'll ask him
>>> to take a look at this once he's back among the living
>>>
>>>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>>>> index 4a5a84d7bdd4..47471dcdbaf9 100644
>>>> --- a/include/linux/pci.h
>>>> +++ b/include/linux/pci.h
>>>> @@ -1990,6 +1990,7 @@ static inline void
>>>> pcibios_penalize_isa_irq(int irq, int active) {}
>>>>   int pcibios_alloc_irq(struct pci_dev *dev);
>>>>   void pcibios_free_irq(struct pci_dev *dev);
>>>>   resource_size_t pcibios_default_alignment(void);
>>>> +int pcibios_ignore_alignment_request(void);
>>>>
>>>>   #ifdef CONFIG_HIBERNATE_CALLBACKS
>>>>   extern struct dev_pm_ops pcibios_pm_ops;
>>>> -- 
>>>> 2.20.1
>>>>
>>

-- 
Alexey

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [EXTERNAL] Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request
  2019-05-28  5:36   ` Oliver
                       ` (2 preceding siblings ...)
  2019-05-29 14:00     ` Bjorn Helgaas
@ 2019-05-30  6:55     ` Sam Bobroff
  2019-05-30 22:33       ` Shawn Anastasio
  3 siblings, 1 reply; 19+ messages in thread
From: Sam Bobroff @ 2019-05-30  6:55 UTC (permalink / raw)
  To: Oliver
  Cc: Shawn Anastasio, linux-pci, Linux Kernel Mailing List, rppt,
	Paul Mackerras, Bjorn Helgaas, xyjxie, linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 4180 bytes --]

On Tue, May 28, 2019 at 03:36:34PM +1000, Oliver wrote:
> On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio <shawn@anastas.io> wrote:
> >
> > Introduce a new pcibios function pcibios_ignore_alignment_request
> > which allows the PCI core to defer to platform-specific code to
> > determine whether or not to ignore alignment requests for PCI resources.
> >
> > The existing behavior is to simply ignore alignment requests when
> > PCI_PROBE_ONLY is set. This is behavior is maintained by the
> > default implementation of pcibios_ignore_alignment_request.
> >
> > Signed-off-by: Shawn Anastasio <shawn@anastas.io>
> > ---
> >  drivers/pci/pci.c   | 9 +++++++--
> >  include/linux/pci.h | 1 +
> >  2 files changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 8abc843b1615..8207a09085d1 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void)
> >         return 0;
> >  }
> >
> > +int __weak pcibios_ignore_alignment_request(void)
> > +{
> > +       return pci_has_flag(PCI_PROBE_ONLY);
> > +}
> > +
> >  #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE
> >  static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0};
> >  static DEFINE_SPINLOCK(resource_alignment_lock);
> > @@ -5906,9 +5911,9 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
> >         p = resource_alignment_param;
> >         if (!*p && !align)
> >                 goto out;
> > -       if (pci_has_flag(PCI_PROBE_ONLY)) {
> > +       if (pcibios_ignore_alignment_request()) {
> >                 align = 0;
> > -               pr_info_once("PCI: Ignoring requested alignments (PCI_PROBE_ONLY)\n");
> > +               pr_info_once("PCI: Ignoring requested alignments\n");
> >                 goto out;
> >         }
> 
> I think the logic here is questionable to begin with. If the user has
> explicitly requested re-aligning a resource via the command line then
> we should probably do it even if PCI_PROBE_ONLY is set. When it breaks
> they get to keep the pieces.
> 
> That said, the real issue here is that PCI_PROBE_ONLY probably
> shouldn't be set under qemu/kvm. Under the other hypervisor (PowerVM)
> hotplugged devices are configured by firmware before it's passed to
> the guest and we need to keep the FW assignments otherwise things
> break. QEMU however doesn't do any BAR assignments and relies on that
> being handled by the guest. At boot time this is done by SLOF, but
> Linux only keeps SLOF around until it's extracted the device-tree.
> Once that's done SLOF gets blown away and the kernel needs to do it's
> own BAR assignments. I'm guessing there's a hack in there to make it
> work today, but it's a little surprising that it works at all...
> 
> IIRC Sam Bobroff was looking at hotplug under pseries recently so he
> might have something to add. He's sick at the moment, but I'll ask him
> to take a look at this once he's back among the living

There seems to be some code already in the kernel that will disable
PCI_PROBE_ONLY based on a device tree property, so I did a quick test
today and it seems to work. Only a trivial tweak is needed in QEMU to
do it (have spapr_dt_chosen() add a node called "linux,pci-probe-only"
with a value of 0), and that would allow us to set it only for QEMU (and
not PowerVM) if that's what we want to do. Is that useful?

(I haven't done any real testing yet but the guest booted up OK.)

> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > index 4a5a84d7bdd4..47471dcdbaf9 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -1990,6 +1990,7 @@ static inline void pcibios_penalize_isa_irq(int irq, int active) {}
> >  int pcibios_alloc_irq(struct pci_dev *dev);
> >  void pcibios_free_irq(struct pci_dev *dev);
> >  resource_size_t pcibios_default_alignment(void);
> > +int pcibios_ignore_alignment_request(void);
> >
> >  #ifdef CONFIG_HIBERNATE_CALLBACKS
> >  extern struct dev_pm_ops pcibios_pm_ops;
> > --
> > 2.20.1
> >
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [EXTERNAL] Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request
  2019-05-30  6:55     ` [EXTERNAL] " Sam Bobroff
@ 2019-05-30 22:33       ` Shawn Anastasio
  0 siblings, 0 replies; 19+ messages in thread
From: Shawn Anastasio @ 2019-05-30 22:33 UTC (permalink / raw)
  To: Sam Bobroff, Oliver
  Cc: linux-pci, Linux Kernel Mailing List, rppt, Paul Mackerras,
	Bjorn Helgaas, xyjxie, linuxppc-dev



On 5/30/19 1:55 AM, Sam Bobroff wrote:
> On Tue, May 28, 2019 at 03:36:34PM +1000, Oliver wrote:
>> On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio <shawn@anastas.io> wrote:
>>>
>>> Introduce a new pcibios function pcibios_ignore_alignment_request
>>> which allows the PCI core to defer to platform-specific code to
>>> determine whether or not to ignore alignment requests for PCI resources.
>>>
>>> The existing behavior is to simply ignore alignment requests when
>>> PCI_PROBE_ONLY is set. This is behavior is maintained by the
>>> default implementation of pcibios_ignore_alignment_request.
>>>
>>> Signed-off-by: Shawn Anastasio <shawn@anastas.io>
>>> ---
>>>   drivers/pci/pci.c   | 9 +++++++--
>>>   include/linux/pci.h | 1 +
>>>   2 files changed, 8 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>>> index 8abc843b1615..8207a09085d1 100644
>>> --- a/drivers/pci/pci.c
>>> +++ b/drivers/pci/pci.c
>>> @@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void)
>>>          return 0;
>>>   }
>>>
>>> +int __weak pcibios_ignore_alignment_request(void)
>>> +{
>>> +       return pci_has_flag(PCI_PROBE_ONLY);
>>> +}
>>> +
>>>   #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE
>>>   static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0};
>>>   static DEFINE_SPINLOCK(resource_alignment_lock);
>>> @@ -5906,9 +5911,9 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
>>>          p = resource_alignment_param;
>>>          if (!*p && !align)
>>>                  goto out;
>>> -       if (pci_has_flag(PCI_PROBE_ONLY)) {
>>> +       if (pcibios_ignore_alignment_request()) {
>>>                  align = 0;
>>> -               pr_info_once("PCI: Ignoring requested alignments (PCI_PROBE_ONLY)\n");
>>> +               pr_info_once("PCI: Ignoring requested alignments\n");
>>>                  goto out;
>>>          }
>>
>> I think the logic here is questionable to begin with. If the user has
>> explicitly requested re-aligning a resource via the command line then
>> we should probably do it even if PCI_PROBE_ONLY is set. When it breaks
>> they get to keep the pieces.
>>
>> That said, the real issue here is that PCI_PROBE_ONLY probably
>> shouldn't be set under qemu/kvm. Under the other hypervisor (PowerVM)
>> hotplugged devices are configured by firmware before it's passed to
>> the guest and we need to keep the FW assignments otherwise things
>> break. QEMU however doesn't do any BAR assignments and relies on that
>> being handled by the guest. At boot time this is done by SLOF, but
>> Linux only keeps SLOF around until it's extracted the device-tree.
>> Once that's done SLOF gets blown away and the kernel needs to do it's
>> own BAR assignments. I'm guessing there's a hack in there to make it
>> work today, but it's a little surprising that it works at all...
>>
>> IIRC Sam Bobroff was looking at hotplug under pseries recently so he
>> might have something to add. He's sick at the moment, but I'll ask him
>> to take a look at this once he's back among the living
> 
> There seems to be some code already in the kernel that will disable
> PCI_PROBE_ONLY based on a device tree property, so I did a quick test
> today and it seems to work. Only a trivial tweak is needed in QEMU to
> do it (have spapr_dt_chosen() add a node called "linux,pci-probe-only"
> with a value of 0), and that would allow us to set it only for QEMU (and
> not PowerVM) if that's what we want to do. Is that useful?
> 
> (I haven't done any real testing yet but the guest booted up OK.)

It was my understanding that PCI_PROBE_ONLY should actually be set
initially so that Linux uses SLOF's BAR assignments. The issue here
is that PCI_PROBE_ONLY shouldn't be honored after initial bringup
on KVM so that hotplugged PCI devices can have custom BAR alignments.

Of course, if there's no need to honor SLOF's initial assignments,
I assume disabling PCI_PROBE_ONLY would work fine. In fact, I'm
not entirely sure why it's done in the first place. Does anybody
know?

If there is actually a valid reason for preserving SLOF's initial
assignments, then it seems like the correct solution is to disable
PCI_PROBE_ONLY after initial PCI bringup or ignore it in
pci_specified_resource_alignment() like I do in this patch set.

Bjorn Helgaas also suggested marking individual resources provided
by SLOF/PHYP with IORESOURCE_PCI_FIXED which would remove the need
to use PCI_PROBE_ONLY altogether.

Any thoughts?

- Shawn

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request
  2019-05-30  3:39         ` Alexey Kardashevskiy
@ 2019-05-30 22:49           ` Shawn Anastasio
  2019-05-31  3:56             ` Alexey Kardashevskiy
  0 siblings, 1 reply; 19+ messages in thread
From: Shawn Anastasio @ 2019-05-30 22:49 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Oliver
  Cc: Sam Bobroff, linux-pci, Linux Kernel Mailing List, rppt,
	Paul Mackerras, Bjorn Helgaas, xyjxie, linuxppc-dev

On 5/29/19 10:39 PM, Alexey Kardashevskiy wrote:
> 
> 
> On 28/05/2019 17:39, Shawn Anastasio wrote:
>>
>>
>> On 5/28/19 1:27 AM, Alexey Kardashevskiy wrote:
>>>
>>>
>>> On 28/05/2019 15:36, Oliver wrote:
>>>> On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio <shawn@anastas.io>
>>>> wrote:
>>>>>
>>>>> Introduce a new pcibios function pcibios_ignore_alignment_request
>>>>> which allows the PCI core to defer to platform-specific code to
>>>>> determine whether or not to ignore alignment requests for PCI
>>>>> resources.
>>>>>
>>>>> The existing behavior is to simply ignore alignment requests when
>>>>> PCI_PROBE_ONLY is set. This is behavior is maintained by the
>>>>> default implementation of pcibios_ignore_alignment_request.
>>>>>
>>>>> Signed-off-by: Shawn Anastasio <shawn@anastas.io>
>>>>> ---
>>>>>    drivers/pci/pci.c   | 9 +++++++--
>>>>>    include/linux/pci.h | 1 +
>>>>>    2 files changed, 8 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>>>>> index 8abc843b1615..8207a09085d1 100644
>>>>> --- a/drivers/pci/pci.c
>>>>> +++ b/drivers/pci/pci.c
>>>>> @@ -5882,6 +5882,11 @@ resource_size_t __weak
>>>>> pcibios_default_alignment(void)
>>>>>           return 0;
>>>>>    }
>>>>>
>>>>> +int __weak pcibios_ignore_alignment_request(void)
>>>>> +{
>>>>> +       return pci_has_flag(PCI_PROBE_ONLY);
>>>>> +}
>>>>> +
>>>>>    #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE
>>>>>    static char
>>>>> resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0};
>>>>>    static DEFINE_SPINLOCK(resource_alignment_lock);
>>>>> @@ -5906,9 +5911,9 @@ static resource_size_t
>>>>> pci_specified_resource_alignment(struct pci_dev *dev,
>>>>>           p = resource_alignment_param;
>>>>>           if (!*p && !align)
>>>>>                   goto out;
>>>>> -       if (pci_has_flag(PCI_PROBE_ONLY)) {
>>>>> +       if (pcibios_ignore_alignment_request()) {
>>>>>                   align = 0;
>>>>> -               pr_info_once("PCI: Ignoring requested alignments
>>>>> (PCI_PROBE_ONLY)\n");
>>>>> +               pr_info_once("PCI: Ignoring requested alignments\n");
>>>>>                   goto out;
>>>>>           }
>>>>
>>>> I think the logic here is questionable to begin with. If the user has
>>>> explicitly requested re-aligning a resource via the command line then
>>>> we should probably do it even if PCI_PROBE_ONLY is set. When it breaks
>>>> they get to keep the pieces.
>>>>
>>>> That said, the real issue here is that PCI_PROBE_ONLY probably
>>>> shouldn't be set under qemu/kvm. Under the other hypervisor (PowerVM)
>>>> hotplugged devices are configured by firmware before it's passed to
>>>> the guest and we need to keep the FW assignments otherwise things
>>>> break. QEMU however doesn't do any BAR assignments and relies on that
>>>> being handled by the guest. At boot time this is done by SLOF, but
>>>> Linux only keeps SLOF around until it's extracted the device-tree.
>>>> Once that's done SLOF gets blown away and the kernel needs to do it's
>>>> own BAR assignments. I'm guessing there's a hack in there to make it
>>>> work today, but it's a little surprising that it works at all...
>>>
>>>
>>> The hack is to run a modified qemu-aware "/usr/sbin/rtas_errd" in the
>>> guest which receives an event from qemu (RAS_EPOW from
>>> /proc/interrupts), fetches device tree chunks (and as I understand it -
>>> they come with BARs from phyp but without from qemu) and writes "1" to
>>> "/sys/bus/pci/rescan" which calls pci_assign_resource() eventually:
>>
>> Interesting. Does this mean that the PHYP hotplug path doesn't
>> call pci_assign_resource?
> 
> 
> I'd expect dlpar_add_slot() to be called under phyp and eventually
> pci_device_add() which (I think) may or may not trigger later reassignment.
> 
> 
>> If so it means the patch may not
>> break that platform after all, though it still may not be
>> the correct way of doing things.
> 
> 
> We should probably stop enforcing the PCI_PROBE_ONLY flag - it seems
> that (unless resource_alignment= is used) the pseries guest should just
> walk through all allocated resources and leave them unchanged.

If we add a pcibios_default_alignment() implementation like was
suggested earlier, then it will behave as if the user has
specified resource_alignment= by default and SLOF's assignments
won't be honored (I think).

I guess it boils down to one question - is it important that we
observe SLOF's initial BAR assignments? If not, the device tree
modification that Sam found would work fine here. Otherwise,
we need a way to honor the initial assignments from SLOF while
still allowing custom alignments for hotplugged devices, either
by deferring to the platform code like I do here, unsetting
PCI_PROBE_ONLY in certain cases or by using IORESOURCE_PCI_FIXED
like Bjorn suggested.

> 
> 
>>> [c000000006e6f960] [c0000000005f62d4] pci_assign_resource+0x44/0x360
>>>
>>> [c000000006e6fa10] [c0000000005f8b54]
>>> assign_requested_resources_sorted+0x84/0x110
>>> [c000000006e6fa60] [c0000000005f9540]
>>> __assign_resources_sorted+0xd0/0x750
>>> [c000000006e6fb40] [c0000000005fb2e0]
>>> __pci_bus_assign_resources+0x80/0x280
>>> [c000000006e6fc00] [c0000000005fb95c]
>>> pci_assign_unassigned_bus_resources+0xbc/0x100
>>> [c000000006e6fc60] [c0000000005e3d74] pci_rescan_bus+0x34/0x60
>>>
>>> [c000000006e6fc90] [c0000000005f1ef4] rescan_store+0x84/0xc0
>>>
>>> [c000000006e6fcd0] [c00000000068060c] bus_attr_store+0x3c/0x60
>>>
>>> [c000000006e6fcf0] [c00000000037853c] sysfs_kf_write+0x5c/0x80
>>>
>>>
>>>
>>>
>>>
>>>>
>>>> IIRC Sam Bobroff was looking at hotplug under pseries recently so he
>>>> might have something to add. He's sick at the moment, but I'll ask him
>>>> to take a look at this once he's back among the living
>>>>
>>>>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>>>>> index 4a5a84d7bdd4..47471dcdbaf9 100644
>>>>> --- a/include/linux/pci.h
>>>>> +++ b/include/linux/pci.h
>>>>> @@ -1990,6 +1990,7 @@ static inline void
>>>>> pcibios_penalize_isa_irq(int irq, int active) {}
>>>>>    int pcibios_alloc_irq(struct pci_dev *dev);
>>>>>    void pcibios_free_irq(struct pci_dev *dev);
>>>>>    resource_size_t pcibios_default_alignment(void);
>>>>> +int pcibios_ignore_alignment_request(void);
>>>>>
>>>>>    #ifdef CONFIG_HIBERNATE_CALLBACKS
>>>>>    extern struct dev_pm_ops pcibios_pm_ops;
>>>>> -- 
>>>>> 2.20.1
>>>>>
>>>
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request
  2019-05-30 22:49           ` Shawn Anastasio
@ 2019-05-31  3:56             ` Alexey Kardashevskiy
  2019-06-03  2:23               ` Shawn Anastasio
  0 siblings, 1 reply; 19+ messages in thread
From: Alexey Kardashevskiy @ 2019-05-31  3:56 UTC (permalink / raw)
  To: Shawn Anastasio, Oliver
  Cc: Sam Bobroff, linux-pci, Linux Kernel Mailing List, rppt,
	Paul Mackerras, Bjorn Helgaas, xyjxie, linuxppc-dev



On 31/05/2019 08:49, Shawn Anastasio wrote:
> On 5/29/19 10:39 PM, Alexey Kardashevskiy wrote:
>>
>>
>> On 28/05/2019 17:39, Shawn Anastasio wrote:
>>>
>>>
>>> On 5/28/19 1:27 AM, Alexey Kardashevskiy wrote:
>>>>
>>>>
>>>> On 28/05/2019 15:36, Oliver wrote:
>>>>> On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio <shawn@anastas.io>
>>>>> wrote:
>>>>>>
>>>>>> Introduce a new pcibios function pcibios_ignore_alignment_request
>>>>>> which allows the PCI core to defer to platform-specific code to
>>>>>> determine whether or not to ignore alignment requests for PCI
>>>>>> resources.
>>>>>>
>>>>>> The existing behavior is to simply ignore alignment requests when
>>>>>> PCI_PROBE_ONLY is set. This is behavior is maintained by the
>>>>>> default implementation of pcibios_ignore_alignment_request.
>>>>>>
>>>>>> Signed-off-by: Shawn Anastasio <shawn@anastas.io>
>>>>>> ---
>>>>>>    drivers/pci/pci.c   | 9 +++++++--
>>>>>>    include/linux/pci.h | 1 +
>>>>>>    2 files changed, 8 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>>>>>> index 8abc843b1615..8207a09085d1 100644
>>>>>> --- a/drivers/pci/pci.c
>>>>>> +++ b/drivers/pci/pci.c
>>>>>> @@ -5882,6 +5882,11 @@ resource_size_t __weak
>>>>>> pcibios_default_alignment(void)
>>>>>>           return 0;
>>>>>>    }
>>>>>>
>>>>>> +int __weak pcibios_ignore_alignment_request(void)
>>>>>> +{
>>>>>> +       return pci_has_flag(PCI_PROBE_ONLY);
>>>>>> +}
>>>>>> +
>>>>>>    #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE
>>>>>>    static char
>>>>>> resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0};
>>>>>>    static DEFINE_SPINLOCK(resource_alignment_lock);
>>>>>> @@ -5906,9 +5911,9 @@ static resource_size_t
>>>>>> pci_specified_resource_alignment(struct pci_dev *dev,
>>>>>>           p = resource_alignment_param;
>>>>>>           if (!*p && !align)
>>>>>>                   goto out;
>>>>>> -       if (pci_has_flag(PCI_PROBE_ONLY)) {
>>>>>> +       if (pcibios_ignore_alignment_request()) {
>>>>>>                   align = 0;
>>>>>> -               pr_info_once("PCI: Ignoring requested alignments
>>>>>> (PCI_PROBE_ONLY)\n");
>>>>>> +               pr_info_once("PCI: Ignoring requested alignments\n");
>>>>>>                   goto out;
>>>>>>           }
>>>>>
>>>>> I think the logic here is questionable to begin with. If the user has
>>>>> explicitly requested re-aligning a resource via the command line then
>>>>> we should probably do it even if PCI_PROBE_ONLY is set. When it breaks
>>>>> they get to keep the pieces.
>>>>>
>>>>> That said, the real issue here is that PCI_PROBE_ONLY probably
>>>>> shouldn't be set under qemu/kvm. Under the other hypervisor (PowerVM)
>>>>> hotplugged devices are configured by firmware before it's passed to
>>>>> the guest and we need to keep the FW assignments otherwise things
>>>>> break. QEMU however doesn't do any BAR assignments and relies on that
>>>>> being handled by the guest. At boot time this is done by SLOF, but
>>>>> Linux only keeps SLOF around until it's extracted the device-tree.
>>>>> Once that's done SLOF gets blown away and the kernel needs to do it's
>>>>> own BAR assignments. I'm guessing there's a hack in there to make it
>>>>> work today, but it's a little surprising that it works at all...
>>>>
>>>>
>>>> The hack is to run a modified qemu-aware "/usr/sbin/rtas_errd" in the
>>>> guest which receives an event from qemu (RAS_EPOW from
>>>> /proc/interrupts), fetches device tree chunks (and as I understand it -
>>>> they come with BARs from phyp but without from qemu) and writes "1" to
>>>> "/sys/bus/pci/rescan" which calls pci_assign_resource() eventually:
>>>
>>> Interesting. Does this mean that the PHYP hotplug path doesn't
>>> call pci_assign_resource?
>>
>>
>> I'd expect dlpar_add_slot() to be called under phyp and eventually
>> pci_device_add() which (I think) may or may not trigger later
>> reassignment.
>>
>>
>>> If so it means the patch may not
>>> break that platform after all, though it still may not be
>>> the correct way of doing things.
>>
>>
>> We should probably stop enforcing the PCI_PROBE_ONLY flag - it seems
>> that (unless resource_alignment= is used) the pseries guest should just
>> walk through all allocated resources and leave them unchanged.
> 
> If we add a pcibios_default_alignment() implementation like was
> suggested earlier, then it will behave as if the user has
> specified resource_alignment= by default and SLOF's assignments
> won't be honored (I think).


I removed pci_add_flags(PCI_PROBE_ONLY) from pSeries_setup_arch and
tried booting with and without pci=resource_alignment= and I can see no
difference - BARs are still aligned to 64K as programmed in SLOF; if I
hack SLOF to align to 4K or 32K - BARs get packed and the guest leaves
them unchanged.


> I guess it boils down to one question - is it important that we
> observe SLOF's initial BAR assignments?

It isn't if it's SLOF but it is if it's phyp. It used to not
allow/support BAR reassignment and even if it does not, I'd rather avoid
touching them.


> If not, the device tree
> modification that Sam found would work fine here. Otherwise,
> we need a way to honor the initial assignments from SLOF while
> still allowing custom alignments for hotplugged devices, either
> by deferring to the platform code like I do here, unsetting
> PCI_PROBE_ONLY in certain cases or by using IORESOURCE_PCI_FIXED
> like Bjorn suggested.
> 
>>
>>
>>>> [c000000006e6f960] [c0000000005f62d4] pci_assign_resource+0x44/0x360
>>>>
>>>> [c000000006e6fa10] [c0000000005f8b54]
>>>> assign_requested_resources_sorted+0x84/0x110
>>>> [c000000006e6fa60] [c0000000005f9540]
>>>> __assign_resources_sorted+0xd0/0x750
>>>> [c000000006e6fb40] [c0000000005fb2e0]
>>>> __pci_bus_assign_resources+0x80/0x280
>>>> [c000000006e6fc00] [c0000000005fb95c]
>>>> pci_assign_unassigned_bus_resources+0xbc/0x100
>>>> [c000000006e6fc60] [c0000000005e3d74] pci_rescan_bus+0x34/0x60
>>>>
>>>> [c000000006e6fc90] [c0000000005f1ef4] rescan_store+0x84/0xc0
>>>>
>>>> [c000000006e6fcd0] [c00000000068060c] bus_attr_store+0x3c/0x60
>>>>
>>>> [c000000006e6fcf0] [c00000000037853c] sysfs_kf_write+0x5c/0x80
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>>
>>>>> IIRC Sam Bobroff was looking at hotplug under pseries recently so he
>>>>> might have something to add. He's sick at the moment, but I'll ask him
>>>>> to take a look at this once he's back among the living
>>>>>
>>>>>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>>>>>> index 4a5a84d7bdd4..47471dcdbaf9 100644
>>>>>> --- a/include/linux/pci.h
>>>>>> +++ b/include/linux/pci.h
>>>>>> @@ -1990,6 +1990,7 @@ static inline void
>>>>>> pcibios_penalize_isa_irq(int irq, int active) {}
>>>>>>    int pcibios_alloc_irq(struct pci_dev *dev);
>>>>>>    void pcibios_free_irq(struct pci_dev *dev);
>>>>>>    resource_size_t pcibios_default_alignment(void);
>>>>>> +int pcibios_ignore_alignment_request(void);
>>>>>>
>>>>>>    #ifdef CONFIG_HIBERNATE_CALLBACKS
>>>>>>    extern struct dev_pm_ops pcibios_pm_ops;
>>>>>> -- 
>>>>>> 2.20.1
>>>>>>
>>>>
>>

-- 
Alexey

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request
  2019-05-31  3:56             ` Alexey Kardashevskiy
@ 2019-06-03  2:23               ` Shawn Anastasio
  2019-06-03  5:02                 ` Alexey Kardashevskiy
  0 siblings, 1 reply; 19+ messages in thread
From: Shawn Anastasio @ 2019-06-03  2:23 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Oliver
  Cc: Sam Bobroff, linux-pci, Linux Kernel Mailing List, rppt,
	Paul Mackerras, Bjorn Helgaas, xyjxie, linuxppc-dev



On 5/30/19 10:56 PM, Alexey Kardashevskiy wrote:
> 
> 
> On 31/05/2019 08:49, Shawn Anastasio wrote:
>> On 5/29/19 10:39 PM, Alexey Kardashevskiy wrote:
>>>
>>>
>>> On 28/05/2019 17:39, Shawn Anastasio wrote:
>>>>
>>>>
>>>> On 5/28/19 1:27 AM, Alexey Kardashevskiy wrote:
>>>>>
>>>>>
>>>>> On 28/05/2019 15:36, Oliver wrote:
>>>>>> On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio <shawn@anastas.io>
>>>>>> wrote:
>>>>>>>
>>>>>>> Introduce a new pcibios function pcibios_ignore_alignment_request
>>>>>>> which allows the PCI core to defer to platform-specific code to
>>>>>>> determine whether or not to ignore alignment requests for PCI
>>>>>>> resources.
>>>>>>>
>>>>>>> The existing behavior is to simply ignore alignment requests when
>>>>>>> PCI_PROBE_ONLY is set. This is behavior is maintained by the
>>>>>>> default implementation of pcibios_ignore_alignment_request.
>>>>>>>
>>>>>>> Signed-off-by: Shawn Anastasio <shawn@anastas.io>
>>>>>>> ---
>>>>>>>     drivers/pci/pci.c   | 9 +++++++--
>>>>>>>     include/linux/pci.h | 1 +
>>>>>>>     2 files changed, 8 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>>>>>>> index 8abc843b1615..8207a09085d1 100644
>>>>>>> --- a/drivers/pci/pci.c
>>>>>>> +++ b/drivers/pci/pci.c
>>>>>>> @@ -5882,6 +5882,11 @@ resource_size_t __weak
>>>>>>> pcibios_default_alignment(void)
>>>>>>>            return 0;
>>>>>>>     }
>>>>>>>
>>>>>>> +int __weak pcibios_ignore_alignment_request(void)
>>>>>>> +{
>>>>>>> +       return pci_has_flag(PCI_PROBE_ONLY);
>>>>>>> +}
>>>>>>> +
>>>>>>>     #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE
>>>>>>>     static char
>>>>>>> resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0};
>>>>>>>     static DEFINE_SPINLOCK(resource_alignment_lock);
>>>>>>> @@ -5906,9 +5911,9 @@ static resource_size_t
>>>>>>> pci_specified_resource_alignment(struct pci_dev *dev,
>>>>>>>            p = resource_alignment_param;
>>>>>>>            if (!*p && !align)
>>>>>>>                    goto out;
>>>>>>> -       if (pci_has_flag(PCI_PROBE_ONLY)) {
>>>>>>> +       if (pcibios_ignore_alignment_request()) {
>>>>>>>                    align = 0;
>>>>>>> -               pr_info_once("PCI: Ignoring requested alignments
>>>>>>> (PCI_PROBE_ONLY)\n");
>>>>>>> +               pr_info_once("PCI: Ignoring requested alignments\n");
>>>>>>>                    goto out;
>>>>>>>            }
>>>>>>
>>>>>> I think the logic here is questionable to begin with. If the user has
>>>>>> explicitly requested re-aligning a resource via the command line then
>>>>>> we should probably do it even if PCI_PROBE_ONLY is set. When it breaks
>>>>>> they get to keep the pieces.
>>>>>>
>>>>>> That said, the real issue here is that PCI_PROBE_ONLY probably
>>>>>> shouldn't be set under qemu/kvm. Under the other hypervisor (PowerVM)
>>>>>> hotplugged devices are configured by firmware before it's passed to
>>>>>> the guest and we need to keep the FW assignments otherwise things
>>>>>> break. QEMU however doesn't do any BAR assignments and relies on that
>>>>>> being handled by the guest. At boot time this is done by SLOF, but
>>>>>> Linux only keeps SLOF around until it's extracted the device-tree.
>>>>>> Once that's done SLOF gets blown away and the kernel needs to do it's
>>>>>> own BAR assignments. I'm guessing there's a hack in there to make it
>>>>>> work today, but it's a little surprising that it works at all...
>>>>>
>>>>>
>>>>> The hack is to run a modified qemu-aware "/usr/sbin/rtas_errd" in the
>>>>> guest which receives an event from qemu (RAS_EPOW from
>>>>> /proc/interrupts), fetches device tree chunks (and as I understand it -
>>>>> they come with BARs from phyp but without from qemu) and writes "1" to
>>>>> "/sys/bus/pci/rescan" which calls pci_assign_resource() eventually:
>>>>
>>>> Interesting. Does this mean that the PHYP hotplug path doesn't
>>>> call pci_assign_resource?
>>>
>>>
>>> I'd expect dlpar_add_slot() to be called under phyp and eventually
>>> pci_device_add() which (I think) may or may not trigger later
>>> reassignment.
>>>
>>>
>>>> If so it means the patch may not
>>>> break that platform after all, though it still may not be
>>>> the correct way of doing things.
>>>
>>>
>>> We should probably stop enforcing the PCI_PROBE_ONLY flag - it seems
>>> that (unless resource_alignment= is used) the pseries guest should just
>>> walk through all allocated resources and leave them unchanged.
>>
>> If we add a pcibios_default_alignment() implementation like was
>> suggested earlier, then it will behave as if the user has
>> specified resource_alignment= by default and SLOF's assignments
>> won't be honored (I think).
> 
> 
> I removed pci_add_flags(PCI_PROBE_ONLY) from pSeries_setup_arch and
> tried booting with and without pci=resource_alignment= and I can see no
> difference - BARs are still aligned to 64K as programmed in SLOF; if I
> hack SLOF to align to 4K or 32K - BARs get packed and the guest leaves
> them unchanged.
> 
> 
>> I guess it boils down to one question - is it important that we
>> observe SLOF's initial BAR assignments?
> 
> It isn't if it's SLOF but it is if it's phyp. It used to not
> allow/support BAR reassignment and even if it does not, I'd rather avoid
> touching them.

A quick update. I tried removing pci_add_flags(PCI_PROBE_ONLY) which
worked, but if I add an implementation of pcibios_default_alignment
which simply returns PAGE_SIZE, my VM fails to boot and many errors
from the virtio disk driver are printed to the console.

After some investigation, it seems that with pcibios_default_alignment
present, Linux will reallocate all resources provided by SLOF on
boot. I'm still not sure why exactly this causes the virtio driver
to fail, but it does indicate that there is a reason to keep
SLOF's initial assignments.

Anybody have an idea what's causing this?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request
  2019-06-03  2:23               ` Shawn Anastasio
@ 2019-06-03  5:02                 ` Alexey Kardashevskiy
  2019-06-03  8:35                   ` Alexey Kardashevskiy
  0 siblings, 1 reply; 19+ messages in thread
From: Alexey Kardashevskiy @ 2019-06-03  5:02 UTC (permalink / raw)
  To: Shawn Anastasio, Oliver
  Cc: Sam Bobroff, linux-pci, Linux Kernel Mailing List, rppt,
	Paul Mackerras, Bjorn Helgaas, xyjxie, linuxppc-dev



On 03/06/2019 12:23, Shawn Anastasio wrote:
> 
> 
> On 5/30/19 10:56 PM, Alexey Kardashevskiy wrote:
>>
>>
>> On 31/05/2019 08:49, Shawn Anastasio wrote:
>>> On 5/29/19 10:39 PM, Alexey Kardashevskiy wrote:
>>>>
>>>>
>>>> On 28/05/2019 17:39, Shawn Anastasio wrote:
>>>>>
>>>>>
>>>>> On 5/28/19 1:27 AM, Alexey Kardashevskiy wrote:
>>>>>>
>>>>>>
>>>>>> On 28/05/2019 15:36, Oliver wrote:
>>>>>>> On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio <shawn@anastas.io>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Introduce a new pcibios function pcibios_ignore_alignment_request
>>>>>>>> which allows the PCI core to defer to platform-specific code to
>>>>>>>> determine whether or not to ignore alignment requests for PCI
>>>>>>>> resources.
>>>>>>>>
>>>>>>>> The existing behavior is to simply ignore alignment requests when
>>>>>>>> PCI_PROBE_ONLY is set. This is behavior is maintained by the
>>>>>>>> default implementation of pcibios_ignore_alignment_request.
>>>>>>>>
>>>>>>>> Signed-off-by: Shawn Anastasio <shawn@anastas.io>
>>>>>>>> ---
>>>>>>>>     drivers/pci/pci.c   | 9 +++++++--
>>>>>>>>     include/linux/pci.h | 1 +
>>>>>>>>     2 files changed, 8 insertions(+), 2 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>>>>>>>> index 8abc843b1615..8207a09085d1 100644
>>>>>>>> --- a/drivers/pci/pci.c
>>>>>>>> +++ b/drivers/pci/pci.c
>>>>>>>> @@ -5882,6 +5882,11 @@ resource_size_t __weak
>>>>>>>> pcibios_default_alignment(void)
>>>>>>>>            return 0;
>>>>>>>>     }
>>>>>>>>
>>>>>>>> +int __weak pcibios_ignore_alignment_request(void)
>>>>>>>> +{
>>>>>>>> +       return pci_has_flag(PCI_PROBE_ONLY);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>     #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE
>>>>>>>>     static char
>>>>>>>> resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0};
>>>>>>>>     static DEFINE_SPINLOCK(resource_alignment_lock);
>>>>>>>> @@ -5906,9 +5911,9 @@ static resource_size_t
>>>>>>>> pci_specified_resource_alignment(struct pci_dev *dev,
>>>>>>>>            p = resource_alignment_param;
>>>>>>>>            if (!*p && !align)
>>>>>>>>                    goto out;
>>>>>>>> -       if (pci_has_flag(PCI_PROBE_ONLY)) {
>>>>>>>> +       if (pcibios_ignore_alignment_request()) {
>>>>>>>>                    align = 0;
>>>>>>>> -               pr_info_once("PCI: Ignoring requested alignments
>>>>>>>> (PCI_PROBE_ONLY)\n");
>>>>>>>> +               pr_info_once("PCI: Ignoring requested
>>>>>>>> alignments\n");
>>>>>>>>                    goto out;
>>>>>>>>            }
>>>>>>>
>>>>>>> I think the logic here is questionable to begin with. If the user
>>>>>>> has
>>>>>>> explicitly requested re-aligning a resource via the command line
>>>>>>> then
>>>>>>> we should probably do it even if PCI_PROBE_ONLY is set. When it
>>>>>>> breaks
>>>>>>> they get to keep the pieces.
>>>>>>>
>>>>>>> That said, the real issue here is that PCI_PROBE_ONLY probably
>>>>>>> shouldn't be set under qemu/kvm. Under the other hypervisor
>>>>>>> (PowerVM)
>>>>>>> hotplugged devices are configured by firmware before it's passed to
>>>>>>> the guest and we need to keep the FW assignments otherwise things
>>>>>>> break. QEMU however doesn't do any BAR assignments and relies on
>>>>>>> that
>>>>>>> being handled by the guest. At boot time this is done by SLOF, but
>>>>>>> Linux only keeps SLOF around until it's extracted the device-tree.
>>>>>>> Once that's done SLOF gets blown away and the kernel needs to do
>>>>>>> it's
>>>>>>> own BAR assignments. I'm guessing there's a hack in there to make it
>>>>>>> work today, but it's a little surprising that it works at all...
>>>>>>
>>>>>>
>>>>>> The hack is to run a modified qemu-aware "/usr/sbin/rtas_errd" in the
>>>>>> guest which receives an event from qemu (RAS_EPOW from
>>>>>> /proc/interrupts), fetches device tree chunks (and as I understand
>>>>>> it -
>>>>>> they come with BARs from phyp but without from qemu) and writes
>>>>>> "1" to
>>>>>> "/sys/bus/pci/rescan" which calls pci_assign_resource() eventually:
>>>>>
>>>>> Interesting. Does this mean that the PHYP hotplug path doesn't
>>>>> call pci_assign_resource?
>>>>
>>>>
>>>> I'd expect dlpar_add_slot() to be called under phyp and eventually
>>>> pci_device_add() which (I think) may or may not trigger later
>>>> reassignment.
>>>>
>>>>
>>>>> If so it means the patch may not
>>>>> break that platform after all, though it still may not be
>>>>> the correct way of doing things.
>>>>
>>>>
>>>> We should probably stop enforcing the PCI_PROBE_ONLY flag - it seems
>>>> that (unless resource_alignment= is used) the pseries guest should just
>>>> walk through all allocated resources and leave them unchanged.
>>>
>>> If we add a pcibios_default_alignment() implementation like was
>>> suggested earlier, then it will behave as if the user has
>>> specified resource_alignment= by default and SLOF's assignments
>>> won't be honored (I think).
>>
>>
>> I removed pci_add_flags(PCI_PROBE_ONLY) from pSeries_setup_arch and
>> tried booting with and without pci=resource_alignment= and I can see no
>> difference - BARs are still aligned to 64K as programmed in SLOF; if I
>> hack SLOF to align to 4K or 32K - BARs get packed and the guest leaves
>> them unchanged.
>>
>>
>>> I guess it boils down to one question - is it important that we
>>> observe SLOF's initial BAR assignments?
>>
>> It isn't if it's SLOF but it is if it's phyp. It used to not
>> allow/support BAR reassignment and even if it does not, I'd rather avoid
>> touching them.
> 
> A quick update. I tried removing pci_add_flags(PCI_PROBE_ONLY) which
> worked, but if I add an implementation of pcibios_default_alignment
> which simply returns PAGE_SIZE, my VM fails to boot and many errors
> from the virtio disk driver are printed to the console.
> 
> After some investigation, it seems that with pcibios_default_alignment
> present, Linux will reallocate all resources provided by SLOF on
> boot. I'm still not sure why exactly this causes the virtio driver
> to fail, but it does indicate that there is a reason to keep
> SLOF's initial assignments.
> 
> Anybody have an idea what's causing this?

With your changes the guest feels the urge to reassign bars (no idea why
but ok), when it does so, it puts both BARs (one is prefetchable) into
the 32bit non-prefetchable window of the PHB (SLOF puts the prefetchable
bar to a 64bit prefetchable window, I have no idea why the guest does it
different either but this must still work) and then qemu does not
emulate something properly - unassigned_mem_accepts() is triggered on
the bar access - no idea why - I am debugging it right now.




-- 
Alexey

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request
  2019-06-03  5:02                 ` Alexey Kardashevskiy
@ 2019-06-03  8:35                   ` Alexey Kardashevskiy
  2019-06-03  9:12                     ` Shawn Anastasio
  0 siblings, 1 reply; 19+ messages in thread
From: Alexey Kardashevskiy @ 2019-06-03  8:35 UTC (permalink / raw)
  To: Shawn Anastasio
  Cc: Oliver, Sam Bobroff, linux-pci, Linux Kernel Mailing List, rppt,
	Paul Mackerras, Bjorn Helgaas, xyjxie, linuxppc-dev



On 03/06/2019 15:02, Alexey Kardashevskiy wrote:
> 
> 
> On 03/06/2019 12:23, Shawn Anastasio wrote:
>>
>>
>> On 5/30/19 10:56 PM, Alexey Kardashevskiy wrote:
>>>
>>>
>>> On 31/05/2019 08:49, Shawn Anastasio wrote:
>>>> On 5/29/19 10:39 PM, Alexey Kardashevskiy wrote:
>>>>>
>>>>>
>>>>> On 28/05/2019 17:39, Shawn Anastasio wrote:
>>>>>>
>>>>>>
>>>>>> On 5/28/19 1:27 AM, Alexey Kardashevskiy wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 28/05/2019 15:36, Oliver wrote:
>>>>>>>> On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio <shawn@anastas.io>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Introduce a new pcibios function pcibios_ignore_alignment_request
>>>>>>>>> which allows the PCI core to defer to platform-specific code to
>>>>>>>>> determine whether or not to ignore alignment requests for PCI
>>>>>>>>> resources.
>>>>>>>>>
>>>>>>>>> The existing behavior is to simply ignore alignment requests when
>>>>>>>>> PCI_PROBE_ONLY is set. This is behavior is maintained by the
>>>>>>>>> default implementation of pcibios_ignore_alignment_request.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Shawn Anastasio <shawn@anastas.io>
>>>>>>>>> ---
>>>>>>>>>     drivers/pci/pci.c   | 9 +++++++--
>>>>>>>>>     include/linux/pci.h | 1 +
>>>>>>>>>     2 files changed, 8 insertions(+), 2 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>>>>>>>>> index 8abc843b1615..8207a09085d1 100644
>>>>>>>>> --- a/drivers/pci/pci.c
>>>>>>>>> +++ b/drivers/pci/pci.c
>>>>>>>>> @@ -5882,6 +5882,11 @@ resource_size_t __weak
>>>>>>>>> pcibios_default_alignment(void)
>>>>>>>>>            return 0;
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>> +int __weak pcibios_ignore_alignment_request(void)
>>>>>>>>> +{
>>>>>>>>> +       return pci_has_flag(PCI_PROBE_ONLY);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>>     #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE
>>>>>>>>>     static char
>>>>>>>>> resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0};
>>>>>>>>>     static DEFINE_SPINLOCK(resource_alignment_lock);
>>>>>>>>> @@ -5906,9 +5911,9 @@ static resource_size_t
>>>>>>>>> pci_specified_resource_alignment(struct pci_dev *dev,
>>>>>>>>>            p = resource_alignment_param;
>>>>>>>>>            if (!*p && !align)
>>>>>>>>>                    goto out;
>>>>>>>>> -       if (pci_has_flag(PCI_PROBE_ONLY)) {
>>>>>>>>> +       if (pcibios_ignore_alignment_request()) {
>>>>>>>>>                    align = 0;
>>>>>>>>> -               pr_info_once("PCI: Ignoring requested alignments
>>>>>>>>> (PCI_PROBE_ONLY)\n");
>>>>>>>>> +               pr_info_once("PCI: Ignoring requested
>>>>>>>>> alignments\n");
>>>>>>>>>                    goto out;
>>>>>>>>>            }
>>>>>>>>
>>>>>>>> I think the logic here is questionable to begin with. If the user
>>>>>>>> has
>>>>>>>> explicitly requested re-aligning a resource via the command line
>>>>>>>> then
>>>>>>>> we should probably do it even if PCI_PROBE_ONLY is set. When it
>>>>>>>> breaks
>>>>>>>> they get to keep the pieces.
>>>>>>>>
>>>>>>>> That said, the real issue here is that PCI_PROBE_ONLY probably
>>>>>>>> shouldn't be set under qemu/kvm. Under the other hypervisor
>>>>>>>> (PowerVM)
>>>>>>>> hotplugged devices are configured by firmware before it's passed to
>>>>>>>> the guest and we need to keep the FW assignments otherwise things
>>>>>>>> break. QEMU however doesn't do any BAR assignments and relies on
>>>>>>>> that
>>>>>>>> being handled by the guest. At boot time this is done by SLOF, but
>>>>>>>> Linux only keeps SLOF around until it's extracted the device-tree.
>>>>>>>> Once that's done SLOF gets blown away and the kernel needs to do
>>>>>>>> it's
>>>>>>>> own BAR assignments. I'm guessing there's a hack in there to make it
>>>>>>>> work today, but it's a little surprising that it works at all...
>>>>>>>
>>>>>>>
>>>>>>> The hack is to run a modified qemu-aware "/usr/sbin/rtas_errd" in the
>>>>>>> guest which receives an event from qemu (RAS_EPOW from
>>>>>>> /proc/interrupts), fetches device tree chunks (and as I understand
>>>>>>> it -
>>>>>>> they come with BARs from phyp but without from qemu) and writes
>>>>>>> "1" to
>>>>>>> "/sys/bus/pci/rescan" which calls pci_assign_resource() eventually:
>>>>>>
>>>>>> Interesting. Does this mean that the PHYP hotplug path doesn't
>>>>>> call pci_assign_resource?
>>>>>
>>>>>
>>>>> I'd expect dlpar_add_slot() to be called under phyp and eventually
>>>>> pci_device_add() which (I think) may or may not trigger later
>>>>> reassignment.
>>>>>
>>>>>
>>>>>> If so it means the patch may not
>>>>>> break that platform after all, though it still may not be
>>>>>> the correct way of doing things.
>>>>>
>>>>>
>>>>> We should probably stop enforcing the PCI_PROBE_ONLY flag - it seems
>>>>> that (unless resource_alignment= is used) the pseries guest should just
>>>>> walk through all allocated resources and leave them unchanged.
>>>>
>>>> If we add a pcibios_default_alignment() implementation like was
>>>> suggested earlier, then it will behave as if the user has
>>>> specified resource_alignment= by default and SLOF's assignments
>>>> won't be honored (I think).
>>>
>>>
>>> I removed pci_add_flags(PCI_PROBE_ONLY) from pSeries_setup_arch and
>>> tried booting with and without pci=resource_alignment= and I can see no
>>> difference - BARs are still aligned to 64K as programmed in SLOF; if I
>>> hack SLOF to align to 4K or 32K - BARs get packed and the guest leaves
>>> them unchanged.
>>>
>>>
>>>> I guess it boils down to one question - is it important that we
>>>> observe SLOF's initial BAR assignments?
>>>
>>> It isn't if it's SLOF but it is if it's phyp. It used to not
>>> allow/support BAR reassignment and even if it does not, I'd rather avoid
>>> touching them.
>>
>> A quick update. I tried removing pci_add_flags(PCI_PROBE_ONLY) which
>> worked, but if I add an implementation of pcibios_default_alignment
>> which simply returns PAGE_SIZE, my VM fails to boot and many errors
>> from the virtio disk driver are printed to the console.
>>
>> After some investigation, it seems that with pcibios_default_alignment
>> present, Linux will reallocate all resources provided by SLOF on
>> boot. I'm still not sure why exactly this causes the virtio driver
>> to fail, but it does indicate that there is a reason to keep
>> SLOF's initial assignments.
>>
>> Anybody have an idea what's causing this?
> 
> With your changes the guest feels the urge to reassign bars (no idea why
> but ok), when it does so, it puts both BARs (one is prefetchable) into
> the 32bit non-prefetchable window of the PHB (SLOF puts the prefetchable
> bar to a 64bit prefetchable window, I have no idea why the guest does it
> different either but this must still work) and then qemu does not
> emulate something properly - unassigned_mem_accepts() is triggered on
> the bar access - no idea why - I am debugging it right now.


Sooo the problem is that resouce::flags has 2 bits to describe 64bit
BARs - PCI_BASE_ADDRESS_MEM_TYPE_64 and IORESOURCE_MEM_64 - and we don't
set IORESOURCE_MEM_64 for 64bit BARs when parsing the fdt.

So the BAR reallocator moves the BAR to 32bit window (which is not
desirable but permitted, I still have to chase it) and then
pci_std_update_resource() writes BAR back but since now it is 32bit BAR,
it does not write to the upper 32bits so that half remains 0x2100, QEMU
does not move BAR to the right window and the MMIO stops working.

Try this in the guest kernel, it seems to keep bars where they were
after slof.


diff --git a/arch/powerpc/kernel/pci_of_scan.c
b/arch/powerpc/kernel/pci_of_scan.c
index 24191ea2d9a7..64ad92016b63 100644
--- a/arch/powerpc/kernel/pci_of_scan.c
+++ b/arch/powerpc/kernel/pci_of_scan.c
@@ -45,6 +45,8 @@ unsigned int pci_parse_of_flags(u32 addr0, int bridge)
        if (addr0 & 0x02000000) {
                flags = IORESOURCE_MEM | PCI_BASE_ADDRESS_SPACE_MEMORY;
                flags |= (addr0 >> 22) & PCI_BASE_ADDRESS_MEM_TYPE_64;
+               if (flags & PCI_BASE_ADDRESS_MEM_TYPE_64)
+                       flags |= IORESOURCE_MEM_64;
                flags |= (addr0 >> 28) & PCI_BASE_ADDRESS_MEM_TYPE_1M;
                if (addr0 & 0x40000000)





-- 
Alexey

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request
  2019-06-03  8:35                   ` Alexey Kardashevskiy
@ 2019-06-03  9:12                     ` Shawn Anastasio
  0 siblings, 0 replies; 19+ messages in thread
From: Shawn Anastasio @ 2019-06-03  9:12 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Oliver, Sam Bobroff, linux-pci, Linux Kernel Mailing List, rppt,
	Paul Mackerras, Bjorn Helgaas, xyjxie, linuxppc-dev



On 6/3/19 3:35 AM, Alexey Kardashevskiy wrote:
> 
> 
> On 03/06/2019 15:02, Alexey Kardashevskiy wrote:
>>
>>
>> On 03/06/2019 12:23, Shawn Anastasio wrote:
>>>
>>>
>>> On 5/30/19 10:56 PM, Alexey Kardashevskiy wrote:
>>>>
>>>>
>>>> On 31/05/2019 08:49, Shawn Anastasio wrote:
>>>>> On 5/29/19 10:39 PM, Alexey Kardashevskiy wrote:
>>>>>>
>>>>>>
>>>>>> On 28/05/2019 17:39, Shawn Anastasio wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 5/28/19 1:27 AM, Alexey Kardashevskiy wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 28/05/2019 15:36, Oliver wrote:
>>>>>>>>> On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio <shawn@anastas.io>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Introduce a new pcibios function pcibios_ignore_alignment_request
>>>>>>>>>> which allows the PCI core to defer to platform-specific code to
>>>>>>>>>> determine whether or not to ignore alignment requests for PCI
>>>>>>>>>> resources.
>>>>>>>>>>
>>>>>>>>>> The existing behavior is to simply ignore alignment requests when
>>>>>>>>>> PCI_PROBE_ONLY is set. This is behavior is maintained by the
>>>>>>>>>> default implementation of pcibios_ignore_alignment_request.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Shawn Anastasio <shawn@anastas.io>
>>>>>>>>>> ---
>>>>>>>>>>      drivers/pci/pci.c   | 9 +++++++--
>>>>>>>>>>      include/linux/pci.h | 1 +
>>>>>>>>>>      2 files changed, 8 insertions(+), 2 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>>>>>>>>>> index 8abc843b1615..8207a09085d1 100644
>>>>>>>>>> --- a/drivers/pci/pci.c
>>>>>>>>>> +++ b/drivers/pci/pci.c
>>>>>>>>>> @@ -5882,6 +5882,11 @@ resource_size_t __weak
>>>>>>>>>> pcibios_default_alignment(void)
>>>>>>>>>>             return 0;
>>>>>>>>>>      }
>>>>>>>>>>
>>>>>>>>>> +int __weak pcibios_ignore_alignment_request(void)
>>>>>>>>>> +{
>>>>>>>>>> +       return pci_has_flag(PCI_PROBE_ONLY);
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>>      #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE
>>>>>>>>>>      static char
>>>>>>>>>> resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0};
>>>>>>>>>>      static DEFINE_SPINLOCK(resource_alignment_lock);
>>>>>>>>>> @@ -5906,9 +5911,9 @@ static resource_size_t
>>>>>>>>>> pci_specified_resource_alignment(struct pci_dev *dev,
>>>>>>>>>>             p = resource_alignment_param;
>>>>>>>>>>             if (!*p && !align)
>>>>>>>>>>                     goto out;
>>>>>>>>>> -       if (pci_has_flag(PCI_PROBE_ONLY)) {
>>>>>>>>>> +       if (pcibios_ignore_alignment_request()) {
>>>>>>>>>>                     align = 0;
>>>>>>>>>> -               pr_info_once("PCI: Ignoring requested alignments
>>>>>>>>>> (PCI_PROBE_ONLY)\n");
>>>>>>>>>> +               pr_info_once("PCI: Ignoring requested
>>>>>>>>>> alignments\n");
>>>>>>>>>>                     goto out;
>>>>>>>>>>             }
>>>>>>>>>
>>>>>>>>> I think the logic here is questionable to begin with. If the user
>>>>>>>>> has
>>>>>>>>> explicitly requested re-aligning a resource via the command line
>>>>>>>>> then
>>>>>>>>> we should probably do it even if PCI_PROBE_ONLY is set. When it
>>>>>>>>> breaks
>>>>>>>>> they get to keep the pieces.
>>>>>>>>>
>>>>>>>>> That said, the real issue here is that PCI_PROBE_ONLY probably
>>>>>>>>> shouldn't be set under qemu/kvm. Under the other hypervisor
>>>>>>>>> (PowerVM)
>>>>>>>>> hotplugged devices are configured by firmware before it's passed to
>>>>>>>>> the guest and we need to keep the FW assignments otherwise things
>>>>>>>>> break. QEMU however doesn't do any BAR assignments and relies on
>>>>>>>>> that
>>>>>>>>> being handled by the guest. At boot time this is done by SLOF, but
>>>>>>>>> Linux only keeps SLOF around until it's extracted the device-tree.
>>>>>>>>> Once that's done SLOF gets blown away and the kernel needs to do
>>>>>>>>> it's
>>>>>>>>> own BAR assignments. I'm guessing there's a hack in there to make it
>>>>>>>>> work today, but it's a little surprising that it works at all...
>>>>>>>>
>>>>>>>>
>>>>>>>> The hack is to run a modified qemu-aware "/usr/sbin/rtas_errd" in the
>>>>>>>> guest which receives an event from qemu (RAS_EPOW from
>>>>>>>> /proc/interrupts), fetches device tree chunks (and as I understand
>>>>>>>> it -
>>>>>>>> they come with BARs from phyp but without from qemu) and writes
>>>>>>>> "1" to
>>>>>>>> "/sys/bus/pci/rescan" which calls pci_assign_resource() eventually:
>>>>>>>
>>>>>>> Interesting. Does this mean that the PHYP hotplug path doesn't
>>>>>>> call pci_assign_resource?
>>>>>>
>>>>>>
>>>>>> I'd expect dlpar_add_slot() to be called under phyp and eventually
>>>>>> pci_device_add() which (I think) may or may not trigger later
>>>>>> reassignment.
>>>>>>
>>>>>>
>>>>>>> If so it means the patch may not
>>>>>>> break that platform after all, though it still may not be
>>>>>>> the correct way of doing things.
>>>>>>
>>>>>>
>>>>>> We should probably stop enforcing the PCI_PROBE_ONLY flag - it seems
>>>>>> that (unless resource_alignment= is used) the pseries guest should just
>>>>>> walk through all allocated resources and leave them unchanged.
>>>>>
>>>>> If we add a pcibios_default_alignment() implementation like was
>>>>> suggested earlier, then it will behave as if the user has
>>>>> specified resource_alignment= by default and SLOF's assignments
>>>>> won't be honored (I think).
>>>>
>>>>
>>>> I removed pci_add_flags(PCI_PROBE_ONLY) from pSeries_setup_arch and
>>>> tried booting with and without pci=resource_alignment= and I can see no
>>>> difference - BARs are still aligned to 64K as programmed in SLOF; if I
>>>> hack SLOF to align to 4K or 32K - BARs get packed and the guest leaves
>>>> them unchanged.
>>>>
>>>>
>>>>> I guess it boils down to one question - is it important that we
>>>>> observe SLOF's initial BAR assignments?
>>>>
>>>> It isn't if it's SLOF but it is if it's phyp. It used to not
>>>> allow/support BAR reassignment and even if it does not, I'd rather avoid
>>>> touching them.
>>>
>>> A quick update. I tried removing pci_add_flags(PCI_PROBE_ONLY) which
>>> worked, but if I add an implementation of pcibios_default_alignment
>>> which simply returns PAGE_SIZE, my VM fails to boot and many errors
>>> from the virtio disk driver are printed to the console.
>>>
>>> After some investigation, it seems that with pcibios_default_alignment
>>> present, Linux will reallocate all resources provided by SLOF on
>>> boot. I'm still not sure why exactly this causes the virtio driver
>>> to fail, but it does indicate that there is a reason to keep
>>> SLOF's initial assignments.
>>>
>>> Anybody have an idea what's causing this?
>>
>> With your changes the guest feels the urge to reassign bars (no idea why
>> but ok), when it does so, it puts both BARs (one is prefetchable) into
>> the 32bit non-prefetchable window of the PHB (SLOF puts the prefetchable
>> bar to a 64bit prefetchable window, I have no idea why the guest does it
>> different either but this must still work) and then qemu does not
>> emulate something properly - unassigned_mem_accepts() is triggered on
>> the bar access - no idea why - I am debugging it right now.
> 
> 
> Sooo the problem is that resouce::flags has 2 bits to describe 64bit
> BARs - PCI_BASE_ADDRESS_MEM_TYPE_64 and IORESOURCE_MEM_64 - and we don't
> set IORESOURCE_MEM_64 for 64bit BARs when parsing the fdt.
> 
> So the BAR reallocator moves the BAR to 32bit window (which is not
> desirable but permitted, I still have to chase it) and then
> pci_std_update_resource() writes BAR back but since now it is 32bit BAR,
> it does not write to the upper 32bits so that half remains 0x2100, QEMU
> does not move BAR to the right window and the MMIO stops working.
> 
> Try this in the guest kernel, it seems to keep bars where they were
> after slof.

Nice debugging work! With your patch the VM does boot. I'm not sure
if SLOF's original allocations are being kept or if Linux is redoing
them (how do you check?), but MMIO works and the system boots anyways.

I've also tested hotplug and the BAR allocations are page-aligned too,
as expected.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2019-06-03  9:12 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-28  4:03 [PATCH v3 0/3] Allow custom PCI resource alignment on pseries Shawn Anastasio
2019-05-28  4:03 ` [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request Shawn Anastasio
2019-05-28  5:36   ` Oliver
2019-05-28  5:50     ` Shawn Anastasio
2019-05-28  6:27     ` Alexey Kardashevskiy
2019-05-28  7:39       ` Shawn Anastasio
2019-05-30  3:39         ` Alexey Kardashevskiy
2019-05-30 22:49           ` Shawn Anastasio
2019-05-31  3:56             ` Alexey Kardashevskiy
2019-06-03  2:23               ` Shawn Anastasio
2019-06-03  5:02                 ` Alexey Kardashevskiy
2019-06-03  8:35                   ` Alexey Kardashevskiy
2019-06-03  9:12                     ` Shawn Anastasio
2019-05-29 14:00     ` Bjorn Helgaas
2019-05-30  6:55     ` [EXTERNAL] " Sam Bobroff
2019-05-30 22:33       ` Shawn Anastasio
2019-05-28  4:03 ` [PATCH v3 2/3] powerpc/64: Enable pcibios_after_init hook on ppc64 Shawn Anastasio
2019-05-28  4:03 ` [PATCH v3 3/3] powerpc/pseries: Allow user-specified PCI resource alignment after init Shawn Anastasio
2019-05-29 14:02   ` Bjorn Helgaas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).