LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Frederic Barrat <fbarrat@linux.ibm.com>
To: Leonardo Bras <leobras.c@gmail.com>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Paul Mackerras <paulus@samba.org>,
	Alexey Kardashevskiy <aik@ozlabs.ru>,
	David Gibson <david@gibson.dropbear.id.au>,
	Nicolin Chen <nicoleotsuka@gmail.com>,
	kernel test robot <lkp@intel.com>
Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v6 10/11] powerpc/pseries/iommu: Make use of DDW for indirect mapping
Date: Fri, 27 Aug 2021 19:06:44 +0200	[thread overview]
Message-ID: <3d79480a-20df-ea1a-e17f-8bf2c8a8a2be@linux.ibm.com> (raw)
In-Reply-To: <20210817063929.38701-11-leobras.c@gmail.com>



On 17/08/2021 08:39, Leonardo Bras wrote:
> So far it's assumed possible to map the guest RAM 1:1 to the bus, which
> works with a small number of devices. SRIOV changes it as the user can
> configure hundreds VFs and since phyp preallocates TCEs and does not
> allow IOMMU pages bigger than 64K, it has to limit the number of TCEs
> per a PE to limit waste of physical pages.
> 
> As of today, if the assumed direct mapping is not possible, DDW creation
> is skipped and the default DMA window "ibm,dma-window" is used instead.
> 
> By using DDW, indirect mapping  can get more TCEs than available for the
> default DMA window, and also get access to using much larger pagesizes
> (16MB as implemented in qemu vs 4k from default DMA window), causing a
> significant increase on the maximum amount of memory that can be IOMMU
> mapped at the same time.
> 
> Indirect mapping will only be used if direct mapping is not a
> possibility.
> 
> For indirect mapping, it's necessary to re-create the iommu_table with
> the new DMA window parameters, so iommu_alloc() can use it.
> 
> Removing the default DMA window for using DDW with indirect mapping
> is only allowed if there is no current IOMMU memory allocated in
> the iommu_table. enable_ddw() is aborted otherwise.
> 
> Even though there won't be both direct and indirect mappings at the
> same time, we can't reuse the DIRECT64_PROPNAME property name, or else
> an older kexec()ed kernel can assume direct mapping, and skip
> iommu_alloc(), causing undesirable behavior.
> So a new property name DMA64_PROPNAME "linux,dma64-ddr-window-info"
> was created to represent a DDW that does not allow direct mapping.
> 
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> ---


I think it looks ok now as it was mostly me who was misunderstanding one 
part of the previous iteration.
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>

Sorry for the late review, I was enjoying some time off. And thanks for 
that series, I believe it should help with those bugs complaining about 
lack of DMA space. It was also very educational for me, thanks to you 
and Alexey for your detailed answers.

   Fred


>   arch/powerpc/platforms/pseries/iommu.c | 89 +++++++++++++++++++++-----
>   1 file changed, 74 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index e11c00b2dc1e..0eccc29f5573 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -375,6 +375,7 @@ static DEFINE_SPINLOCK(direct_window_list_lock);
>   /* protects initializing window twice for same device */
>   static DEFINE_MUTEX(direct_window_init_mutex);
>   #define DIRECT64_PROPNAME "linux,direct64-ddr-window-info"
> +#define DMA64_PROPNAME "linux,dma64-ddr-window-info"
>   
>   static int tce_clearrange_multi_pSeriesLP(unsigned long start_pfn,
>   					unsigned long num_pfn, const void *arg)
> @@ -940,6 +941,7 @@ static int find_existing_ddw_windows(void)
>   		return 0;
>   
>   	find_existing_ddw_windows_named(DIRECT64_PROPNAME);
> +	find_existing_ddw_windows_named(DMA64_PROPNAME);
>   
>   	return 0;
>   }
> @@ -1226,14 +1228,17 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   	struct ddw_create_response create;
>   	int page_shift;
>   	u64 win_addr;
> +	const char *win_name;
>   	struct device_node *dn;
>   	u32 ddw_avail[DDW_APPLICABLE_SIZE];
>   	struct direct_window *window;
>   	struct property *win64;
>   	bool ddw_enabled = false;
>   	struct failed_ddw_pdn *fpdn;
> -	bool default_win_removed = false;
> +	bool default_win_removed = false, direct_mapping = false;
>   	bool pmem_present;
> +	struct pci_dn *pci = PCI_DN(pdn);
> +	struct iommu_table *tbl = pci->table_group->tables[0];
>   
>   	dn = of_find_node_by_type(NULL, "ibm,pmemory");
>   	pmem_present = dn != NULL;
> @@ -1242,6 +1247,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   	mutex_lock(&direct_window_init_mutex);
>   
>   	if (find_existing_ddw(pdn, &dev->dev.archdata.dma_offset, &len)) {
> +		direct_mapping = (len >= max_ram_len);
>   		ddw_enabled = true;
>   		goto out_unlock;
>   	}
> @@ -1322,8 +1328,8 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   			  query.page_size);
>   		goto out_failed;
>   	}
> -	/* verify the window * number of ptes will map the partition */
> -	/* check largest block * page size > max memory hotplug addr */
> +
> +
>   	/*
>   	 * The "ibm,pmemory" can appear anywhere in the address space.
>   	 * Assuming it is still backed by page structs, try MAX_PHYSMEM_BITS
> @@ -1339,13 +1345,25 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   			dev_info(&dev->dev, "Skipping ibm,pmemory");
>   	}
>   
> +	/* check if the available block * number of ptes will map everything */
>   	if (query.largest_available_block < (1ULL << (len - page_shift))) {
>   		dev_dbg(&dev->dev,
>   			"can't map partition max 0x%llx with %llu %llu-sized pages\n",
>   			1ULL << len,
>   			query.largest_available_block,
>   			1ULL << page_shift);
> -		goto out_failed;
> +
> +		/* DDW + IOMMU on single window may fail if there is any allocation */
> +		if (default_win_removed && iommu_table_in_use(tbl)) {
> +			dev_dbg(&dev->dev, "current IOMMU table in use, can't be replaced.\n");
> +			goto out_failed;
> +		}
> +
> +		len = order_base_2(query.largest_available_block << page_shift);
> +		win_name = DMA64_PROPNAME;
> +	} else {
> +		direct_mapping = true;
> +		win_name = DIRECT64_PROPNAME;
>   	}
>   
>   	ret = create_ddw(dev, ddw_avail, &create, page_shift, len);
> @@ -1356,8 +1374,8 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   		  create.liobn, dn);
>   
>   	win_addr = ((u64)create.addr_hi << 32) | create.addr_lo;
> -	win64 = ddw_property_create(DIRECT64_PROPNAME, create.liobn, win_addr,
> -				    page_shift, len);
> +	win64 = ddw_property_create(win_name, create.liobn, win_addr, page_shift, len);
> +
>   	if (!win64) {
>   		dev_info(&dev->dev,
>   			 "couldn't allocate property, property name, or value\n");
> @@ -1375,15 +1393,54 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   	if (!window)
>   		goto out_del_prop;
>   
> -	ret = walk_system_ram_range(0, memblock_end_of_DRAM() >> PAGE_SHIFT,
> -			win64->value, tce_setrange_multi_pSeriesLP_walk);
> -	if (ret) {
> -		dev_info(&dev->dev, "failed to map direct window for %pOF: %d\n",
> -			 dn, ret);
> +	if (direct_mapping) {
> +		/* DDW maps the whole partition, so enable direct DMA mapping */
> +		ret = walk_system_ram_range(0, memblock_end_of_DRAM() >> PAGE_SHIFT,
> +					    win64->value, tce_setrange_multi_pSeriesLP_walk);
> +		if (ret) {
> +			dev_info(&dev->dev, "failed to map direct window for %pOF: %d\n",
> +				 dn, ret);
>   
>   		/* Make sure to clean DDW if any TCE was set*/
>   		clean_dma_window(pdn, win64->value);
> -		goto out_del_list;
> +			goto out_del_list;
> +		}
> +	} else {
> +		struct iommu_table *newtbl;
> +		int i;
> +
> +		for (i = 0; i < ARRAY_SIZE(pci->phb->mem_resources); i++) {
> +			const unsigned long mask = IORESOURCE_MEM_64 | IORESOURCE_MEM;
> +
> +			/* Look for MMIO32 */
> +			if ((pci->phb->mem_resources[i].flags & mask) == IORESOURCE_MEM)
> +				break;
> +		}
> +
> +		if (i == ARRAY_SIZE(pci->phb->mem_resources))
> +			goto out_del_list;
> +
> +		/* New table for using DDW instead of the default DMA window */
> +		newtbl = iommu_pseries_alloc_table(pci->phb->node);
> +		if (!newtbl) {
> +			dev_dbg(&dev->dev, "couldn't create new IOMMU table\n");
> +			goto out_del_list;
> +		}
> +
> +		iommu_table_setparms_common(newtbl, pci->phb->bus->number, create.liobn, win_addr,
> +					    1UL << len, page_shift, NULL, &iommu_table_lpar_multi_ops);
> +		iommu_init_table(newtbl, pci->phb->node, pci->phb->mem_resources[i].start,
> +				 pci->phb->mem_resources[i].end);
> +
> +		pci->table_group->tables[1] = newtbl;
> +
> +		/* Keep default DMA window stuct if removed */
> +		if (default_win_removed) {
> +			tbl->it_size = 0;
> +			kfree(tbl->it_map);
> +		}
> +
> +		set_iommu_table_base(&dev->dev, newtbl);
>   	}
>   
>   	spin_lock(&direct_window_list_lock);
> @@ -1427,10 +1484,10 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   	 * as RAM, then we failed to create a window to cover persistent
>   	 * memory and need to set the DMA limit.
>   	 */
> -	if (pmem_present && ddw_enabled && (len == max_ram_len))
> +	if (pmem_present && ddw_enabled && direct_mapping && len == max_ram_len)
>   		dev->dev.bus_dma_limit = dev->dev.archdata.dma_offset + (1ULL << len);
>   
> -	return ddw_enabled;
> +    return ddw_enabled && direct_mapping;
>   }
>   
>   static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
> @@ -1572,7 +1629,9 @@ static int iommu_reconfig_notifier(struct notifier_block *nb, unsigned long acti
>   		 * we have to remove the property when releasing
>   		 * the device node.
>   		 */
> -		remove_ddw(np, false, DIRECT64_PROPNAME);
> +		if (remove_ddw(np, false, DIRECT64_PROPNAME))
> +			remove_ddw(np, false, DMA64_PROPNAME);
> +
>   		if (pci && pci->table_group)
>   			iommu_pseries_free_group(pci->table_group,
>   					np->full_name);
> 

  reply	other threads:[~2021-08-27 17:07 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-17  6:39 [PATCH v6 00/11] DDW + Indirect Mapping Leonardo Bras
2021-08-17  6:39 ` [PATCH v6 01/11] powerpc/pseries/iommu: Replace hard-coded page shift Leonardo Bras
2021-08-17  6:39 ` [PATCH v6 02/11] powerpc/kernel/iommu: Add new iommu_table_in_use() helper Leonardo Bras
2021-08-27 16:52   ` Frederic Barrat
2021-08-17  6:39 ` [PATCH v6 03/11] powerpc/pseries/iommu: Add iommu_pseries_alloc_table() helper Leonardo Bras
2021-08-17  6:39 ` [PATCH v6 04/11] powerpc/pseries/iommu: Add ddw_list_new_entry() helper Leonardo Bras
2021-08-17  6:39 ` [PATCH v6 05/11] powerpc/pseries/iommu: Allow DDW windows starting at 0x00 Leonardo Bras
2021-08-17  6:39 ` [PATCH v6 06/11] powerpc/pseries/iommu: Add ddw_property_create() and refactor enable_ddw() Leonardo Bras
2021-08-27 16:55   ` Frederic Barrat
2021-08-17  6:39 ` [PATCH v6 07/11] powerpc/pseries/iommu: Reorganize iommu_table_setparms*() with new helper Leonardo Bras
2021-08-17  6:39 ` [PATCH v6 08/11] powerpc/pseries/iommu: Update remove_dma_window() to accept property name Leonardo Bras
2021-08-27 16:58   ` Frederic Barrat
2021-08-17  6:39 ` [PATCH v6 09/11] powerpc/pseries/iommu: Find existing DDW with given " Leonardo Bras
2021-08-17  6:39 ` [PATCH v6 10/11] powerpc/pseries/iommu: Make use of DDW for indirect mapping Leonardo Bras
2021-08-27 17:06   ` Frederic Barrat [this message]
2021-08-27 17:53     ` Leonardo Brás
2021-08-17  6:39 ` [PATCH v6 11/11] powerpc/pseries/iommu: Rename "direct window" to "dma window" Leonardo Bras
2021-08-31 13:56 ` [PATCH v6 00/11] DDW + Indirect Mapping Michael Ellerman
     [not found] ` <82ca56ab-6a0a-7cbb-a5e7-facc40f35c3c@linux.vnet.ibm.com>
2021-08-31 20:18   ` Leonardo Brás
2021-08-31 20:39     ` David Christensen
2021-08-31 20:49       ` Leonardo Brás

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3d79480a-20df-ea1a-e17f-8bf2c8a8a2be@linux.ibm.com \
    --to=fbarrat@linux.ibm.com \
    --cc=aik@ozlabs.ru \
    --cc=benh@kernel.crashing.org \
    --cc=david@gibson.dropbear.id.au \
    --cc=leobras.c@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=lkp@intel.com \
    --cc=mpe@ellerman.id.au \
    --cc=nicoleotsuka@gmail.com \
    --cc=paulus@samba.org \
    --subject='Re: [PATCH v6 10/11] powerpc/pseries/iommu: Make use of DDW for indirect mapping' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).