LKML Archive on lore.kernel.org help / color / mirror / Atom feed
* [PATCH 0/6] MSI portability cleanups [not found] <1169714047.65693.647693675533.qpush@cradle> @ 2007-01-28 19:40 ` Eric W. Biederman 2007-01-28 19:42 ` [PATCH 1/6] msi: Kill msi_lookup_irq Eric W. Biederman 2007-01-28 20:23 ` [PATCH 0/6] MSI portability cleanups Benjamin Herrenschmidt 0 siblings, 2 replies; 35+ messages in thread From: Eric W. Biederman @ 2007-01-28 19:40 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: linux-pci, David S. Miller, Kyle McMartin, linuxppc-dev, Brice Goglin, shaohua.li, Michael Ellerman, Grant Grundler, Tony Luck, linux-kernel, Ingo Molnar This patchset is against gregkh-pci but except for the context around msi_lookup_irq being completely different it applies cleanly to 2.6.20-rc6 as well. When I first looked at this problem I thought no big deal it will one or two simple patches and that is it. When I looked more closely I discovered that to be certain of not introducing bugs I would have to kill msi_lock, which made the problem a little more difficult. The result of this patchset is that architecture hooks become: int arch_setup_msi_irq(struct pci_dev *dev, struct msi_desc *desc); void arch_teardown_msi_irq(unsigned int irq); and are responsible for allocating and freeing the irq as well as setting it up. This touches the architecture code for i386, x86_64, and ia64 to accomplish this. Since I couldn't test ia64 I reviewed the code closely, and compile tested it. The other big change is that I added a field to irq_desc to point at the msi_desc. This removes the conflicts with the existing pointer fields and makes the irq -> msi_desc mapping useable outside of msi.c The only architecture problem that isn't solvable in this context is the problem of supporting the crazy hypervisor on the ppc RTAS, which asks us to drive the hardware but does not give us access to the hardware registers. Eric ^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH 1/6] msi: Kill msi_lookup_irq 2007-01-28 19:40 ` [PATCH 0/6] MSI portability cleanups Eric W. Biederman @ 2007-01-28 19:42 ` Eric W. Biederman 2007-01-28 19:44 ` [PATCH 2/6] msi: Remove msi_lock Eric W. Biederman 2007-01-28 22:01 ` [PATCH 1/6] msi: Kill msi_lookup_irq Paul Mackerras 2007-01-28 20:23 ` [PATCH 0/6] MSI portability cleanups Benjamin Herrenschmidt 1 sibling, 2 replies; 35+ messages in thread From: Eric W. Biederman @ 2007-01-28 19:42 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: linux-pci, David S. Miller, Kyle McMartin, linuxppc-dev, Brice Goglin, shaohua.li, Michael Ellerman, Grant Grundler, Tony Luck, linux-kernel, Ingo Molnar The function msi_lookup_irq was horrible. As a side effect of running it changed dev->irq, and then the callers would need to change it back. In addition it does a global scan through all of the irqs, which seems to be the sole justification of the msi_lock. To remove the neede for msi_lookup_irq I added first_msi_irq to struct pci_dev. Then depending on the context I replaced msi_lookup_irq with dev->first_msi_irq, dev->msi_enabled, or dev->msix_enabled. msi_enabled and msix_enabled were already present in pci_dev for other reasons. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> --- drivers/pci/msi.c | 149 ++++++++++++++++++++------------------------------- include/linux/pci.h | 3 + 2 files changed, 62 insertions(+), 90 deletions(-) diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index bca5a8a..71080c9 100644 --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -283,28 +283,6 @@ void pci_disable_device_msi(struct pci_dev *dev) PCI_CAP_ID_MSIX); } -static int msi_lookup_irq(struct pci_dev *dev, int type) -{ - int irq; - unsigned long flags; - - spin_lock_irqsave(&msi_lock, flags); - for (irq = 0; irq < NR_IRQS; irq++) { - if (!msi_desc[irq] || msi_desc[irq]->dev != dev || - msi_desc[irq]->msi_attrib.type != type || - msi_desc[irq]->msi_attrib.default_irq != dev->irq) - continue; - spin_unlock_irqrestore(&msi_lock, flags); - /* This pre-assigned MSI irq for this device - already exists. Override dev->irq with this irq */ - dev->irq = irq; - return 0; - } - spin_unlock_irqrestore(&msi_lock, flags); - - return -EACCES; -} - #ifdef CONFIG_PM static int __pci_save_msi_state(struct pci_dev *dev) { @@ -375,11 +353,13 @@ static void __pci_restore_msi_state(struct pci_dev *dev) static int __pci_save_msix_state(struct pci_dev *dev) { int pos; - int temp; int irq, head, tail = 0; u16 control; struct pci_cap_saved_state *save_state; + if (!dev->msix_enabled) + return 0; + pos = pci_find_capability(dev, PCI_CAP_ID_MSIX); if (pos <= 0 || dev->no_msi) return 0; @@ -397,13 +377,7 @@ static int __pci_save_msix_state(struct pci_dev *dev) *((u16 *)&save_state->data[0]) = control; /* save the table */ - temp = dev->irq; - if (msi_lookup_irq(dev, PCI_CAP_ID_MSIX)) { - kfree(save_state); - return -EINVAL; - } - - irq = head = dev->irq; + irq = head = dev->first_msi_irq; while (head != tail) { struct msi_desc *entry; @@ -413,7 +387,6 @@ static int __pci_save_msix_state(struct pci_dev *dev) tail = msi_desc[irq]->link.tail; irq = tail; } - dev->irq = temp; save_state->cap_nr = PCI_CAP_ID_MSIX; pci_add_saved_cap(dev, save_state); @@ -439,9 +412,11 @@ static void __pci_restore_msix_state(struct pci_dev *dev) int pos; int irq, head, tail = 0; struct msi_desc *entry; - int temp; struct pci_cap_saved_state *save_state; + if (!dev->msix_enabled) + return; + save_state = pci_find_saved_cap(dev, PCI_CAP_ID_MSIX); if (!save_state) return; @@ -454,10 +429,7 @@ static void __pci_restore_msix_state(struct pci_dev *dev) return; /* route the table */ - temp = dev->irq; - if (msi_lookup_irq(dev, PCI_CAP_ID_MSIX)) - return; - irq = head = dev->irq; + irq = head = dev->first_msi_irq; while (head != tail) { entry = msi_desc[irq]; write_msi_msg(irq, &entry->msg_save); @@ -465,7 +437,6 @@ static void __pci_restore_msix_state(struct pci_dev *dev) tail = msi_desc[irq]->link.tail; irq = tail; } - dev->irq = temp; pci_write_config_word(dev, msi_control_reg(pos), save); enable_msi_mode(dev, pos, PCI_CAP_ID_MSIX); @@ -535,6 +506,7 @@ static int msi_capability_init(struct pci_dev *dev) return status; } + dev->first_msi_irq = irq; attach_msi_entry(entry, irq); /* Set MSI enabled bits */ enable_msi_mode(dev, pos, PCI_CAP_ID_MSI); @@ -631,6 +603,7 @@ static int msix_capability_init(struct pci_dev *dev, avail = -EBUSY; return avail; } + dev->first_msi_irq = entries[0].vector; /* Set MSI-X enabled bits */ enable_msi_mode(dev, pos, PCI_CAP_ID_MSIX); @@ -678,13 +651,11 @@ int pci_msi_supported(struct pci_dev * dev) **/ int pci_enable_msi(struct pci_dev* dev) { - int pos, temp, status; + int pos, status; if (pci_msi_supported(dev) < 0) return -EINVAL; - temp = dev->irq; - status = msi_init(); if (status < 0) return status; @@ -693,15 +664,14 @@ int pci_enable_msi(struct pci_dev* dev) if (!pos) return -EINVAL; - WARN_ON(!msi_lookup_irq(dev, PCI_CAP_ID_MSI)); + WARN_ON(!!dev->msi_enabled); /* Check whether driver already requested for MSI-X irqs */ pos = pci_find_capability(dev, PCI_CAP_ID_MSIX); - if (pos > 0 && !msi_lookup_irq(dev, PCI_CAP_ID_MSIX)) { + if (pos > 0 && dev->msix_enabled) { printk(KERN_INFO "PCI: %s: Can't enable MSI. " - "Device already has MSI-X irq assigned\n", + "Device already has MSI-X enabled\n", pci_name(dev)); - dev->irq = temp; return -EINVAL; } status = msi_capability_init(dev); @@ -720,6 +690,9 @@ void pci_disable_msi(struct pci_dev* dev) if (!dev) return; + if (!dev->msi_enabled) + return; + pos = pci_find_capability(dev, PCI_CAP_ID_MSI); if (!pos) return; @@ -728,28 +701,30 @@ void pci_disable_msi(struct pci_dev* dev) if (!(control & PCI_MSI_FLAGS_ENABLE)) return; + disable_msi_mode(dev, pos, PCI_CAP_ID_MSI); spin_lock_irqsave(&msi_lock, flags); - entry = msi_desc[dev->irq]; + entry = msi_desc[dev->first_msi_irq]; if (!entry || !entry->dev || entry->msi_attrib.type != PCI_CAP_ID_MSI) { spin_unlock_irqrestore(&msi_lock, flags); return; } - if (irq_has_action(dev->irq)) { + if (irq_has_action(dev->first_msi_irq)) { spin_unlock_irqrestore(&msi_lock, flags); printk(KERN_WARNING "PCI: %s: pci_disable_msi() called without " "free_irq() on MSI irq %d\n", - pci_name(dev), dev->irq); - BUG_ON(irq_has_action(dev->irq)); + pci_name(dev), dev->first_msi_irq); + BUG_ON(irq_has_action(dev->first_msi_irq)); } else { default_irq = entry->msi_attrib.default_irq; spin_unlock_irqrestore(&msi_lock, flags); - msi_free_irq(dev, dev->irq); + msi_free_irq(dev, dev->first_msi_irq); /* Restore dev->irq to its default pin-assertion irq */ dev->irq = default_irq; } + dev->first_msi_irq = 0; } static int msi_free_irq(struct pci_dev* dev, int irq) @@ -808,7 +783,7 @@ static int msi_free_irq(struct pci_dev* dev, int irq) int pci_enable_msix(struct pci_dev* dev, struct msix_entry *entries, int nvec) { int status, pos, nr_entries; - int i, j, temp; + int i, j; u16 control; if (!entries || pci_msi_supported(dev) < 0) @@ -836,16 +811,14 @@ int pci_enable_msix(struct pci_dev* dev, struct msix_entry *entries, int nvec) return -EINVAL; /* duplicate entry */ } } - temp = dev->irq; - WARN_ON(!msi_lookup_irq(dev, PCI_CAP_ID_MSIX)); + WARN_ON(!!dev->msix_enabled); /* Check whether driver already requested for MSI irq */ if (pci_find_capability(dev, PCI_CAP_ID_MSI) > 0 && - !msi_lookup_irq(dev, PCI_CAP_ID_MSI)) { + dev->msi_enabled) { printk(KERN_INFO "PCI: %s: Can't enable MSI-X. " "Device already has an MSI irq assigned\n", pci_name(dev)); - dev->irq = temp; return -EINVAL; } status = msix_capability_init(dev, entries, nvec); @@ -854,7 +827,9 @@ int pci_enable_msix(struct pci_dev* dev, struct msix_entry *entries, int nvec) void pci_disable_msix(struct pci_dev* dev) { - int pos, temp; + int irq, head, tail = 0, warning = 0; + unsigned long flags; + int pos; u16 control; if (!pci_msi_enable) @@ -862,6 +837,9 @@ void pci_disable_msix(struct pci_dev* dev) if (!dev) return; + if (!dev->msix_enabled) + return; + pos = pci_find_capability(dev, PCI_CAP_ID_MSIX); if (!pos) return; @@ -872,31 +850,25 @@ void pci_disable_msix(struct pci_dev* dev) disable_msi_mode(dev, pos, PCI_CAP_ID_MSIX); - temp = dev->irq; - if (!msi_lookup_irq(dev, PCI_CAP_ID_MSIX)) { - int irq, head, tail = 0, warning = 0; - unsigned long flags; - - irq = head = dev->irq; - dev->irq = temp; /* Restore pin IRQ */ - while (head != tail) { - spin_lock_irqsave(&msi_lock, flags); - tail = msi_desc[irq]->link.tail; - spin_unlock_irqrestore(&msi_lock, flags); - if (irq_has_action(irq)) - warning = 1; - else if (irq != head) /* Release MSI-X irq */ - msi_free_irq(dev, irq); - irq = tail; - } - msi_free_irq(dev, irq); - if (warning) { - printk(KERN_WARNING "PCI: %s: pci_disable_msix() called without " - "free_irq() on all MSI-X irqs\n", - pci_name(dev)); - BUG_ON(warning > 0); - } + irq = head = dev->first_msi_irq; + while (head != tail) { + spin_lock_irqsave(&msi_lock, flags); + tail = msi_desc[irq]->link.tail; + spin_unlock_irqrestore(&msi_lock, flags); + if (irq_has_action(irq)) + warning = 1; + else if (irq != head) /* Release MSI-X irq */ + msi_free_irq(dev, irq); + irq = tail; + } + msi_free_irq(dev, irq); + if (warning) { + printk(KERN_WARNING "PCI: %s: pci_disable_msix() called without " + "free_irq() on all MSI-X irqs\n", + pci_name(dev)); + BUG_ON(warning > 0); } + dev->first_msi_irq = 0; } /** @@ -910,30 +882,28 @@ void pci_disable_msix(struct pci_dev* dev) **/ void msi_remove_pci_irq_vectors(struct pci_dev* dev) { - int pos, temp; + int pos; unsigned long flags; if (!pci_msi_enable || !dev) return; - temp = dev->irq; /* Save IOAPIC IRQ */ pos = pci_find_capability(dev, PCI_CAP_ID_MSI); - if (pos > 0 && !msi_lookup_irq(dev, PCI_CAP_ID_MSI)) { - if (irq_has_action(dev->irq)) { + if (pos > 0 && dev->msi_enabled) { + if (irq_has_action(dev->first_msi_irq)) { printk(KERN_WARNING "PCI: %s: msi_remove_pci_irq_vectors() " "called without free_irq() on MSI irq %d\n", - pci_name(dev), dev->irq); - BUG_ON(irq_has_action(dev->irq)); + pci_name(dev), dev->first_msi_irq); + BUG_ON(irq_has_action(dev->first_msi_irq)); } else /* Release MSI irq assigned to this device */ - msi_free_irq(dev, dev->irq); - dev->irq = temp; /* Restore IOAPIC IRQ */ + msi_free_irq(dev, dev->first_msi_irq); } pos = pci_find_capability(dev, PCI_CAP_ID_MSIX); - if (pos > 0 && !msi_lookup_irq(dev, PCI_CAP_ID_MSIX)) { + if (pos > 0 && dev->msix_enabled) { int irq, head, tail = 0, warning = 0; void __iomem *base = NULL; - irq = head = dev->irq; + irq = head = dev->first_msi_irq; while (head != tail) { spin_lock_irqsave(&msi_lock, flags); tail = msi_desc[irq]->link.tail; @@ -953,7 +923,6 @@ void msi_remove_pci_irq_vectors(struct pci_dev* dev) pci_name(dev)); BUG_ON(warning > 0); } - dev->irq = temp; /* Restore IOAPIC IRQ */ } } diff --git a/include/linux/pci.h b/include/linux/pci.h index 29765e2..1507f8c 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -174,6 +174,9 @@ struct pci_dev { struct bin_attribute *rom_attr; /* attribute descriptor for sysfs ROM entry */ int rom_attr_enabled; /* has display of the rom attribute been enabled? */ struct bin_attribute *res_attr[DEVICE_COUNT_RESOURCE]; /* sysfs file for resources */ +#ifdef CONFIG_PCI_MSI + unsigned int first_msi_irq; +#endif }; #define pci_dev_g(n) list_entry(n, struct pci_dev, global_list) -- 1.4.4.1.g278f ^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH 2/6] msi: Remove msi_lock. 2007-01-28 19:42 ` [PATCH 1/6] msi: Kill msi_lookup_irq Eric W. Biederman @ 2007-01-28 19:44 ` Eric W. Biederman 2007-01-28 19:45 ` [PATCH 3/6] msi: Fix msi_remove_pci_irq_vectors Eric W. Biederman 2007-01-28 22:01 ` [PATCH 1/6] msi: Kill msi_lookup_irq Paul Mackerras 1 sibling, 1 reply; 35+ messages in thread From: Eric W. Biederman @ 2007-01-28 19:44 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: linux-pci, David S. Miller, Kyle McMartin, linuxppc-dev, Brice Goglin, shaohua.li, Michael Ellerman, Grant Grundler, Tony Luck, linux-kernel, Ingo Molnar With the removal of msi_lookup_irq all of the functions using msi_lock operated on a single device and none of them could reasonably be called on that device at the same time. Since what little synchronization that needs to happen needs to happen outside of the msi functions, msi_lock could never be contended and as such is useless and just complicates the code. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> --- drivers/pci/msi.c | 20 -------------------- 1 files changed, 0 insertions(+), 20 deletions(-) diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index 71080c9..5e7a187 100644 --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -24,7 +24,6 @@ #include "pci.h" #include "msi.h" -static DEFINE_SPINLOCK(msi_lock); static struct msi_desc* msi_desc[NR_IRQS] = { [0 ... NR_IRQS-1] = NULL }; static struct kmem_cache* msi_cachep; @@ -196,11 +195,7 @@ static struct msi_desc* alloc_msi_entry(void) static void attach_msi_entry(struct msi_desc *entry, int irq) { - unsigned long flags; - - spin_lock_irqsave(&msi_lock, flags); msi_desc[irq] = entry; - spin_unlock_irqrestore(&msi_lock, flags); } static int create_msi_irq(void) @@ -683,7 +678,6 @@ void pci_disable_msi(struct pci_dev* dev) struct msi_desc *entry; int pos, default_irq; u16 control; - unsigned long flags; if (!pci_msi_enable) return; @@ -704,21 +698,17 @@ void pci_disable_msi(struct pci_dev* dev) disable_msi_mode(dev, pos, PCI_CAP_ID_MSI); - spin_lock_irqsave(&msi_lock, flags); entry = msi_desc[dev->first_msi_irq]; if (!entry || !entry->dev || entry->msi_attrib.type != PCI_CAP_ID_MSI) { - spin_unlock_irqrestore(&msi_lock, flags); return; } if (irq_has_action(dev->first_msi_irq)) { - spin_unlock_irqrestore(&msi_lock, flags); printk(KERN_WARNING "PCI: %s: pci_disable_msi() called without " "free_irq() on MSI irq %d\n", pci_name(dev), dev->first_msi_irq); BUG_ON(irq_has_action(dev->first_msi_irq)); } else { default_irq = entry->msi_attrib.default_irq; - spin_unlock_irqrestore(&msi_lock, flags); msi_free_irq(dev, dev->first_msi_irq); /* Restore dev->irq to its default pin-assertion irq */ @@ -732,14 +722,11 @@ static int msi_free_irq(struct pci_dev* dev, int irq) struct msi_desc *entry; int head, entry_nr, type; void __iomem *base; - unsigned long flags; arch_teardown_msi_irq(irq); - spin_lock_irqsave(&msi_lock, flags); entry = msi_desc[irq]; if (!entry || entry->dev != dev) { - spin_unlock_irqrestore(&msi_lock, flags); return -EINVAL; } type = entry->msi_attrib.type; @@ -750,7 +737,6 @@ static int msi_free_irq(struct pci_dev* dev, int irq) msi_desc[entry->link.tail]->link.head = entry->link.head; entry->dev = NULL; msi_desc[irq] = NULL; - spin_unlock_irqrestore(&msi_lock, flags); destroy_msi_irq(irq); @@ -828,7 +814,6 @@ int pci_enable_msix(struct pci_dev* dev, struct msix_entry *entries, int nvec) void pci_disable_msix(struct pci_dev* dev) { int irq, head, tail = 0, warning = 0; - unsigned long flags; int pos; u16 control; @@ -852,9 +837,7 @@ void pci_disable_msix(struct pci_dev* dev) irq = head = dev->first_msi_irq; while (head != tail) { - spin_lock_irqsave(&msi_lock, flags); tail = msi_desc[irq]->link.tail; - spin_unlock_irqrestore(&msi_lock, flags); if (irq_has_action(irq)) warning = 1; else if (irq != head) /* Release MSI-X irq */ @@ -883,7 +866,6 @@ void pci_disable_msix(struct pci_dev* dev) void msi_remove_pci_irq_vectors(struct pci_dev* dev) { int pos; - unsigned long flags; if (!pci_msi_enable || !dev) return; @@ -905,10 +887,8 @@ void msi_remove_pci_irq_vectors(struct pci_dev* dev) irq = head = dev->first_msi_irq; while (head != tail) { - spin_lock_irqsave(&msi_lock, flags); tail = msi_desc[irq]->link.tail; base = msi_desc[irq]->mask_base; - spin_unlock_irqrestore(&msi_lock, flags); if (irq_has_action(irq)) warning = 1; else if (irq != head) /* Release MSI-X irq */ -- 1.4.4.1.g278f ^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH 3/6] msi: Fix msi_remove_pci_irq_vectors. 2007-01-28 19:44 ` [PATCH 2/6] msi: Remove msi_lock Eric W. Biederman @ 2007-01-28 19:45 ` Eric W. Biederman 2007-01-28 19:47 ` [PATCH 4/6] msi: Remove attach_msi_entry Eric W. Biederman 0 siblings, 1 reply; 35+ messages in thread From: Eric W. Biederman @ 2007-01-28 19:45 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: linux-pci, David S. Miller, Kyle McMartin, linuxppc-dev, Brice Goglin, shaohua.li, Michael Ellerman, Grant Grundler, Tony Luck, linux-kernel, Ingo Molnar Since msi_remove_pci_irq_vectors is designed to be called during hotplug remove it is actively wrong to query the hardware and expect meaningful results back. To that end remove the pci_find_capability calls. Testing dev->msi_enabled and dev->msix_enabled gives us all of the information we need. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> --- drivers/pci/msi.c | 8 ++------ 1 files changed, 2 insertions(+), 6 deletions(-) diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index 5e7a187..db9c1d7 100644 --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -865,13 +865,10 @@ void pci_disable_msix(struct pci_dev* dev) **/ void msi_remove_pci_irq_vectors(struct pci_dev* dev) { - int pos; - if (!pci_msi_enable || !dev) return; - pos = pci_find_capability(dev, PCI_CAP_ID_MSI); - if (pos > 0 && dev->msi_enabled) { + if (dev->msi_enabled) { if (irq_has_action(dev->first_msi_irq)) { printk(KERN_WARNING "PCI: %s: msi_remove_pci_irq_vectors() " "called without free_irq() on MSI irq %d\n", @@ -880,8 +877,7 @@ void msi_remove_pci_irq_vectors(struct pci_dev* dev) } else /* Release MSI irq assigned to this device */ msi_free_irq(dev, dev->first_msi_irq); } - pos = pci_find_capability(dev, PCI_CAP_ID_MSIX); - if (pos > 0 && dev->msix_enabled) { + if (dev->msix_enabled) { int irq, head, tail = 0, warning = 0; void __iomem *base = NULL; -- 1.4.4.1.g278f ^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH 4/6] msi: Remove attach_msi_entry. 2007-01-28 19:45 ` [PATCH 3/6] msi: Fix msi_remove_pci_irq_vectors Eric W. Biederman @ 2007-01-28 19:47 ` Eric W. Biederman 2007-01-28 19:52 ` [PATCH 5/6] msi: Kill the msi_desc array Eric W. Biederman 0 siblings, 1 reply; 35+ messages in thread From: Eric W. Biederman @ 2007-01-28 19:47 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: linux-pci, David S. Miller, Kyle McMartin, linuxppc-dev, Brice Goglin, shaohua.li, Michael Ellerman, Grant Grundler, Tony Luck, linux-kernel, Ingo Molnar The attach_msi_entry has been reduced to a single simple assignment, so for simplicity remove the abstraction and directory perform the assignment. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> --- drivers/pci/msi.c | 9 ++------- 1 files changed, 2 insertions(+), 7 deletions(-) diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index db9c1d7..b994012 100644 --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -193,11 +193,6 @@ static struct msi_desc* alloc_msi_entry(void) return entry; } -static void attach_msi_entry(struct msi_desc *entry, int irq) -{ - msi_desc[irq] = entry; -} - static int create_msi_irq(void) { struct msi_desc *entry; @@ -502,7 +497,7 @@ static int msi_capability_init(struct pci_dev *dev) } dev->first_msi_irq = irq; - attach_msi_entry(entry, irq); + msi_desc[irq] = entry; /* Set MSI enabled bits */ enable_msi_mode(dev, pos, PCI_CAP_ID_MSI); @@ -581,7 +576,7 @@ static int msix_capability_init(struct pci_dev *dev, break; } - attach_msi_entry(entry, irq); + msi_desc[irq] = entry; } if (i != nvec) { int avail = i - 1; -- 1.4.4.1.g278f ^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH 5/6] msi: Kill the msi_desc array. 2007-01-28 19:47 ` [PATCH 4/6] msi: Remove attach_msi_entry Eric W. Biederman @ 2007-01-28 19:52 ` Eric W. Biederman 2007-01-28 19:56 ` [PATCH 6/6] msi: Make MSI useable more architectures Eric W. Biederman 0 siblings, 1 reply; 35+ messages in thread From: Eric W. Biederman @ 2007-01-28 19:52 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: linux-pci, David S. Miller, Kyle McMartin, linuxppc-dev, Brice Goglin, shaohua.li, Michael Ellerman, Grant Grundler, Tony Luck, linux-kernel, Ingo Molnar We need to be able to get from an irq number to a struct msi_desc. The msi_desc array in msi.c had several short comings the big one was that it could not be used outside of msi.c. Using irq_data in struct irq_desc almost worked except on some architectures irq_data needs to be used for something else. So this patch adds a msi_desc pointer to irq_desc, adds the appropriate wrappers and changes all of the msi code to use them. The dynamic_irq_init/cleanup code was tweaked to ensure the new field is left in a well defined state. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> --- arch/ia64/sn/kernel/msi_sn.c | 2 +- drivers/pci/msi.c | 44 ++++++++++++++++++++--------------------- include/linux/irq.h | 4 +++ kernel/irq/chip.c | 28 ++++++++++++++++++++++++++ 4 files changed, 54 insertions(+), 24 deletions(-) diff --git a/arch/ia64/sn/kernel/msi_sn.c b/arch/ia64/sn/kernel/msi_sn.c index b3a435f..31fbb85 100644 --- a/arch/ia64/sn/kernel/msi_sn.c +++ b/arch/ia64/sn/kernel/msi_sn.c @@ -74,7 +74,7 @@ int sn_setup_msi_irq(unsigned int irq, struct pci_dev *pdev) struct pcibus_bussoft *bussoft = SN_PCIDEV_BUSSOFT(pdev); struct sn_pcibus_provider *provider = SN_PCIDEV_BUSPROVIDER(pdev); - entry = get_irq_data(irq); + entry = get_irq_msi(irq); if (!entry->msi_attrib.is_64) return -EINVAL; diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index b994012..d7a2259 100644 --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -24,7 +24,6 @@ #include "pci.h" #include "msi.h" -static struct msi_desc* msi_desc[NR_IRQS] = { [0 ... NR_IRQS-1] = NULL }; static struct kmem_cache* msi_cachep; static int pci_msi_enable = 1; @@ -43,7 +42,7 @@ static void msi_set_mask_bit(unsigned int irq, int flag) { struct msi_desc *entry; - entry = msi_desc[irq]; + entry = get_irq_msi(irq); BUG_ON(!entry || !entry->dev); switch (entry->msi_attrib.type) { case PCI_CAP_ID_MSI: @@ -73,7 +72,7 @@ static void msi_set_mask_bit(unsigned int irq, int flag) void read_msi_msg(unsigned int irq, struct msi_msg *msg) { - struct msi_desc *entry = get_irq_data(irq); + struct msi_desc *entry = get_irq_msi(irq); switch(entry->msi_attrib.type) { case PCI_CAP_ID_MSI: { @@ -112,7 +111,7 @@ void read_msi_msg(unsigned int irq, struct msi_msg *msg) void write_msi_msg(unsigned int irq, struct msi_msg *msg) { - struct msi_desc *entry = get_irq_data(irq); + struct msi_desc *entry = get_irq_msi(irq); switch (entry->msi_attrib.type) { case PCI_CAP_ID_MSI: { @@ -208,7 +207,7 @@ static int create_msi_irq(void) return -EBUSY; } - set_irq_data(irq, entry); + set_irq_msi(irq, entry); return irq; } @@ -217,9 +216,9 @@ static void destroy_msi_irq(unsigned int irq) { struct msi_desc *entry; - entry = get_irq_data(irq); + entry = get_irq_msi(irq); set_irq_chip(irq, NULL); - set_irq_data(irq, NULL); + set_irq_msi(irq, NULL); destroy_irq(irq); kmem_cache_free(msi_cachep, entry); } @@ -371,10 +370,10 @@ static int __pci_save_msix_state(struct pci_dev *dev) while (head != tail) { struct msi_desc *entry; - entry = msi_desc[irq]; + entry = get_irq_msi(irq); read_msi_msg(irq, &entry->msg_save); - tail = msi_desc[irq]->link.tail; + tail = entry->link.tail; irq = tail; } @@ -421,10 +420,10 @@ static void __pci_restore_msix_state(struct pci_dev *dev) /* route the table */ irq = head = dev->first_msi_irq; while (head != tail) { - entry = msi_desc[irq]; + entry = get_irq_msi(irq); write_msi_msg(irq, &entry->msg_save); - tail = msi_desc[irq]->link.tail; + tail = entry->link.tail; irq = tail; } @@ -462,7 +461,7 @@ static int msi_capability_init(struct pci_dev *dev) if (irq < 0) return irq; - entry = get_irq_data(irq); + entry = get_irq_msi(irq); entry->link.head = irq; entry->link.tail = irq; entry->msi_attrib.type = PCI_CAP_ID_MSI; @@ -497,7 +496,7 @@ static int msi_capability_init(struct pci_dev *dev) } dev->first_msi_irq = irq; - msi_desc[irq] = entry; + set_irq_msi(irq, entry); /* Set MSI enabled bits */ enable_msi_mode(dev, pos, PCI_CAP_ID_MSI); @@ -546,7 +545,7 @@ static int msix_capability_init(struct pci_dev *dev, if (irq < 0) break; - entry = get_irq_data(irq); + entry = get_irq_msi(irq); j = entries[i].entry; entries[i].vector = irq; entry->msi_attrib.type = PCI_CAP_ID_MSIX; @@ -576,7 +575,7 @@ static int msix_capability_init(struct pci_dev *dev, break; } - msi_desc[irq] = entry; + set_irq_msi(irq, entry); } if (i != nvec) { int avail = i - 1; @@ -693,7 +692,7 @@ void pci_disable_msi(struct pci_dev* dev) disable_msi_mode(dev, pos, PCI_CAP_ID_MSI); - entry = msi_desc[dev->first_msi_irq]; + entry = get_irq_msi(dev->first_msi_irq); if (!entry || !entry->dev || entry->msi_attrib.type != PCI_CAP_ID_MSI) { return; } @@ -720,7 +719,7 @@ static int msi_free_irq(struct pci_dev* dev, int irq) arch_teardown_msi_irq(irq); - entry = msi_desc[irq]; + entry = get_irq_msi(irq); if (!entry || entry->dev != dev) { return -EINVAL; } @@ -728,10 +727,9 @@ static int msi_free_irq(struct pci_dev* dev, int irq) entry_nr = entry->msi_attrib.entry_nr; head = entry->link.head; base = entry->mask_base; - msi_desc[entry->link.head]->link.tail = entry->link.tail; - msi_desc[entry->link.tail]->link.head = entry->link.head; + get_irq_msi(entry->link.head)->link.tail = entry->link.tail; + get_irq_msi(entry->link.tail)->link.head = entry->link.head; entry->dev = NULL; - msi_desc[irq] = NULL; destroy_msi_irq(irq); @@ -832,7 +830,7 @@ void pci_disable_msix(struct pci_dev* dev) irq = head = dev->first_msi_irq; while (head != tail) { - tail = msi_desc[irq]->link.tail; + tail = get_irq_msi(irq)->link.tail; if (irq_has_action(irq)) warning = 1; else if (irq != head) /* Release MSI-X irq */ @@ -878,8 +876,8 @@ void msi_remove_pci_irq_vectors(struct pci_dev* dev) irq = head = dev->first_msi_irq; while (head != tail) { - tail = msi_desc[irq]->link.tail; - base = msi_desc[irq]->mask_base; + tail = get_irq_msi(irq)->link.tail; + base = get_irq_msi(irq)->mask_base; if (irq_has_action(irq)) warning = 1; else if (irq != head) /* Release MSI-X irq */ diff --git a/include/linux/irq.h b/include/linux/irq.h index 52fc405..5504b67 100644 --- a/include/linux/irq.h +++ b/include/linux/irq.h @@ -68,6 +68,7 @@ typedef void fastcall (*irq_flow_handler_t)(unsigned int irq, #define IRQ_MOVE_PENDING 0x40000000 /* need to re-target IRQ destination */ struct proc_dir_entry; +struct msi_desc; /** * struct irq_chip - hardware interrupt chip descriptor @@ -148,6 +149,7 @@ struct irq_chip { struct irq_desc { irq_flow_handler_t handle_irq; struct irq_chip *chip; + struct msi_desc *msi_desc; void *handler_data; void *chip_data; struct irqaction *action; /* IRQ action list */ @@ -373,10 +375,12 @@ extern int set_irq_chip(unsigned int irq, struct irq_chip *chip); extern int set_irq_data(unsigned int irq, void *data); extern int set_irq_chip_data(unsigned int irq, void *data); extern int set_irq_type(unsigned int irq, unsigned int type); +extern int set_irq_msi(unsigned int irq, struct msi_desc *entry); #define get_irq_chip(irq) (irq_desc[irq].chip) #define get_irq_chip_data(irq) (irq_desc[irq].chip_data) #define get_irq_data(irq) (irq_desc[irq].handler_data) +#define get_irq_msi(irq) (irq_desc[irq].msi_desc) #endif /* CONFIG_GENERIC_HARDIRQS */ diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c index d27b258..475e8a7 100644 --- a/kernel/irq/chip.c +++ b/kernel/irq/chip.c @@ -39,6 +39,7 @@ void dynamic_irq_init(unsigned int irq) desc->chip = &no_irq_chip; desc->handle_irq = handle_bad_irq; desc->depth = 1; + desc->msi_desc = NULL; desc->handler_data = NULL; desc->chip_data = NULL; desc->action = NULL; @@ -74,6 +75,9 @@ void dynamic_irq_cleanup(unsigned int irq) WARN_ON(1); return; } + desc->msi_desc = NULL; + desc->handler_data = NULL; + desc->chip_data = NULL; desc->handle_irq = handle_bad_irq; desc->chip = &no_irq_chip; spin_unlock_irqrestore(&desc->lock, flags); @@ -162,6 +166,30 @@ int set_irq_data(unsigned int irq, void *data) EXPORT_SYMBOL(set_irq_data); /** + * set_irq_data - set irq type data for an irq + * @irq: Interrupt number + * @data: Pointer to interrupt specific data + * + * Set the hardware irq controller data for an irq + */ +int set_irq_msi(unsigned int irq, struct msi_desc *entry) +{ + struct irq_desc *desc; + unsigned long flags; + + if (irq >= NR_IRQS) { + printk(KERN_ERR + "Trying to install msi data for IRQ%d\n", irq); + return -EINVAL; + } + desc = irq_desc + irq; + spin_lock_irqsave(&desc->lock, flags); + desc->msi_desc = entry; + spin_unlock_irqrestore(&desc->lock, flags); + return 0; +} + +/** * set_irq_chip_data - set irq chip data for an irq * @irq: Interrupt number * @data: Pointer to chip specific data -- 1.4.4.1.g278f ^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH 6/6] msi: Make MSI useable more architectures 2007-01-28 19:52 ` [PATCH 5/6] msi: Kill the msi_desc array Eric W. Biederman @ 2007-01-28 19:56 ` Eric W. Biederman 0 siblings, 0 replies; 35+ messages in thread From: Eric W. Biederman @ 2007-01-28 19:56 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: linux-pci, David S. Miller, Kyle McMartin, linuxppc-dev, Brice Goglin, shaohua.li, Michael Ellerman, Grant Grundler, Tony Luck, linux-kernel, Ingo Molnar The arch hooks arch_setup_msi_irq and arch_teardown_msi_irq are now responsible for allocating and freeing the linux irq in addition to setting up the the linux irq to work with the interrupt. arch_setup_msi_irq now takes a pci_device and a msi_desc and returns an irq. With this change in place this code should be useable by all platforms except those that won't let the OS touch the hardware like ppc RTAS. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> --- arch/i386/kernel/io_apic.c | 17 ++++++--- arch/ia64/kernel/msi_ia64.c | 19 ++++++---- arch/ia64/sn/kernel/msi_sn.c | 20 +++++++--- arch/x86_64/kernel/io_apic.c | 17 ++++++--- drivers/pci/msi.c | 80 +++++++++++------------------------------ include/asm-ia64/machvec.h | 3 +- include/linux/msi.h | 2 +- 7 files changed, 75 insertions(+), 83 deletions(-) diff --git a/arch/i386/kernel/io_apic.c b/arch/i386/kernel/io_apic.c index 2424cc9..9ba4f99 100644 --- a/arch/i386/kernel/io_apic.c +++ b/arch/i386/kernel/io_apic.c @@ -2600,25 +2600,32 @@ static struct irq_chip msi_chip = { .retrigger = ioapic_retrigger_irq, }; -int arch_setup_msi_irq(unsigned int irq, struct pci_dev *dev) +int arch_setup_msi_irq(struct pci_dev *dev, struct msi_desc *desc) { struct msi_msg msg; - int ret; + int irq, ret; + irq = create_irq(); + if (irq < 0) + return irq; + + set_irq_msi(irq, desc); ret = msi_compose_msg(dev, irq, &msg); - if (ret < 0) + if (ret < 0) { + destroy_irq(irq); return ret; + } write_msi_msg(irq, &msg); set_irq_chip_and_handler_name(irq, &msi_chip, handle_edge_irq, "edge"); - return 0; + return irq; } void arch_teardown_msi_irq(unsigned int irq) { - return; + destroy_irq(irq); } #endif /* CONFIG_PCI_MSI */ diff --git a/arch/ia64/kernel/msi_ia64.c b/arch/ia64/kernel/msi_ia64.c index 822e59a..0d05450 100644 --- a/arch/ia64/kernel/msi_ia64.c +++ b/arch/ia64/kernel/msi_ia64.c @@ -64,12 +64,17 @@ static void ia64_set_msi_irq_affinity(unsigned int irq, cpumask_t cpu_mask) } #endif /* CONFIG_SMP */ -int ia64_setup_msi_irq(unsigned int irq, struct pci_dev *pdev) +int ia64_setup_msi_irq(struct pci_dev *pdev, struct msi_desc *desc) { struct msi_msg msg; unsigned long dest_phys_id; - unsigned int vector; + unsigned int irq, vector; + irq = create_irq(); + if (irq < 0) + return irq; + + set_irq_msi(irq, desc); dest_phys_id = cpu_physical_id(first_cpu(cpu_online_map)); vector = irq; @@ -89,12 +94,12 @@ int ia64_setup_msi_irq(unsigned int irq, struct pci_dev *pdev) write_msi_msg(irq, &msg); set_irq_chip_and_handler(irq, &ia64_msi_chip, handle_edge_irq); - return 0; + return irq; } void ia64_teardown_msi_irq(unsigned int irq) { - return; /* no-op */ + destroy_irq(irq); } static void ia64_ack_msi_irq(unsigned int irq) @@ -126,12 +131,12 @@ static struct irq_chip ia64_msi_chip = { }; -int arch_setup_msi_irq(unsigned int irq, struct pci_dev *pdev) +int arch_setup_msi_irq(struct pci_dev *pdev, struct msi_desc *desc) { if (platform_setup_msi_irq) - return platform_setup_msi_irq(irq, pdev); + return platform_setup_msi_irq(pdev, desc); - return ia64_setup_msi_irq(irq, pdev); + return ia64_setup_msi_irq(pdev, desc); } void arch_teardown_msi_irq(unsigned int irq) diff --git a/arch/ia64/sn/kernel/msi_sn.c b/arch/ia64/sn/kernel/msi_sn.c index 31fbb85..ea3dc38 100644 --- a/arch/ia64/sn/kernel/msi_sn.c +++ b/arch/ia64/sn/kernel/msi_sn.c @@ -59,13 +59,12 @@ void sn_teardown_msi_irq(unsigned int irq) sn_intr_free(nasid, widget, sn_irq_info); sn_msi_info[irq].sn_irq_info = NULL; - return; + destroy_irq(irq); } -int sn_setup_msi_irq(unsigned int irq, struct pci_dev *pdev) +int sn_setup_msi_irq(struct pci_dev *pdev, struct msi_desc *entry) { struct msi_msg msg; - struct msi_desc *entry; int widget; int status; nasid_t nasid; @@ -73,8 +72,8 @@ int sn_setup_msi_irq(unsigned int irq, struct pci_dev *pdev) struct sn_irq_info *sn_irq_info; struct pcibus_bussoft *bussoft = SN_PCIDEV_BUSSOFT(pdev); struct sn_pcibus_provider *provider = SN_PCIDEV_BUSPROVIDER(pdev); + int irq; - entry = get_irq_msi(irq); if (!entry->msi_attrib.is_64) return -EINVAL; @@ -84,6 +83,11 @@ int sn_setup_msi_irq(unsigned int irq, struct pci_dev *pdev) if (provider == NULL || provider->dma_map_consistent == NULL) return -EINVAL; + irq = create_irq(); + if (irq < 0) + return irq; + + set_irq_msi(irq, entry); /* * Set up the vector plumbing. Let the prom (via sn_intr_alloc) * decide which cpu to direct this msi at by default. @@ -95,12 +99,15 @@ int sn_setup_msi_irq(unsigned int irq, struct pci_dev *pdev) SWIN_WIDGETNUM(bussoft->bs_base); sn_irq_info = kzalloc(sizeof(struct sn_irq_info), GFP_KERNEL); - if (! sn_irq_info) + if (! sn_irq_info) { + destroy_irq(irq); return -ENOMEM; + } status = sn_intr_alloc(nasid, widget, sn_irq_info, irq, -1, -1); if (status) { kfree(sn_irq_info); + destroy_irq(irq); return -ENOMEM; } @@ -121,6 +128,7 @@ int sn_setup_msi_irq(unsigned int irq, struct pci_dev *pdev) if (! bus_addr) { sn_intr_free(nasid, widget, sn_irq_info); kfree(sn_irq_info); + destroy_irq(irq); return -ENOMEM; } @@ -139,7 +147,7 @@ int sn_setup_msi_irq(unsigned int irq, struct pci_dev *pdev) write_msi_msg(irq, &msg); set_irq_chip_and_handler(irq, &sn_msi_chip, handle_edge_irq); - return 0; + return irq; } #ifdef CONFIG_SMP diff --git a/arch/x86_64/kernel/io_apic.c b/arch/x86_64/kernel/io_apic.c index d7bad90..6be6730 100644 --- a/arch/x86_64/kernel/io_apic.c +++ b/arch/x86_64/kernel/io_apic.c @@ -1956,24 +1956,31 @@ static struct irq_chip msi_chip = { .retrigger = ioapic_retrigger_irq, }; -int arch_setup_msi_irq(unsigned int irq, struct pci_dev *dev) +int arch_setup_msi_irq(struct pci_dev *dev, struct msi_desc *desc) { struct msi_msg msg; - int ret; + int irq, ret; + irq = create_irq(); + if (irq < 0) + return irq; + + set_irq_msi(irq, desc); ret = msi_compose_msg(dev, irq, &msg); - if (ret < 0) + if (ret < 0) { + destroy_irq(irq); return ret; + } write_msi_msg(irq, &msg); set_irq_chip_and_handler_name(irq, &msi_chip, handle_edge_irq, "edge"); - return 0; + return irq; } void arch_teardown_msi_irq(unsigned int irq) { - return; + destroy_irq(irq); } #endif /* CONFIG_PCI_MSI */ diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index d7a2259..c6a6d46 100644 --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -192,37 +192,6 @@ static struct msi_desc* alloc_msi_entry(void) return entry; } -static int create_msi_irq(void) -{ - struct msi_desc *entry; - int irq; - - entry = alloc_msi_entry(); - if (!entry) - return -ENOMEM; - - irq = create_irq(); - if (irq < 0) { - kmem_cache_free(msi_cachep, entry); - return -EBUSY; - } - - set_irq_msi(irq, entry); - - return irq; -} - -static void destroy_msi_irq(unsigned int irq) -{ - struct msi_desc *entry; - - entry = get_irq_msi(irq); - set_irq_chip(irq, NULL); - set_irq_msi(irq, NULL); - destroy_irq(irq); - kmem_cache_free(msi_cachep, entry); -} - static void enable_msi_mode(struct pci_dev *dev, int pos, int type) { u16 control; @@ -449,7 +418,6 @@ void pci_restore_msi_state(struct pci_dev *dev) **/ static int msi_capability_init(struct pci_dev *dev) { - int status; struct msi_desc *entry; int pos, irq; u16 control; @@ -457,13 +425,10 @@ static int msi_capability_init(struct pci_dev *dev) pos = pci_find_capability(dev, PCI_CAP_ID_MSI); pci_read_config_word(dev, msi_control_reg(pos), &control); /* MSI Entry Initialization */ - irq = create_msi_irq(); - if (irq < 0) - return irq; + entry = alloc_msi_entry(); + if (!entry) + return -ENOMEM; - entry = get_irq_msi(irq); - entry->link.head = irq; - entry->link.tail = irq; entry->msi_attrib.type = PCI_CAP_ID_MSI; entry->msi_attrib.is_64 = is_64bit_address(control); entry->msi_attrib.entry_nr = 0; @@ -489,14 +454,16 @@ static int msi_capability_init(struct pci_dev *dev) maskbits); } /* Configure MSI capability structure */ - status = arch_setup_msi_irq(irq, dev); - if (status < 0) { - destroy_msi_irq(irq); - return status; + irq = arch_setup_msi_irq(dev, entry); + if (irq < 0) { + kmem_cache_free(msi_cachep, entry); + return irq; } - + entry->link.head = irq; + entry->link.tail = irq; dev->first_msi_irq = irq; set_irq_msi(irq, entry); + /* Set MSI enabled bits */ enable_msi_mode(dev, pos, PCI_CAP_ID_MSI); @@ -518,7 +485,6 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries, int nvec) { struct msi_desc *head = NULL, *tail = NULL, *entry = NULL; - int status; int irq, pos, i, j, nr_entries, temp = 0; unsigned long phys_addr; u32 table_offset; @@ -541,13 +507,11 @@ static int msix_capability_init(struct pci_dev *dev, /* MSI-X Table Initialization */ for (i = 0; i < nvec; i++) { - irq = create_msi_irq(); - if (irq < 0) + entry = alloc_msi_entry(); + if (!entry) break; - entry = get_irq_msi(irq); j = entries[i].entry; - entries[i].vector = irq; entry->msi_attrib.type = PCI_CAP_ID_MSIX; entry->msi_attrib.is_64 = 1; entry->msi_attrib.entry_nr = j; @@ -556,6 +520,14 @@ static int msix_capability_init(struct pci_dev *dev, entry->msi_attrib.pos = pos; entry->dev = dev; entry->mask_base = base; + + /* Configure MSI-X capability structure */ + irq = arch_setup_msi_irq(dev, entry); + if (irq < 0) { + kmem_cache_free(msi_cachep, entry); + break; + } + entries[i].vector = irq; if (!head) { entry->link.head = irq; entry->link.tail = irq; @@ -568,12 +540,6 @@ static int msix_capability_init(struct pci_dev *dev, } temp = irq; tail = entry; - /* Configure MSI-X capability structure */ - status = arch_setup_msi_irq(irq, dev); - if (status < 0) { - destroy_msi_irq(irq); - break; - } set_irq_msi(irq, entry); } @@ -717,8 +683,6 @@ static int msi_free_irq(struct pci_dev* dev, int irq) int head, entry_nr, type; void __iomem *base; - arch_teardown_msi_irq(irq); - entry = get_irq_msi(irq); if (!entry || entry->dev != dev) { return -EINVAL; @@ -729,9 +693,9 @@ static int msi_free_irq(struct pci_dev* dev, int irq) base = entry->mask_base; get_irq_msi(entry->link.head)->link.tail = entry->link.tail; get_irq_msi(entry->link.tail)->link.head = entry->link.head; - entry->dev = NULL; - destroy_msi_irq(irq); + arch_teardown_msi_irq(irq); + kmem_cache_free(msi_cachep, entry); if (type == PCI_CAP_ID_MSIX) { writel(1, base + entry_nr * PCI_MSIX_ENTRY_SIZE + diff --git a/include/asm-ia64/machvec.h b/include/asm-ia64/machvec.h index a3891eb..3c96ac1 100644 --- a/include/asm-ia64/machvec.h +++ b/include/asm-ia64/machvec.h @@ -21,6 +21,7 @@ struct mm_struct; struct pci_bus; struct task_struct; struct pci_dev; +struct msi_desc; typedef void ia64_mv_setup_t (char **); typedef void ia64_mv_cpu_init_t (void); @@ -79,7 +80,7 @@ typedef unsigned short ia64_mv_readw_relaxed_t (const volatile void __iomem *); typedef unsigned int ia64_mv_readl_relaxed_t (const volatile void __iomem *); typedef unsigned long ia64_mv_readq_relaxed_t (const volatile void __iomem *); -typedef int ia64_mv_setup_msi_irq_t (unsigned int irq, struct pci_dev *pdev); +typedef int ia64_mv_setup_msi_irq_t (struct pci_dev *pdev, struct msi_desc *); typedef void ia64_mv_teardown_msi_irq_t (unsigned int irq); static inline void diff --git a/include/linux/msi.h b/include/linux/msi.h index b99976b..74c8a2e 100644 --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -41,7 +41,7 @@ struct msi_desc { /* * The arch hook for setup up msi irqs */ -int arch_setup_msi_irq(unsigned int irq, struct pci_dev *dev); +int arch_setup_msi_irq(struct pci_dev *dev, struct msi_desc *desc); void arch_teardown_msi_irq(unsigned int irq); -- 1.4.4.1.g278f ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 1/6] msi: Kill msi_lookup_irq 2007-01-28 19:42 ` [PATCH 1/6] msi: Kill msi_lookup_irq Eric W. Biederman 2007-01-28 19:44 ` [PATCH 2/6] msi: Remove msi_lock Eric W. Biederman @ 2007-01-28 22:01 ` Paul Mackerras 2007-01-28 22:18 ` Eric W. Biederman 1 sibling, 1 reply; 35+ messages in thread From: Paul Mackerras @ 2007-01-28 22:01 UTC (permalink / raw) To: Eric W. Biederman Cc: Greg Kroah-Hartman, Tony Luck, Grant Grundler, Ingo Molnar, linux-kernel, Kyle McMartin, linuxppc-dev, Brice Goglin, shaohua.li, linux-pci, David S. Miller Eric W. Biederman writes: > @@ -693,15 +664,14 @@ int pci_enable_msi(struct pci_dev* dev) > if (!pos) > return -EINVAL; > > - WARN_ON(!msi_lookup_irq(dev, PCI_CAP_ID_MSI)); > + WARN_ON(!!dev->msi_enabled); Minor nit: what's wrong with just WARN_ON(dev->msi_enabled) ? Also here: > @@ -836,16 +811,14 @@ int pci_enable_msix(struct pci_dev* dev, struct msix_entry *entries, int nvec) > return -EINVAL; /* duplicate entry */ > } > } > - temp = dev->irq; > - WARN_ON(!msi_lookup_irq(dev, PCI_CAP_ID_MSIX)); > + WARN_ON(!!dev->msix_enabled); Paul. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 1/6] msi: Kill msi_lookup_irq 2007-01-28 22:01 ` [PATCH 1/6] msi: Kill msi_lookup_irq Paul Mackerras @ 2007-01-28 22:18 ` Eric W. Biederman 0 siblings, 0 replies; 35+ messages in thread From: Eric W. Biederman @ 2007-01-28 22:18 UTC (permalink / raw) To: Paul Mackerras Cc: Greg Kroah-Hartman, Tony Luck, Grant Grundler, Ingo Molnar, linux-kernel, Kyle McMartin, linuxppc-dev, Brice Goglin, shaohua.li, linux-pci, David S. Miller Paul Mackerras <paulus@samba.org> writes: > Eric W. Biederman writes: > >> @@ -693,15 +664,14 @@ int pci_enable_msi(struct pci_dev* dev) >> if (!pos) >> return -EINVAL; >> >> - WARN_ON(!msi_lookup_irq(dev, PCI_CAP_ID_MSI)); >> + WARN_ON(!!dev->msi_enabled); > > Minor nit: what's wrong with just WARN_ON(dev->msi_enabled) ? It's a bitfield so gcc complains when something in WARN_ON calls typeof on it. So it is easier to just say !! than to dig into WARN_ON and see if it made any sense to fix WARN_ON, or to see if gcc needed the bug fix. > Also here: > >> @@ -836,16 +811,14 @@ int pci_enable_msix(struct pci_dev* dev, struct > msix_entry *entries, int nvec) >> return -EINVAL; /* duplicate entry */ >> } >> } >> - temp = dev->irq; >> - WARN_ON(!msi_lookup_irq(dev, PCI_CAP_ID_MSIX)); >> + WARN_ON(!!dev->msix_enabled); > > Paul. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 0/6] MSI portability cleanups 2007-01-28 19:40 ` [PATCH 0/6] MSI portability cleanups Eric W. Biederman 2007-01-28 19:42 ` [PATCH 1/6] msi: Kill msi_lookup_irq Eric W. Biederman @ 2007-01-28 20:23 ` Benjamin Herrenschmidt 2007-01-28 20:47 ` Jeff Garzik 2007-01-28 21:34 ` Eric W. Biederman 1 sibling, 2 replies; 35+ messages in thread From: Benjamin Herrenschmidt @ 2007-01-28 20:23 UTC (permalink / raw) To: Eric W. Biederman Cc: Greg Kroah-Hartman, Tony Luck, Grant Grundler, Ingo Molnar, linux-kernel, Kyle McMartin, linuxppc-dev, Brice Goglin, shaohua.li, linux-pci, David S. Miller > The other big change is that I added a field to irq_desc to point > at the msi_desc. This removes the conflicts with the existing pointer > fields and makes the irq -> msi_desc mapping useable outside of msi.c I'm not even sure we would have needed that with Michael's mecanism in fact. One other reason why I prefer it. Basically, backends like MPIC etc... don't need it. The irq chip operations are normal MPIC operations and don't need to know they are done on an MSI nor what MSI etc... and thus we don't need it. Same with RTAS. On the other hand, x86 needs it, but then, x86 uses it's own MSI specific irq_chip, in which case it can use irq_desc->chip_data as long as it does it within the backend. So I may have missed a case where a given backend might need both that irq -> msi_desc mapping -and- use irq_desc->chip_data for other things, but that's one thing I was hoping we could avoid with Michael's code. > The only architecture problem that isn't solvable in this context is > the problem of supporting the crazy hypervisor on the ppc RTAS, which > asks us to drive the hardware but does not give us access to the > hardware registers. So you are saying that we should use your model while admitting that it can't solve our problems... I really don't understand why you seem so totally opposed to Michael's approach which definitely looks to me like the sane thing to do. Note that in the end, Michael's approach isn't -that- different from yours, just a bit more abstracted. Ben. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 0/6] MSI portability cleanups 2007-01-28 20:23 ` [PATCH 0/6] MSI portability cleanups Benjamin Herrenschmidt @ 2007-01-28 20:47 ` Jeff Garzik 2007-01-28 21:20 ` Eric W. Biederman ` (2 more replies) 2007-01-28 21:34 ` Eric W. Biederman 1 sibling, 3 replies; 35+ messages in thread From: Jeff Garzik @ 2007-01-28 20:47 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Eric W. Biederman, Greg Kroah-Hartman, Tony Luck, Grant Grundler, Ingo Molnar, linux-kernel, Kyle McMartin, linuxppc-dev, Brice Goglin, shaohua.li, linux-pci, David S. Miller Benjamin Herrenschmidt wrote: >> The only architecture problem that isn't solvable in this context is >> the problem of supporting the crazy hypervisor on the ppc RTAS, which >> asks us to drive the hardware but does not give us access to the >> hardware registers. > > So you are saying that we should use your model while admitting that it > can't solve our problems... > > I really don't understand why you seem so totally opposed to Michael's > approach which definitely looks to me like the sane thing to do. Note > that in the end, Michael's approach isn't -that- different from yours, > just a bit more abstracted. I think the high-level ops approach makes more sense. It's more future proof, in addition to covering all existing implementations. Jeff ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 0/6] MSI portability cleanups 2007-01-28 20:47 ` Jeff Garzik @ 2007-01-28 21:20 ` Eric W. Biederman 2007-01-28 21:26 ` Ingo Molnar ` (2 more replies) 2007-01-28 22:11 ` Eric W. Biederman 2007-01-28 23:42 ` David Miller 2 siblings, 3 replies; 35+ messages in thread From: Eric W. Biederman @ 2007-01-28 21:20 UTC (permalink / raw) To: Jeff Garzik Cc: Benjamin Herrenschmidt, Greg Kroah-Hartman, Tony Luck, Grant Grundler, Ingo Molnar, linux-kernel, Kyle McMartin, linuxppc-dev, Brice Goglin, shaohua.li, linux-pci, David S. Miller Jeff Garzik <jeff@garzik.org> writes: > Benjamin Herrenschmidt wrote: >>> The only architecture problem that isn't solvable in this context is >>> the problem of supporting the crazy hypervisor on the ppc RTAS, which >>> asks us to drive the hardware but does not give us access to the >>> hardware registers. >> >> So you are saying that we should use your model while admitting that it >> can't solve our problems... >> >> I really don't understand why you seem so totally opposed to Michael's >> approach which definitely looks to me like the sane thing to do. Note >> that in the end, Michael's approach isn't -that- different from yours, >> just a bit more abstracted. > > > I think the high-level ops approach makes more sense. It's more future proof, > in addition to covering all existing implementations. I'm not arguing against an operations based approach. I'm arguing for simple obviously correct steps, and not throwing the baby out with the bath water. My patches should be a precursor to an operations based approach because they are simple step from where we are now. Every keeps telling me the operations approach is the right thing to do and I see code that doesn't work, and can't work without extreme difficulty on the architectures currently supported. That makes me irritated, and unfortunately much less accepting. I see people pushing ridiculous interfaces like the RTAS hypervisor interface at me, and saying we must support running firmware drivers in the msi code. I just ask for simple evolutionary change as I presented, so we don't break things or loose requirements along the way. Please argue with me on the details of what the ops based approach does better, which specific problems does it solve. The proposed ops base approach mixes different kinds of operations in the same structure: We have the hardware operations: + /* enable - Enable the MSIs on the given device. + * + * @pdev: PCI device structure. + * @num: The number of MSIs being requested. + * @entries: An array of @num msix_entry structures. + * @type: The type, MSI or MSI-X. + * + * This routine enables the MSIs on the given PCI device. + * + * If the enable completes succesfully this routine must return 0. + * + * This callback is optional. + */ + int (*enable) (struct pci_dev *pdev, int num, + struct msix_entry *entries, int type); + + /* disable - disable the MSI for the given device. + * + * @pdev: PCI device structure. + * @num: The number of MSIs to disable. + * @entries: An array of @num msix_entry structures. + * @type: The type, MSI or MSI-X. + * + * This routine should perform the inverse of enable. + */ + void (*disable) (struct pci_dev *pdev, int num, + struct msix_entry *entries, int type); + Which are either talking directly to the hardware, or are talking to the hypervisor, which is using hardware isolation so it is safe to talk directly to the hardware but isn't leting us? If we could use things to work around errata in card implementation details it would make some sense to me (although we don't seem to have any cards with that got the MSI registers wrong at this point). Regardless these operations clearly have a different granularity than the other operations, and should have a different lookup method. We have the irq operations. + /* check - Check that the requested MSI allocation is OK. + * + * @pdev: PCI device structure. + * @num: The number of MSIs being requested. + * @entries: An array of @num msix_entry structures. + * @type: The type, MSI or MSI-X. + * + * This routine is responsible for checking that the given PCI device + * can be allocated the requested type and number of MSIs. + * + * It is up to this routine to determine if the requested number of + * MSIs is valid for the device in question. If the number of MSIs, + * or the particular MSI entries, can not be supported for any + * reason this routine must return non-zero. + * + * If the check is succesful this routine must return 0. + */ + int (*check) (struct pci_dev *pdev, int num, + struct msix_entry *entries, int type); + + /* alloc - Allocate MSIs for the given device. + * + * @pdev: PCI device structure. + * @num: The number of MSIs being requested. + * @entries: An array of @num msix_entry structures. + * @type: The type, MSI or MSI-X. + * + * This routine is responsible for allocating the number of + * MSIs to the given PCI device. + * + * Upon completion there must be @num MSIs assigned to this device, + * the "vector" member of each struct msix_entry must be filled in + * with the Linux irq number allocated to it. The corresponding + * irq_descs must also be setup with an appropriate handler if + * required. + * + * If the allocation completes succesfully this routine must return 0. + */ + int (*alloc) (struct pci_dev *pdev, int num, + struct msix_entry *entries, int type); + + /* free - free the MSIs assigned to the device. + * + * @pdev: PCI device structure. + * @num: The number of MSIs. + * @entries: An array of @num msix_entry structures. + * @type: The type, MSI or MSI-X. + * + * Free all MSIs and associated resources for the device. If any + * MSIs have been enabled they will have been disabled already by + * the generic code. + */ + void (*free) (struct pci_dev *pdev, int num, + struct msix_entry *entries, int type); These because they are per irq make sense as per bus operations unless you have a good architecture definition like x86 has. Roughly those operations are what we currently have except the current operations are a little simpler and easier to deal with for the architecture code. And then there are the operations that are going in the wrong direction. + /* setup_msi_msg - Setup an MSI message for the given device. + * + * @pdev: PCI device structure. + * @entry: The MSI entry to create a msi_msg for. + * @msg: Written with the magic address and data. + * @type: The type, MSI or MSI-X. + * + * Returns the "magic address and data" used to trigger the msi. + * If the setup is succesful this routine must return 0. + * + * This callback is optional. + */ + int (*setup_msi_msg) (struct pci_dev *pdev, struct msix_entry *entry, + struct msi_msg *msg, int type); Much to much of the operations base approach as proposed looks like when you have a hammer every problem looks like a nail, given how much confusion about what was put into the operations structure. I don't mind taking a small step and making the alloc/free primitives per bus in a generic fashion. I don't mind supporting poorly designed hypervisor interfaces, if it is easy. I do strongly mind code that doesn't work, or we can't git-bisect through to find where bugs were introduced. Eric ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 0/6] MSI portability cleanups 2007-01-28 21:20 ` Eric W. Biederman @ 2007-01-28 21:26 ` Ingo Molnar 2007-01-28 22:09 ` Benjamin Herrenschmidt 2007-01-28 23:44 ` David Miller 2 siblings, 0 replies; 35+ messages in thread From: Ingo Molnar @ 2007-01-28 21:26 UTC (permalink / raw) To: Eric W. Biederman Cc: Jeff Garzik, Benjamin Herrenschmidt, Greg Kroah-Hartman, Tony Luck, Grant Grundler, linux-kernel, Kyle McMartin, linuxppc-dev, Brice Goglin, shaohua.li, linux-pci, David S. Miller * Eric W. Biederman <ebiederm@xmission.com> wrote: > I'm not arguing against an operations based approach. I'm arguing for > simple obviously correct steps, and not throwing the baby out with the > bath water. > > My patches should be a precursor to an operations based approach > because they are simple step from where we are now. yeah. I'd say your approach is to go from A to B: [A] -----------------------------------------------------> [B] | [C] while there might be some other arguments that "no, lets go to C instead", i say lets not throw away the already implemented and already working and nicely layered [A]->[B] transition, just because there's an argument whether the end result should be 'B' or 'C'. Unless someone who wants to see 'C' produces a patchset that walks the whole way i dont see any reason to not go with your patchset. It clearly removes alot of cruft. Acked-by: Ingo Molnar <mingo@elte.hu> Ingo ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 0/6] MSI portability cleanups 2007-01-28 21:20 ` Eric W. Biederman 2007-01-28 21:26 ` Ingo Molnar @ 2007-01-28 22:09 ` Benjamin Herrenschmidt 2007-01-28 23:26 ` Eric W. Biederman 2007-02-01 4:29 ` Greg KH 2007-01-28 23:44 ` David Miller 2 siblings, 2 replies; 35+ messages in thread From: Benjamin Herrenschmidt @ 2007-01-28 22:09 UTC (permalink / raw) To: Eric W. Biederman Cc: Jeff Garzik, Greg Kroah-Hartman, Tony Luck, Grant Grundler, Ingo Molnar, linux-kernel, Kyle McMartin, linuxppc-dev, Brice Goglin, shaohua.li, linux-pci, David S. Miller .../... (enable/disable bits) > Which are either talking directly to the hardware, or are talking > to the hypervisor, which is using hardware isolation so it is safe to > talk directly to the hardware but isn't leting us? If we could use > things to work around errata in card implementation details it would > make some sense to me (although we don't seem to have any cards with > that got the MSI registers wrong at this point). Regardless these > operations clearly have a different granularity than the other > operations, and should have a different lookup method. I'm not sure I undersdand the point of your rant here. The hypervisor case hooks at alloc/free and does everything from there. It doens't use an enable or a diable hook. The enable/disable ops are optional for that reason. When not present, it's assumed that alloc/free do it all. When using a "direct" approach (what we call "raw"), we expect backends to just plug the provided helper functions in enable/disable. It's still a hook so that one can do additional platform specific bits if necessary, but in that specific case, I do agree we could just remove it and move the "raw" code back into the toplevel functions, with a way (via a special return code from alloc maybe ?) for the HV case to tell us not to go through there. That was one of our initial approaches when working with Michael on the design. However, that sort of hurts my sense of aestetics :-) I quite like the toplevel to be just a toplevel, and clearly separate the raw "helpers" and the backend. Provides more flexibility to handle all possible crazy cases in the future. You seem to absolutely want to get the HV case to go throuh the same code path as the "raw" case, and that will not happen. .../... (irq operations) > These because they are per irq make sense as per bus operations unless > you have a good architecture definition like x86 has. Roughly those > operations are what we currently have except the current operations > are a little simpler and easier to deal with for the architecture > code. Oh ? How so ? (easier/simpler ?) > And then there are the operations that are going in the wrong > direction. > + /* setup_msi_msg - Setup an MSI message for the given device. > + * > + * @pdev: PCI device structure. > + * @entry: The MSI entry to create a msi_msg for. > + * @msg: Written with the magic address and data. > + * @type: The type, MSI or MSI-X. > + * > + * Returns the "magic address and data" used to trigger the msi. > + * If the setup is succesful this routine must return 0. > + * > + * This callback is optional. > + */ > + int (*setup_msi_msg) (struct pci_dev *pdev, struct msix_entry *entry, > + struct msi_msg *msg, int type); > > Much to much of the operations base approach as proposed looks like > when you have a hammer every problem looks like a nail, given how much > confusion about what was put into the operations structure. This is indeed a lower level hook to be used by "raw" enable/disable. An other approach would be to remove it, have each backend have it's own enable/disable that obtains the address/data and calls into a helper to program them. This would indeed be a little bit nicer in a layering perspective. But it adds a bit more code to each backend, so we kept things closer to the way they used to be. I don't have a firm reason not to change it however, I need talk to Michael in case he has more good reasons to keep it that way around. > I don't mind taking a small step and making the alloc/free primitives > per bus in a generic fashion. > > I don't mind supporting poorly designed hypervisor interfaces, if it > is easy. And it it's not, we don't support them ? Ugh ? Well, it happens to be fairly easy but still, I don't understand your approach there. > I do strongly mind code that doesn't work, or we can't git-bisect > through to find where bugs were introduced. It doesn't work yet for you which is why it's not -replacing- your current code. Again, this was intended as arch code in the first place, until other archs and maintainers voiced their opinion that we should move that to generic code. It may not be perfect, we may still want to change things, maybe make some things closer to the direction you are taking for the x86 code, but I don't understand the root of such a strong opposition except mayeb that you've spent time trying to fix the x86 junk and now are annoyed to see some of that work possibly replaced ? I agree with the problem if small changes & bisecting in the general case. In fact, it would be nice if we could use your fixed code with little change to "plug" it in as the x86 backend in many ways. Michael's work isn't a re-implementation of everything, it's a re-structuring, lots of bits of code that are missing can possibly be lifted from the existing working implementation. If we followed that "only do incrementental changes" rule all the time, imagine in what state would be our USB stack today since we couldn't have dropped in Linus replacement one ... Ben. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 0/6] MSI portability cleanups 2007-01-28 22:09 ` Benjamin Herrenschmidt @ 2007-01-28 23:26 ` Eric W. Biederman 2007-01-28 23:37 ` David Miller 2007-01-29 1:33 ` Benjamin Herrenschmidt 2007-02-01 4:29 ` Greg KH 1 sibling, 2 replies; 35+ messages in thread From: Eric W. Biederman @ 2007-01-28 23:26 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Jeff Garzik, Greg Kroah-Hartman, Tony Luck, Grant Grundler, Ingo Molnar, linux-kernel, Kyle McMartin, linuxppc-dev, Brice Goglin, shaohua.li, linux-pci, David S. Miller Benjamin Herrenschmidt <benh@kernel.crashing.org> writes: > .../... (enable/disable bits) > >> Which are either talking directly to the hardware, or are talking >> to the hypervisor, which is using hardware isolation so it is safe to >> talk directly to the hardware but isn't leting us? If we could use >> things to work around errata in card implementation details it would >> make some sense to me (although we don't seem to have any cards with >> that got the MSI registers wrong at this point). Regardless these >> operations clearly have a different granularity than the other >> operations, and should have a different lookup method. > > I'm not sure I undersdand the point of your rant here. The hypervisor > case hooks at alloc/free and does everything from there. It doens't use > an enable or a diable hook. > > The enable/disable ops are optional for that reason. When not present, > it's assumed that alloc/free do it all. Well my feeling is that in your weird HV case enable/disable should do all of the work. And alloc/free won't have to do anything because the bus doesn't matter any more. > When using a "direct" approach (what we call "raw"), we expect backends > to just plug the provided helper functions in enable/disable. It's still > a hook so that one can do additional platform specific bits if > necessary, but in that specific case, I do agree we could just remove it > and move the "raw" code back into the toplevel functions, with a way > (via a special return code from alloc maybe ?) for the HV case to tell > us not to go through there. That was one of our initial approaches when > working with Michael on the design. > > However, that sort of hurts my sense of aestetics :-) I quite like the > toplevel to be just a toplevel, and clearly separate the raw "helpers" > and the backend. Provides more flexibility to handle all possible crazy > cases in the future. To be clear I see this as 2 distinct layers of code. enable/disable that talks directly to the hardware, and the helpers of enable/disable that allocate the irq. I base this on the fact that I only need the alloc/free when I am exclusively working with real hardware. > You seem to absolutely want to get the HV case to go throuh the same > code path as the "raw" case, and that will not happen. Yes I do. Because that is the only sane approach for a HV to use. And yes we need an irq allocator to call the HV to setup the upstream reception of the msi message. However I don't think it will be to hard to support your HV once we get the real hardware supported. I just refuse to consider it before we have figured out what makes sense in the context where we have to do everything. > .../... (irq operations) > >> These because they are per irq make sense as per bus operations unless >> you have a good architecture definition like x86 has. Roughly those >> operations are what we currently have except the current operations >> are a little simpler and easier to deal with for the architecture >> code. > > Oh ? How so ? (easier/simpler ?) I don't take a type parameter, and I don't take a vector. All of that work is done in the generic code. >> And then there are the operations that are going in the wrong >> direction. >> + /* setup_msi_msg - Setup an MSI message for the given device. >> + * >> + * @pdev: PCI device structure. >> + * @entry: The MSI entry to create a msi_msg for. >> + * @msg: Written with the magic address and data. >> + * @type: The type, MSI or MSI-X. >> + * >> + * Returns the "magic address and data" used to trigger the msi. >> + * If the setup is succesful this routine must return 0. >> + * >> + * This callback is optional. >> + */ >> + int (*setup_msi_msg) (struct pci_dev *pdev, struct msix_entry *entry, >> + struct msi_msg *msg, int type); >> >> Much to much of the operations base approach as proposed looks like >> when you have a hammer every problem looks like a nail, given how much >> confusion about what was put into the operations structure. > > This is indeed a lower level hook to be used by "raw" enable/disable. An > other approach would be to remove it, have each backend have it's own > enable/disable that obtains the address/data and calls into a helper to > program them. This would indeed be a little bit nicer in a layering > perspective. But it adds a bit more code to each backend, so we kept > things closer to the way they used to be. I don't have a firm reason not > to change it however, I need talk to Michael in case he has more good > reasons to keep it that way around. The current code in the kernel already is structured that way because we have to reprogram the msi message on each irq migration. Not using a helper to write the message would be a noticeable change and require a fair amount of code rewriting on the currently supported architectures. >> I don't mind taking a small step and making the alloc/free primitives >> per bus in a generic fashion. >> >> I don't mind supporting poorly designed hypervisor interfaces, if it >> is easy. > > And it it's not, we don't support them ? Ugh ? Well, it happens to be > fairly easy but still, I don't understand your approach there. Yes. In general the mainline linux kernel does not support certain classes of stupidity. TCP offload engines, firmware drivers for hardware we care about, a fixed ABI to binary only modules, etc. It is the responsibility of the OS to setup MSI so we do it, not the firmware so we do it. Not supporting stupid things that are hard to support encourages other people not to be so silly, especially when linux still works on the hardware when that silly feature isn't supported. For similar reasons we don't support more than 1 irq with a plain MSI capability. It is hard, we can't do it on most hardware, and anyone who wants more than 1 irq should just implement MSI-X and everyone will be able to use it, on any hardware. Part of the reason to not support a messed up HV interface if it hard is that a HV is just software. Which means the incremental cost to fix it is roughly the same as fixing the linux kernel, and it puts the burden on the people doing stupid things not on the rest of us forever more. >> I do strongly mind code that doesn't work, or we can't git-bisect >> through to find where bugs were introduced. > > It doesn't work yet for you which is why it's not -replacing- your > current code. Again, this was intended as arch code in the first place, > until other archs and maintainers voiced their opinion that we should > move that to generic code. It may not be perfect, we may still want to > change things, maybe make some things closer to the direction you are > taking for the x86 code, but I don't understand the root of such a > strong opposition except mayeb that you've spent time trying to fix the > x86 junk and now are annoyed to see some of that work possibly > replaced ? No. I have spent time fixing what is there, and made it work. I see implementations proposed that don't handle cases I have fixed, and I don't see anything that resembles a simple migration path for i386, x86_64 and ia64. Which is part of what annoys me when I am told the ops work for everything. As for the code not working important parts of the code (like MSI-X) don't even work on ppc. The strength of my opposition is largely shaped by the number of people wearing rose colored glasses and ignoring the problems, and missing huge details. Given that we have been talking about things since before OLS I would have expected the ppc code to be a little farther along. > I agree with the problem if small changes & bisecting in the general > case. In fact, it would be nice if we could use your fixed code with > little change to "plug" it in as the x86 backend in many ways. Michael's > work isn't a re-implementation of everything, it's a re-structuring, > lots of bits of code that are missing can possibly be lifted from the > existing working implementation. Not the x86 backend but the raw backend. You might not need all of the features because you are always going through another interrupt controller but that doesn't mean they shouldn't be there. Michael has at least agreed to look in that direction so I'm hoping my changes remove some of the difficulty for him. > If we followed that "only do incrementental changes" rule all the time, > imagine in what state would be our USB stack today since we couldn't > have dropped in Linus replacement one ... Well even that was a partial following of that rule because you didn't rewrite the rest of the kernel at the same time, to better support usb. I do agree that there are instances where a complete rewrite is the best path. In this case I don't see a reasonable case for not reusing what is there. Nor do I see the level of care being put into the problem that would cause me to trust a rewrite. I have a huge number of little technical problems with the proposed code, and see absolutely no overriding virtue in it. Especially when the worst of the problems with msi.c can be easily fixed, as demonstrated by my patchset. Eric ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 0/6] MSI portability cleanups 2007-01-28 23:26 ` Eric W. Biederman @ 2007-01-28 23:37 ` David Miller 2007-01-29 5:18 ` Eric W. Biederman 2007-01-29 1:33 ` Benjamin Herrenschmidt 1 sibling, 1 reply; 35+ messages in thread From: David Miller @ 2007-01-28 23:37 UTC (permalink / raw) To: ebiederm Cc: benh, jeff, greg, tony.luck, grundler, mingo, linux-kernel, kyle, linuxppc-dev, brice, shaohua.li, linux-pci From: ebiederm@xmission.com (Eric W. Biederman) Date: Sun, 28 Jan 2007 16:26:44 -0700 > Yes. In general the mainline linux kernel does not support certain > classes of stupidity. TCP offload engines, firmware drivers for > hardware we care about, a fixed ABI to binary only modules, etc. > It is the responsibility of the OS to setup MSI so we do it, not > the firmware so we do it. I absolutely disagree with you Eric, and I think you're being rediculious. If the hypervisor doesn't control the MSI PCI config space register writes, this allows the device to spam PCI devices which belong to other domains. It's a freakin' reasonable design trade off decision, get over it! :-) Yes it can be done at the hardware level, and many hypervisor based systems do that, but it's not the one-and-only true way to implment inter-domain protection behind a single PCI host controller. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 0/6] MSI portability cleanups 2007-01-28 23:37 ` David Miller @ 2007-01-29 5:18 ` Eric W. Biederman 2007-01-29 5:25 ` David Miller 0 siblings, 1 reply; 35+ messages in thread From: Eric W. Biederman @ 2007-01-29 5:18 UTC (permalink / raw) To: David Miller Cc: benh, jeff, greg, tony.luck, grundler, mingo, linux-kernel, kyle, linuxppc-dev, brice, shaohua.li, linux-pci David Miller <davem@davemloft.net> writes: > From: ebiederm@xmission.com (Eric W. Biederman) > Date: Sun, 28 Jan 2007 16:26:44 -0700 > >> Yes. In general the mainline linux kernel does not support certain >> classes of stupidity. TCP offload engines, firmware drivers for >> hardware we care about, a fixed ABI to binary only modules, etc. >> It is the responsibility of the OS to setup MSI so we do it, not >> the firmware so we do it. > > I absolutely disagree with you Eric, and I think you're being > rediculious. > > If the hypervisor doesn't control the MSI PCI config space > register writes, this allows the device to spam PCI devices > which belong to other domains. > > It's a freakin' reasonable design trade off decision, get over > it! :-) I completely agree with you in the case you have described, it does mean that the hypervisor needs to trust all of the MSI capable hardware in the system but it if that is the best your hardware can support it is a reasonable trade-off. With the MSI-X registers in a random part of some memory mapped bar and not guaranteed to be page aligned, things are more difficult to isolate purely in a software based hypervisor. > Yes it can be done at the hardware level, and many hypervisor > based systems do that, but it's not the one-and-only true > way to implment inter-domain protection behind a single > PCI host controller. The reason I consider the case crazy is that every example I have been given is where the hardware is doing the filtering above the PCI device. So the hypervisor has no need to filter the pci config traffic or to write to the msi config registers for us. Yet the defined hypervisor interface is. Given the reduction in flexibility of an interface where the hypervisor writes to the config registers for the OS as compared to an interface where the hypervisor provides a destination for MSI messages from a particular device upon request, I think it is silly to design an interface when you full hardware support to act like an interface built for a hypervisor that had to do everything in software. Regardless of my opinion on the sanity of the hypervisor architects. I have not seen anything that indicates it will be hard to support the hypervisor doing everything or most of everything for us, so I see no valid technical objection to it. Nor have I ever. So I have no problem with additional patches in that direction. Eric ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 0/6] MSI portability cleanups 2007-01-29 5:18 ` Eric W. Biederman @ 2007-01-29 5:25 ` David Miller 2007-01-29 5:58 ` Eric W. Biederman 2007-01-29 6:05 ` Benjamin Herrenschmidt 0 siblings, 2 replies; 35+ messages in thread From: David Miller @ 2007-01-29 5:25 UTC (permalink / raw) To: ebiederm Cc: benh, jeff, greg, tony.luck, grundler, mingo, linux-kernel, kyle, linuxppc-dev, brice, shaohua.li, linux-pci From: ebiederm@xmission.com (Eric W. Biederman) Date: Sun, 28 Jan 2007 22:18:59 -0700 > Regardless of my opinion on the sanity of the hypervisor architects. > I have not seen anything that indicates it will be hard to support > the hypervisor doing everything or most of everything for us, so > I see no valid technical objection to it. Nor have I ever. > > So I have no problem with additional patches in that direction. Ok, that's great to hear. I know your bi-directional approach isn't exactly what Ben wants but he can support his machines with it. Maybe after some time we can agree to move from that more towards the totally abstracted scheme. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 0/6] MSI portability cleanups 2007-01-29 5:25 ` David Miller @ 2007-01-29 5:58 ` Eric W. Biederman 2007-01-29 6:05 ` Benjamin Herrenschmidt 1 sibling, 0 replies; 35+ messages in thread From: Eric W. Biederman @ 2007-01-29 5:58 UTC (permalink / raw) To: David Miller Cc: benh, jeff, greg, tony.luck, grundler, mingo, linux-kernel, kyle, linuxppc-dev, brice, shaohua.li, linux-pci David Miller <davem@davemloft.net> writes: > From: ebiederm@xmission.com (Eric W. Biederman) > Date: Sun, 28 Jan 2007 22:18:59 -0700 > >> Regardless of my opinion on the sanity of the hypervisor architects. >> I have not seen anything that indicates it will be hard to support >> the hypervisor doing everything or most of everything for us, so >> I see no valid technical objection to it. Nor have I ever. >> >> So I have no problem with additional patches in that direction. > > Ok, that's great to hear. > > I know your bi-directional approach isn't exactly what Ben > wants but he can support his machines with it. Maybe after > some time we can agree to move from that more towards the > totally abstracted scheme. Moving farther has been my intention the entire time, even while writing those patches. I'm just not prepared to do it in one giant patch where bug hunting becomes impossible. I think I have moved msi.c to the point it won't be a horror to work with, so we can start seriously looking at what it will take to support hypervisors that do this. I don't believe there is anything generic we can do in the general hypervisor case, so we need a way for the architecture code in the case where it is inappropriate to write directly to the msi and msi-x registers to have a completely different implementation of: pci_enable_msi, pci_disable_msi, pci_enable_msix, psi_disable_msix, and whatever other driver interface bits we have in there. One small step at a time and we should get there soon. Eric ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 0/6] MSI portability cleanups 2007-01-29 5:25 ` David Miller 2007-01-29 5:58 ` Eric W. Biederman @ 2007-01-29 6:05 ` Benjamin Herrenschmidt 2007-01-29 8:28 ` Eric W. Biederman 2007-01-29 9:03 ` Eric W. Biederman 1 sibling, 2 replies; 35+ messages in thread From: Benjamin Herrenschmidt @ 2007-01-29 6:05 UTC (permalink / raw) To: David Miller Cc: ebiederm, jeff, greg, tony.luck, grundler, mingo, linux-kernel, kyle, linuxppc-dev, brice, shaohua.li, linux-pci On Sun, 2007-01-28 at 21:25 -0800, David Miller wrote: > From: ebiederm@xmission.com (Eric W. Biederman) > Date: Sun, 28 Jan 2007 22:18:59 -0700 > > > Regardless of my opinion on the sanity of the hypervisor architects. > > I have not seen anything that indicates it will be hard to support > > the hypervisor doing everything or most of everything for us, so > > I see no valid technical objection to it. Nor have I ever. > > > > So I have no problem with additional patches in that direction. > > Ok, that's great to hear. > > I know your bi-directional approach isn't exactly what Ben > wants but he can support his machines with it. Maybe after > some time we can agree to move from that more towards the > totally abstracted scheme. It can support my machines without HV with trivial changes I reckon: I need an ops struct to indirect eric's 2 remaining arch hooks (setup/teardown) but that can be done inline within asm-powerpc. I need to double check of course and probably actually port the MPIC backend and possibly go write the Cell Axon one while at it to verify everything is allright, but the base design seems sound enough. For the ones with HV (RTAS stuff), we still need to agree on how to approach it. We can either: Option 1 -------- Do a hook -above- Eric stuff, by having the toplevel APIs themselves be arch hooks that can either go toward the RTAS implementation or toward Eric's code. That is, eric code would define those (pick better names if you are good at it): pci_generic_enable_msi pci_generic_disable_msi pci_generic_enable_msix pci_generic_disable_msix pci_generic_save_msi_state pci_generic_restore_msi_state Then we can have asm-i386/msi.h & friends do something like #define pci_enable_msi pci_generic_enable_msi #define pci_disable_msi pci_generic_disable_msi etc... And we can have asm-powerpc/msi.h hook then via ppc_md: static inline int pci_enable_msi(xxx...) { return ppc_md.pci_enable_msi(xxx...); } etc... (ppc_md is our per-platform global hook structure filled at boot when we discover on what machine type we are running on) so that pSeries can use it's own RTAS callbacks, and others can just re-hook those to Eric's code. Option 2 -------- That is to make Eric's code itself cope with the HV case. I'm a bit at loss right now as how precisely to do it. I need to spend more time staring at the code after Eric latest patches rather than the patches themselves I suppose :-) (Eric, they don't apply out of the box on current git, they are against -mm ?). Some of the main issues here, more/less following the order in which Eric code calls things: - The number of vectors for MSI-X is obtained from config space (at least for sanity checking the requested argument). On RTAS, it should come from an OF property (we are really not supposed to go read the config space even if we can). I -suppose- we can survive for now with just reading it, but we might well run into trouble with some "special" devices shared accross partitions or if the IBM magic bridges themselves ever start sending MSI-X on their own (unlikely but who knows...). Michael's code handled that by having a callback ->check() do the sanity checking of the nvec, and then just use the nvec passed in as an argument once it's sane. So for that I would propose adding an arch_check_msi(pdev, type, nvec) or something like that. Note the biggest issue at this point anyway. - The real big one: For MSI-X, Eric's code tries to "hide" the fact that those are MSI-X by allocating the msi-x entry array, then iterating through them calling arch_setup_msi_irq() for each of them. For that to work for us, it would need to be different, possibly pre-allocating the array, and having -one- call taking an array and a nvec. That's one of the reasons why I liked Michael's approach as instead of making MSI-X look like MSI, it made MSI look like MSI-X by passing a 1 entry array in the MSI case. Both approaches can probably be made to handle multiple MSIs if we ever want to handle them. The same issue is present for teardown of course. - We need HV hooks for suspend/resume at one point. Nothing urgent there as our HV machines don't do suspend/resume just yet :-) But if we ever implement something like suspend-to-disk, they'll definitely need something as we are likely to get different vectors back from the firmware so we need to re-map them to the same linux IRQ numbers. I need to have a second look at Eric's code after I manage to find the right combination of kernel for his patches to apply to check if I missed anything important. Cheers, Ben. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 0/6] MSI portability cleanups 2007-01-29 6:05 ` Benjamin Herrenschmidt @ 2007-01-29 8:28 ` Eric W. Biederman 2007-01-29 9:03 ` Eric W. Biederman 1 sibling, 0 replies; 35+ messages in thread From: Eric W. Biederman @ 2007-01-29 8:28 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: David Miller, ebiederm, jeff, greg, tony.luck, grundler, mingo, linux-kernel, kyle, linuxppc-dev, brice, shaohua.li, linux-pci > > That is to make Eric's code itself cope with the HV case. I'm a bit at > loss right now as how precisely to do it. I need to spend more time > staring at the code after Eric latest patches rather than the patches > themselves I suppose :-) (Eric, they don't apply out of the box on > current git, they are against -mm ?). Current git + gregkh-pci (Which has a couple of Michaels patches). With current git the only problem should be context around msi_lookup_irq which changes between the two. But in this case the context around an entire function being deleted doesn't matter. Eric ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 0/6] MSI portability cleanups 2007-01-29 6:05 ` Benjamin Herrenschmidt 2007-01-29 8:28 ` Eric W. Biederman @ 2007-01-29 9:03 ` Eric W. Biederman 2007-01-29 10:11 ` Michael Ellerman 2007-01-29 20:22 ` Benjamin Herrenschmidt 1 sibling, 2 replies; 35+ messages in thread From: Eric W. Biederman @ 2007-01-29 9:03 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: David Miller, jeff, greg, tony.luck, grundler, mingo, linux-kernel, kyle, linuxppc-dev, brice, shaohua.li, linux-pci Benjamin Herrenschmidt <benh@kernel.crashing.org> writes: > On Sun, 2007-01-28 at 21:25 -0800, David Miller wrote: >> From: ebiederm@xmission.com (Eric W. Biederman) >> Date: Sun, 28 Jan 2007 22:18:59 -0700 >> >> > Regardless of my opinion on the sanity of the hypervisor architects. >> > I have not seen anything that indicates it will be hard to support >> > the hypervisor doing everything or most of everything for us, so >> > I see no valid technical objection to it. Nor have I ever. >> > >> > So I have no problem with additional patches in that direction. >> >> Ok, that's great to hear. >> >> I know your bi-directional approach isn't exactly what Ben >> wants but he can support his machines with it. Maybe after >> some time we can agree to move from that more towards the >> totally abstracted scheme. > > It can support my machines without HV with trivial changes I reckon: I > need an ops struct to indirect eric's 2 remaining arch hooks > (setup/teardown) but that can be done inline within asm-powerpc. I need > to double check of course and probably actually port the MPIC backend > and possibly go write the Cell Axon one while at it to verify everything > is allright, but the base design seems sound enough. > > For the ones with HV (RTAS stuff), we still need to agree on how to > approach it. We can either: > > Option 1 > -------- > > Do a hook -above- Eric stuff, by having the toplevel APIs themselves be > arch hooks that can either go toward the RTAS implementation or toward > Eric's code. That is, eric code would define those (pick better names if > you are good at it): > > pci_generic_enable_msi > pci_generic_disable_msi > pci_generic_enable_msix > pci_generic_disable_msix > pci_generic_save_msi_state > pci_generic_restore_msi_state > > Then we can have asm-i386/msi.h & friends do something like > > #define pci_enable_msi pci_generic_enable_msi > #define pci_disable_msi pci_generic_disable_msi > etc... > > And we can have asm-powerpc/msi.h hook then via ppc_md: > > static inline int pci_enable_msi(xxx...) > { > return ppc_md.pci_enable_msi(xxx...); > } > etc... > > (ppc_md is our per-platform global hook structure filled at boot when we > discover on what machine type we are running on) so that pSeries can use > it's own RTAS callbacks, and others can just re-hook those to Eric's > code. This is the most straight forward and handles machines with really weird msi setups, so I lean in this direction. The question is there anything at all we can do generically? I can't see a case where ppc_md would not wind up with the hooks that decide if it is a hypervisor or not. Even if we came up with a better set of functions you need to hook. > Option 2 > -------- > > That is to make Eric's code itself cope with the HV case. I'm a bit at > loss right now as how precisely to do it. I need to spend more time > staring at the code after Eric latest patches rather than the patches > themselves I suppose :-) (Eric, they don't apply out of the box on > current git, they are against -mm ?). > > Some of the main issues here, more/less following the order in which > Eric code calls things: > > - The number of vectors for MSI-X is obtained from config space (at > least for sanity checking the requested argument). On RTAS, it should > come from an OF property (we are really not supposed to go read the > config space even if we can). I -suppose- we can survive for now with > just reading it, but we might well run into trouble with some "special" > devices shared accross partitions or if the IBM magic bridges themselves > ever start sending MSI-X on their own (unlikely but who knows...). > Michael's code handled that by having a callback ->check() do the sanity > checking of the nvec, and then just use the nvec passed in as an > argument once it's sane. Ok. I think I get the point of check. I believe I need to look at your code a little more and see what you are doing to see if there is anything generic worth doing, that we can always do outside of architecture code no matter how much of the job the Hypervisor wants to do for us. I'd hate to hit a different Hypervisor that did something close but not quite the same and have the code fail then. So definitely avoiding touching pci config space at all in the calls seems to make a lot of sense. This includes avoiding pci_find_capability right? Off the top of my head the only things we can do generically are some data structure things and flags like dev->msi_enabled or dev->msix_enabled. Anyway have a nice night and more in the morning. Eric ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 0/6] MSI portability cleanups 2007-01-29 9:03 ` Eric W. Biederman @ 2007-01-29 10:11 ` Michael Ellerman 2007-01-29 20:32 ` Benjamin Herrenschmidt 2007-01-29 23:29 ` Paul Mackerras 2007-01-29 20:22 ` Benjamin Herrenschmidt 1 sibling, 2 replies; 35+ messages in thread From: Michael Ellerman @ 2007-01-29 10:11 UTC (permalink / raw) To: Eric W. Biederman Cc: Benjamin Herrenschmidt, David Miller, jeff, greg, tony.luck, grundler, mingo, linux-kernel, kyle, linuxppc-dev, brice, shaohua.li, linux-pci [-- Attachment #1: Type: text/plain, Size: 6444 bytes --] On Mon, 2007-01-29 at 02:03 -0700, Eric W. Biederman wrote: > Benjamin Herrenschmidt <benh@kernel.crashing.org> writes: > > > On Sun, 2007-01-28 at 21:25 -0800, David Miller wrote: > >> From: ebiederm@xmission.com (Eric W. Biederman) > >> Date: Sun, 28 Jan 2007 22:18:59 -0700 > >> > >> > Regardless of my opinion on the sanity of the hypervisor architects. > >> > I have not seen anything that indicates it will be hard to support > >> > the hypervisor doing everything or most of everything for us, so > >> > I see no valid technical objection to it. Nor have I ever. > >> > > >> > So I have no problem with additional patches in that direction. > >> > >> Ok, that's great to hear. > >> > >> I know your bi-directional approach isn't exactly what Ben > >> wants but he can support his machines with it. Maybe after > >> some time we can agree to move from that more towards the > >> totally abstracted scheme. > > > > It can support my machines without HV with trivial changes I reckon: I > > need an ops struct to indirect eric's 2 remaining arch hooks > > (setup/teardown) but that can be done inline within asm-powerpc. I need > > to double check of course and probably actually port the MPIC backend > > and possibly go write the Cell Axon one while at it to verify everything > > is allright, but the base design seems sound enough. > > > > For the ones with HV (RTAS stuff), we still need to agree on how to > > approach it. We can either: > > > > Option 1 > > -------- > > > > Do a hook -above- Eric stuff, by having the toplevel APIs themselves be > > arch hooks that can either go toward the RTAS implementation or toward > > Eric's code. That is, eric code would define those (pick better names if > > you are good at it): > > > > pci_generic_enable_msi > > pci_generic_disable_msi > > pci_generic_enable_msix > > pci_generic_disable_msix > > pci_generic_save_msi_state > > pci_generic_restore_msi_state > > > > Then we can have asm-i386/msi.h & friends do something like > > > > #define pci_enable_msi pci_generic_enable_msi > > #define pci_disable_msi pci_generic_disable_msi > > etc... > > > > And we can have asm-powerpc/msi.h hook then via ppc_md: > > > > static inline int pci_enable_msi(xxx...) > > { > > return ppc_md.pci_enable_msi(xxx...); > > } > > etc... > > > > (ppc_md is our per-platform global hook structure filled at boot when we > > discover on what machine type we are running on) so that pSeries can use > > it's own RTAS callbacks, and others can just re-hook those to Eric's > > code. > > This is the most straight forward and handles machines with really > weird msi setups, so I lean in this direction. > > The question is there anything at all we can do generically? > > I can't see a case where ppc_md would not wind up with the hooks > that decide if it is a hypervisor or not. Even if we came up > with a better set of functions you need to hook. > > > Option 2 > > -------- > > > > That is to make Eric's code itself cope with the HV case. I'm a bit at > > loss right now as how precisely to do it. I need to spend more time > > staring at the code after Eric latest patches rather than the patches > > themselves I suppose :-) (Eric, they don't apply out of the box on > > current git, they are against -mm ?). > > > > Some of the main issues here, more/less following the order in which > > Eric code calls things: > > > > - The number of vectors for MSI-X is obtained from config space (at > > least for sanity checking the requested argument). On RTAS, it should > > come from an OF property (we are really not supposed to go read the > > config space even if we can). I -suppose- we can survive for now with > > just reading it, but we might well run into trouble with some "special" > > devices shared accross partitions or if the IBM magic bridges themselves > > ever start sending MSI-X on their own (unlikely but who knows...). > > Michael's code handled that by having a callback ->check() do the sanity > > checking of the nvec, and then just use the nvec passed in as an > > argument once it's sane. > > Ok. I think I get the point of check. I believe I need to look at your > code a little more and see what you are doing to see if there is anything > generic worth doing, that we can always do outside of architecture code > no matter how much of the job the Hypervisor wants to do for us. > > I'd hate to hit a different Hypervisor that did something close but > not quite the same and have the code fail then. So definitely > avoiding touching pci config space at all in the calls seems to make a > lot of sense. This includes avoiding pci_find_capability right? You can read config space, but it's not clear to me if the HV is allowed to filter it and hide things. It's also possible that the device supports MSI, but for some reason the HV doesn't allow it on that device etc. so you really have to ask the HV if it's enabled. So pci_find_cap() shouldn't crash or anything, but it may lie to you. > Off the top of my head the only things we can do generically are > some data structure things and flags like dev->msi_enabled or > dev->msix_enabled. It would be good to have a common data structure if possible. My thinking was that most of the information is per pci_dev, so that's where I put it. I realise the Intel code stores some info that's per-irq, but most of it is per-device. I hadn't got anywhere near coding it, but my vague idea was to add a arch_data (or whatever) pointer to my msi_info struct, which would allow backends to stash stuff. I think the pci_intx() calls can be in the core. Munging dev->irq could be in the core, assuming it's left in some known location by the code. On the other hand we might want to decide it's a bad idea altogether. One thing I did like about my code, is that pci_enable_msi() and pci_enable_msix() are just small wrappers around generic_enable_msi() - which does all the work, and is the same regardless of whether it's an MSI or MSI-X. Although that's facilitated by the type arg which you don't like. cheers -- Michael Ellerman OzLabs, IBM Australia Development Lab wwweb: http://michael.ellerman.id.au phone: +61 2 6212 1183 (tie line 70 21183) We do not inherit the earth from our ancestors, we borrow it from our children. - S.M.A.R.T Person [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 0/6] MSI portability cleanups 2007-01-29 10:11 ` Michael Ellerman @ 2007-01-29 20:32 ` Benjamin Herrenschmidt 2007-01-29 23:29 ` Paul Mackerras 1 sibling, 0 replies; 35+ messages in thread From: Benjamin Herrenschmidt @ 2007-01-29 20:32 UTC (permalink / raw) To: michael Cc: Eric W. Biederman, David Miller, jeff, greg, tony.luck, grundler, mingo, linux-kernel, kyle, linuxppc-dev, brice, shaohua.li, linux-pci > You can read config space, but it's not clear to me if the HV is allowed > to filter it and hide things. I've seen it do it for example with EADS bridges. I haven't seen doing it with devices (other than hiding entire functions) but I wouldn't exclude it... > It's also possible that the device > supports MSI, but for some reason the HV doesn't allow it on that device > etc. so you really have to ask the HV if it's enabled. So pci_find_cap() > shouldn't crash or anything, but it may lie to you. Yup. > One thing I did like about my code, is that pci_enable_msi() and > pci_enable_msix() are just small wrappers around generic_enable_msi() - > which does all the work, and is the same regardless of whether it's an > MSI or MSI-X. Although that's facilitated by the type arg which you > don't like. Part of the reason is you make MSI look like MSI-X (a vector of 1 entry) while Eric does the opposite. Ben. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 0/6] MSI portability cleanups 2007-01-29 10:11 ` Michael Ellerman 2007-01-29 20:32 ` Benjamin Herrenschmidt @ 2007-01-29 23:29 ` Paul Mackerras 2007-01-29 23:40 ` Benjamin Herrenschmidt 1 sibling, 1 reply; 35+ messages in thread From: Paul Mackerras @ 2007-01-29 23:29 UTC (permalink / raw) To: michael Cc: Eric W. Biederman, tony.luck, grundler, jeff, linux-kernel, kyle, linuxppc-dev, linux-pci, brice, greg, shaohua.li, mingo, David Miller Michael Ellerman writes: > You can read config space, but it's not clear to me if the HV is allowed > to filter it and hide things. It's also possible that the device It appears that the HV does not prevent us from reading or writing any config space registers for functions that are assigned to us. > supports MSI, but for some reason the HV doesn't allow it on that device > etc. so you really have to ask the HV if it's enabled. So pci_find_cap() > shouldn't crash or anything, but it may lie to you. It's possible that the device can do MSI(X), but that using MSI(X) requires other platform resources (e.g. interrupt source numbers) and there are none free. I believe the platform guarantees a minimum number of MSI(X) interrupts per function, but a pci_enable_msix() call may not be able to give the driver as many MSI-X interrupts as it is requesting even if the function can handle that many. Paul. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 0/6] MSI portability cleanups 2007-01-29 23:29 ` Paul Mackerras @ 2007-01-29 23:40 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 35+ messages in thread From: Benjamin Herrenschmidt @ 2007-01-29 23:40 UTC (permalink / raw) To: Paul Mackerras Cc: michael, tony.luck, grundler, jeff, David Miller, greg, linux-kernel, kyle, linuxppc-dev, Eric W. Biederman, shaohua.li, linux-pci, mingo, brice > It's possible that the device can do MSI(X), but that using MSI(X) > requires other platform resources (e.g. interrupt source numbers) and > there are none free. I believe the platform guarantees a minimum > number of MSI(X) interrupts per function, but a pci_enable_msix() call > may not be able to give the driver as many MSI-X interrupts as it is > requesting even if the function can handle that many. However, the ibm,req#msi(-x) properties contain the number as requested by the device, and thus I expect them to be identical to the config space value. So if you are confident enough that our HV won't play any tricks there in the future, reading the config space is as good as hooking that check() callback, though it might not be vs. some other HV for some other platform that might be more strict. We cannot know in advance how much max the HV will give us without actually trying ibm,change-msi and see the result code for it unfortunately. Ben. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 0/6] MSI portability cleanups 2007-01-29 9:03 ` Eric W. Biederman 2007-01-29 10:11 ` Michael Ellerman @ 2007-01-29 20:22 ` Benjamin Herrenschmidt 2007-01-29 23:05 ` Paul Mackerras 1 sibling, 1 reply; 35+ messages in thread From: Benjamin Herrenschmidt @ 2007-01-29 20:22 UTC (permalink / raw) To: Eric W. Biederman Cc: David Miller, jeff, greg, tony.luck, grundler, mingo, linux-kernel, kyle, linuxppc-dev, brice, shaohua.li, linux-pci > This is the most straight forward and handles machines with really > weird msi setups, so I lean in this direction. > > The question is there anything at all we can do generically? > > I can't see a case where ppc_md would not wind up with the hooks > that decide if it is a hypervisor or not. Even if we came up > with a better set of functions you need to hook. Sure, but with Michael's approach, the only hook was get_msi_ops(pdev) Anyway, there isn't -that- much that can be done generically in the HV case. Mostly some argument sanity checking, the logic for saving & restoring pdev->irq for MSIs, that sort of thing. > Ok. I think I get the point of check. I believe I need to look at your > code a little more and see what you are doing to see if there is anything > generic worth doing, that we can always do outside of architecture code > no matter how much of the job the Hypervisor wants to do for us. I understand. > I'd hate to hit a different Hypervisor that did something close but > not quite the same and have the code fail then. So definitely > avoiding touching pci config space at all in the calls seems to make a > lot of sense. This includes avoiding pci_find_capability right? Quite possibly yes. I'm pretty sure it will work on IBM HV but we aren't really supposed to use it... > Off the top of my head the only things we can do generically are > some data structure things and flags like dev->msi_enabled or > dev->msix_enabled. That and the saving & restoring of pdev->irq. That is not very much. > Anyway have a nice night and more in the morning. Ben. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 0/6] MSI portability cleanups 2007-01-29 20:22 ` Benjamin Herrenschmidt @ 2007-01-29 23:05 ` Paul Mackerras 2007-01-30 19:32 ` Segher Boessenkool 0 siblings, 1 reply; 35+ messages in thread From: Paul Mackerras @ 2007-01-29 23:05 UTC (permalink / raw) To: Eric W. Biederman, Benjamin Herrenschmidt Cc: tony.luck, grundler, jeff, greg, linux-kernel, kyle, linuxppc-dev, linux-pci, brice, shaohua.li, mingo, David Miller Benjamin Herrenschmidt writes: > > I'd hate to hit a different Hypervisor that did something close but > > not quite the same and have the code fail then. So definitely > > avoiding touching pci config space at all in the calls seems to make a > > lot of sense. This includes avoiding pci_find_capability right? > > Quite possibly yes. I'm pretty sure it will work on IBM HV but we aren't > really supposed to use it... Actually, I don't know of any reason why we can't use pci_find_capability. We are supposed to avoid trying to touch config space of devices (in fact, functions) that aren't assigned to our partition, but we're not talking about that here. I just got an answer from the hypervisor architects. It turns out that the hardware _does_ prevent the device from sending MSI messages to another partition. The OS _can_ write whatever it likes to the MSI address and data registers. It can potentially lose interrupts (or, I expect, get the device isolated by EEH) but it can't disrupt another partition. I think the reason why the hypervisor call writes the values straight into the MSI/MSI-X registers in the device is (a) that's convenient for AIX, since it saves it from immediately having to do more calls into the hypervisor to write those values to the device, and (b) there are some ABI complications in returning a lot of values, so the device registers provide a convenient place to return those values. So it would be possible, although gross, to do the hypervisor call, read the values from config space and return them to the generic code, then let the generic code write them to config space for us. :P The remaining point of difference then seems to be that for MSI-X, we really want to know up-front how many interrupts the device driver is asking for, rather than having a series of alloc requests dribble in one at a time. Regards, Paul. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 0/6] MSI portability cleanups 2007-01-29 23:05 ` Paul Mackerras @ 2007-01-30 19:32 ` Segher Boessenkool 0 siblings, 0 replies; 35+ messages in thread From: Segher Boessenkool @ 2007-01-30 19:32 UTC (permalink / raw) To: Paul Mackerras Cc: Eric W. Biederman, Benjamin Herrenschmidt, tony.luck, grundler, jeff, David Miller, greg, linux-kernel, kyle, linuxppc-dev, brice, shaohua.li, linux-pci, mingo > I just got an answer from the hypervisor architects. It turns out > that the hardware _does_ prevent the device from sending MSI messages > to another partition. The OS _can_ write whatever it likes to the MSI > address and data registers. It can potentially lose interrupts (or, I > expect, get the device isolated by EEH) but it can't disrupt another > partition. The OS however has to write the values the HV wants to the device, or things won't work -- so the HV can just as well do it itself. Also, pulling all the work into the HV makes for a cleaner, more generic design (who knows what hardware will show up within the next few years, the HV interface had better be prepared). Segher ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 0/6] MSI portability cleanups 2007-01-28 23:26 ` Eric W. Biederman 2007-01-28 23:37 ` David Miller @ 2007-01-29 1:33 ` Benjamin Herrenschmidt 1 sibling, 0 replies; 35+ messages in thread From: Benjamin Herrenschmidt @ 2007-01-29 1:33 UTC (permalink / raw) To: Eric W. Biederman Cc: Jeff Garzik, Greg Kroah-Hartman, Tony Luck, Grant Grundler, Ingo Molnar, linux-kernel, Kyle McMartin, linuxppc-dev, Brice Goglin, shaohua.li, linux-pci, David S. Miller > To be clear I see this as 2 distinct layers of code. enable/disable > that talks directly to the hardware, and the helpers of enable/disable > that allocate the irq. I base this on the fact that I only need the > alloc/free when I am exclusively working with real hardware. We need the alloc/free in all cases, wether we are talking to real HW or hypervisor. Alloc free is what allocates linux virtual irq numbers (or irq_desc's if your prefer) and what sets up the irq_desc->irq_chip to the appropriate thing for MSIs on that machines. Thus it's really the required step for everybody. The thing you seem to be mixing up is allocating of linux virtual irqs (picking an irq desc) and allocating of a HW vectors on your platformn (which happens to be the same pretty much on x86 nowdays no ? That is, they have the same numbering domain don't they ?). That is, while in the HV case, we don't allocate HW vectors (the HV does it for us), we still need to allocate linux irqs, setup the irq desc, and hook them up. > > You seem to absolutely want to get the HV case to go throuh the same > > code path as the "raw" case, and that will not happen. > > Yes I do. Because that is the only sane approach for a HV to use. BUT WE DON'T HAVE A CHOICE ON WHAT APPROACH THE HV USES !!!! pfff... Isn't that clear enough ? IBM will not change their HV interfaces becasue you don't like them, and I doubt sun will neither and despite you disagreeing on that, we -do- have to support them (hell, that's what I'm paid for among other things :-) It would be nice if we could dictate all HV and hardware vendors around how we think they should work and what interfaces they should provide, I suppose M$ can do that with Windows, but unfortunately, we aren't in a position to do that. > And yes we need an irq allocator to call the HV to setup the upstream > reception of the msi message. Not sure I completely parse the above. > However I don't think it will be to hard to support your HV once we get > the real hardware supported. I just refuse to consider it before we have > figured out what makes sense in the context where we have to do everything. Hrmph.... > > .../... (irq operations) > > > >> These because they are per irq make sense as per bus operations unless > >> you have a good architecture definition like x86 has. Roughly those > >> operations are what we currently have except the current operations > >> are a little simpler and easier to deal with for the architecture > >> code. > > > > Oh ? How so ? (easier/simpler ?) > > I don't take a type parameter, and I don't take a vector. All of > that work is done in the generic code. Well, so basically, the main difference is that we make MSI looks like MSI-X by providing an alloc/free abstraction that takes an array in all cases and you make MSI-X look like MSI by working one interrupt at a time. Your case avoids a for () loop in the backend, at the cost of being fundamentally incompatible with our HV approach (and possibly others like sparc64). We do pass the MSI vs. MSI-X because it's handy for the HV case to pass it along to the firmware, though it doesn't have to be used, and indeed, in the "raw" case, we don't use it. > > This is indeed a lower level hook to be used by "raw" enable/disable. An > > other approach would be to remove it, have each backend have it's own > > enable/disable that obtains the address/data and calls into a helper to > > program them. This would indeed be a little bit nicer in a layering > > perspective. But it adds a bit more code to each backend, so we kept > > things closer to the way they used to be. I don't have a firm reason not > > to change it however, I need talk to Michael in case he has more good > > reasons to keep it that way around. > > The current code in the kernel already is structured that way because > we have to reprogram the msi message on each irq migration. Not using > a helper to write the message would be a noticeable change and require > a fair amount of code rewriting on the currently supported > architectures. We never proposed not to use a helper to write back the message. We are missing such a helper in the current implementation, true, but that doesn't mean we are opposed to havign it, on the contrary. However, I don't think your implementation is much cleaner :-) The thing is that Michael's implementation completely avoids having any knowledge of the specifics of enabling/disabling MSI's or MSI-X's in the top level core code. The main difference after the alloc/free case is the enable/disable case: You do something like that: Toplevel calls the backend "setup" for each MSI or MSI-X, which itself then calls back into a helper to actually write the message, that helper doing then a switch/case on MSI vs. MSI-X based on infos in the msi desc. Then, you go back to the toplevel which goes whack the config space to atually do the global enabling of MSIs or MSI-X. Well, I don't think that from a layering perspective, that is much nicer. Your toplevel is a mix of high level interface to the backend and low level code specific to the "raw" implementation. In fact, I preferred the way it was done previously in that area in the sense that if you decide to have the "raw" implementation indeed be the "default" one, then move it at the top level and call some hook to obtain the address/value pair for each MSI. That doesn't preclude having the low level write_msi_message() function still be exported for use by the set_affinity callback. Michael's approach is similar than the above except that instead of having the raw implementation at the toplevel, it hooks is via enable/disable/setup_msi_msg. > Yes. In general the mainline linux kernel does not support certain > classes of stupidity. > TCP offload engines, firmware drivers for > hardware we care about, a fixed ABI to binary only modules, etc. > It is the responsibility of the OS to setup MSI so we do it, not > the firmware so we do it. > > Not supporting stupid things that are hard to support encourages other > people not to be so silly, especially when linux still works on the > hardware when that silly feature isn't supported. Not supporting IBM HV because of those idealistic reasons means not supporting a whole range of IBM machines in linux since LSIs are optional on PCI-E. It's not just a performance difference. A whole set of hardware will -not- work on those machines because somebody has an ideal view of the world (heh, that's funny, that same person actively works on x86 support, damn, that's something less than ideal :-) I think that's a bit of a lame attitude (without wanting to be insulting). The same way we can't dictate HW vendors how to do their stuff (we try to encourage them ,we try to teach them, but once the HW is out and people use it, we do also try to actually support it). So yes, we try to "fix" some of our HV stuff when we think it's too much off the hook (for example, initial interfaces didn't allow to differenciate MSI from MSI-X at all, we got that changed) but there's a limit on our influence on these things (heh, they also have to support other operating systems) and we can't just say "won't support you" when the interfaces don't please us. > For similar reasons we don't support more than 1 irq with a plain MSI > capability. I never understood why we had this stupid limitation in our API. It would have been easy enough to do an API that can support it, as long as we properly define that the platform is allowed to give you less than what you asked. > It is hard Not really... Heh, in fact, with those "stupid" hypervisor interfaces, it's actually very easy :-) But even in the raw case. Really not that hard. Easier than MSI-X in many ways. > we can't do it on most hardware I've seen quite a few cards who say they do more than 1 MSI and the host hardware shouldn't matter in that area. > and anyone who wants more than 1 irq should just implement MSI-X and everyone > will be able to use it, on any hardware. Sure, anyone should just implement their hardware the way the linux folks tell them to do, too bad HW vendors don't worship us as gods and don't take our rules as god send laws ... > Part of the reason to not support a messed up HV interface if it hard > is that a HV is just software. Which means the incremental cost to > fix it is roughly the same as fixing the linux kernel, and it puts > the burden on the people doing stupid things not on the rest of us > forever more. The comparison between > 1 MSI and HV is bad here. Supporting only 1 MSI actually still allows the HW to work. Not supporting the HV (and thus not supporting MSIs on those machines) does not when you start hitting hardware that doesn't do LSI (which is allowed by spec and is starting to appear, some IB cards for example don't do LSI). > No. I have spent time fixing what is there, and made it work. I see > implementations proposed that don't handle cases I have fixed, and I > don't see anything that resembles a simple migration path for i386, > x86_64 and ia64. Which is part of what annoys me when I am told > the ops work for everything. They potentially do, and the easy migration path is mostly to use the existing code as a HV-type backend for x86, and then incrementally fix our generic "raw" helpers and move bits of the x86 / old-generic-code to it... That's also a nice incremental approach... > As for the code not working important parts of the code (like MSI-X) > don't even work on ppc. > The strength of my opposition is largely > shaped by the number of people wearing rose colored glasses and > ignoring the problems, and missing huge details. Well, which is why I'd like to have a more constructive discussion on how to address those rather than outright dismissing the approach. You are using the fact that Michael's implementation isn't feature complete as an argument to dismiss the entire approach. In a way, you are commiting a layering violation in your argument :-) However, we can't get that resolved if we still don't agree on the veric basic premises of the direction we are taking. We need to agree on some of the fundamentals (like having alloc/free take an array or be per-interrupt) or agree to disagree in which case we have no choice on our side to "finish" Michael's implementation to do all those bits it's missing and propose it as an alternate since the main one will not handle our needs. > Given that we have been talking about things since before OLS I would > have expected the ppc code to be a little farther along. We have been delayed / side tracked with other things. Shit happens. > Not the x86 backend but the raw backend. You might not need all of > the features because you are always going through another interrupt > controller but that doesn't mean they shouldn't be there. I never disagreed with that. I always said we should have most of the missing bits added to the raw backend for x86. > Michael has at least agreed to look in that direction so I'm hoping > my changes remove some of the difficulty for him. Some do, some makes it more difficult. The way you removed the alloc/free vs. setup/teardown distinction and made the whole thing per-interrupt makes things more difficult for us. > Nor do I see the level of care being put into the problem that would > cause me to trust a rewrite. I have a huge number of little technical > problems with the proposed code, and see absolutely no overriding > virtue in it. Especially when the worst of the problems with msi.c > can be easily fixed, as demonstrated by my patchset. Can be fixed in a way that is by design incompatible with what we need. How should I phrase this so you understand that's not an option for us ? Ben. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 0/6] MSI portability cleanups 2007-01-28 22:09 ` Benjamin Herrenschmidt 2007-01-28 23:26 ` Eric W. Biederman @ 2007-02-01 4:29 ` Greg KH 1 sibling, 0 replies; 35+ messages in thread From: Greg KH @ 2007-02-01 4:29 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Eric W. Biederman, Jeff Garzik, Tony Luck, Grant Grundler, Ingo Molnar, linux-kernel, Kyle McMartin, linuxppc-dev, Brice Goglin, shaohua.li, linux-pci, David S. Miller On Mon, Jan 29, 2007 at 09:09:14AM +1100, Benjamin Herrenschmidt wrote: > > If we followed that "only do incrementental changes" rule all the time, > imagine in what state would be our USB stack today since we couldn't > have dropped in Linus replacement one ... Bad example, that is not what happened at all. There was not an in-kernel USB stack when Linus wrote his. Inaky had his all-singing-all-dancing stack outside of the tree, and no one was really helping out with it. Only when Linus added his code to mainline did we all jump on it and _incrementally_ improve it to what we have today. So, in a way, you just proved that we need to do this in an incremental fashion, which is what I was also saying all along :) thanks, greg k-h ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 0/6] MSI portability cleanups 2007-01-28 21:20 ` Eric W. Biederman 2007-01-28 21:26 ` Ingo Molnar 2007-01-28 22:09 ` Benjamin Herrenschmidt @ 2007-01-28 23:44 ` David Miller 2 siblings, 0 replies; 35+ messages in thread From: David Miller @ 2007-01-28 23:44 UTC (permalink / raw) To: ebiederm Cc: jeff, benh, greg, tony.luck, grundler, mingo, linux-kernel, kyle, linuxppc-dev, brice, shaohua.li, linux-pci From: ebiederm@xmission.com (Eric W. Biederman) Date: Sun, 28 Jan 2007 14:20:12 -0700 > I see people pushing ridiculous interfaces like the RTAS hypervisor > interface at me, and saying we must support running firmware drivers > in the msi code. This is not what's going on. The hypervisor does the PCI config space programming on the device to setup the MSI so that it can be done in a controlled manner and such that the device cannot ever be configured by one domain to shoot MSI packets over at devices which belong to another domain. It's that simple. That's absolutely reasonable, and is I believe what you'll see the sparc64 hypervisor(s) all needing as well. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 0/6] MSI portability cleanups 2007-01-28 20:47 ` Jeff Garzik 2007-01-28 21:20 ` Eric W. Biederman @ 2007-01-28 22:11 ` Eric W. Biederman 2007-01-28 23:42 ` David Miller 2 siblings, 0 replies; 35+ messages in thread From: Eric W. Biederman @ 2007-01-28 22:11 UTC (permalink / raw) To: Jeff Garzik Cc: Benjamin Herrenschmidt, Greg Kroah-Hartman, Tony Luck, Grant Grundler, Ingo Molnar, linux-kernel, Kyle McMartin, linuxppc-dev, Brice Goglin, shaohua.li, linux-pci, David S. Miller Jeff Garzik <jeff@garzik.org> writes: > I think the high-level ops approach makes more sense. It's more future proof, > in addition to covering all existing implementations. To be precise in Michaels implementation one of the parameters passed is a type parameter so that the architecture has to know about each different type of msi implementation. In my implementation that field does not exist, because it is unnecessary. So as long as the message on the bus is a msi message my implementation can be adapted to support it without any architecture changes. Being future proof is about getting the abstraction correct, and exposing those details that matter, and removing those detail that don't. It is a minor nit, not a fundamental flaw in the operations concept. But one of the reasons I am opposed to throwing out the current working code. Evolutionary change ensures that things only the code remembers don't get left behind. I guess that is the other part of the discussion that shows up here is, as long as the change is an evolutionary change from what is working today. I don't have any fundamental problems with it, but I am completely against a revolutionary change. Meanwhile because Michael has proposed operations my position has been perceived as against operations. While I have a lot of technical nits to pick with the Michaels operations approach, I'm not fundamentally against it. I just don't want to loose the information that only the code remembers. Most of my technical objections have been formed by looking at what the code does today, looking at what Michaels code is doing and seeing details he missed. If we just start with the current code base and fix it the whole approach is much easier. Anyway last I heard Michael was working on starting with the current msi.c and making his patch set work, and I am hoping that my work will make that patchset cleaner, and easier to do. Even if we do conflict at the moment :) Eric ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 0/6] MSI portability cleanups 2007-01-28 20:47 ` Jeff Garzik 2007-01-28 21:20 ` Eric W. Biederman 2007-01-28 22:11 ` Eric W. Biederman @ 2007-01-28 23:42 ` David Miller 2 siblings, 0 replies; 35+ messages in thread From: David Miller @ 2007-01-28 23:42 UTC (permalink / raw) To: jeff Cc: benh, ebiederm, greg, tony.luck, grundler, mingo, linux-kernel, kyle, linuxppc-dev, brice, shaohua.li, linux-pci From: Jeff Garzik <jeff@garzik.org> Date: Sun, 28 Jan 2007 15:47:24 -0500 > I think the high-level ops approach makes more sense. It's more future > proof, in addition to covering all existing implementations. I totally agree with this. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 0/6] MSI portability cleanups 2007-01-28 20:23 ` [PATCH 0/6] MSI portability cleanups Benjamin Herrenschmidt 2007-01-28 20:47 ` Jeff Garzik @ 2007-01-28 21:34 ` Eric W. Biederman 1 sibling, 0 replies; 35+ messages in thread From: Eric W. Biederman @ 2007-01-28 21:34 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Greg Kroah-Hartman, Tony Luck, Grant Grundler, Ingo Molnar, linux-kernel, Kyle McMartin, linuxppc-dev, Brice Goglin, shaohua.li, linux-pci, David S. Miller Benjamin Herrenschmidt <benh@kernel.crashing.org> writes: >> The other big change is that I added a field to irq_desc to point >> at the msi_desc. This removes the conflicts with the existing pointer >> fields and makes the irq -> msi_desc mapping useable outside of msi.c > > I'm not even sure we would have needed that with Michael's mecanism in > fact. One other reason why I prefer it. > > Basically, backends like MPIC etc... don't need it. The irq chip > operations are normal MPIC operations and don't need to know they are > done on an MSI nor what MSI etc... and thus we don't need it. Same with > RTAS. If you get rid of the bass ackwards setup_msi_msg operation they do, so you can support at least one write_msi_msg call. > On the other hand, x86 needs it, but then, x86 uses it's own MSI > specific irq_chip, in which case it can use irq_desc->chip_data as long > as it does it within the backend. Most of the uses are within msi.c as the code is currently structured which means you can't use it that way. > So I may have missed a case where a given backend might need both that > irq -> msi_desc mapping -and- use irq_desc->chip_data for other things, > but that's one thing I was hoping we could avoid with Michael's code. That is where we are today. Find a way to remove the code that uses it and it can go away. >> The only architecture problem that isn't solvable in this context is >> the problem of supporting the crazy hypervisor on the ppc RTAS, which >> asks us to drive the hardware but does not give us access to the >> hardware registers. > > So you are saying that we should use your model while admitting that it > can't solve our problems... My approach can solve your problems with a few tweaks just like Michaels approach would have needed to solve mine. > I really don't understand why you seem so totally opposed to Michael's > approach which definitely looks to me like the sane thing to do. Note > that in the end, Michael's approach isn't -that- different from yours, > just a bit more abstracted. 1) Because every one tells me it is the greatest thing since sliced bread, and when I look it simply doesn't work, and my feeling would be it would be a complete retesting effort of all currently supported architectures to make Michaels code work. 2) Because it was scrap and replace, which is a horrible way to deal with a problem when we have 3 architectures working already. Honestly I think Michael and I can get along but all of the cheer leaders seem to be exacerbating the situation. I do agree Michael's approach isn't that different than mine and I think we can converge on a single implementation. To a large extent that is what my patchset is about. Moving the current code far enough it is usable, and a reasonable basis for more work. I don't write the current code but since I touched it and started cleaning it up I seem to be stuck with it. So I will be happy to take care of it until we get a version that all architectures can use. Eric ^ permalink raw reply [flat|nested] 35+ messages in thread
end of thread, other threads:[~2007-02-01 4:30 UTC | newest] Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <1169714047.65693.647693675533.qpush@cradle> 2007-01-28 19:40 ` [PATCH 0/6] MSI portability cleanups Eric W. Biederman 2007-01-28 19:42 ` [PATCH 1/6] msi: Kill msi_lookup_irq Eric W. Biederman 2007-01-28 19:44 ` [PATCH 2/6] msi: Remove msi_lock Eric W. Biederman 2007-01-28 19:45 ` [PATCH 3/6] msi: Fix msi_remove_pci_irq_vectors Eric W. Biederman 2007-01-28 19:47 ` [PATCH 4/6] msi: Remove attach_msi_entry Eric W. Biederman 2007-01-28 19:52 ` [PATCH 5/6] msi: Kill the msi_desc array Eric W. Biederman 2007-01-28 19:56 ` [PATCH 6/6] msi: Make MSI useable more architectures Eric W. Biederman 2007-01-28 22:01 ` [PATCH 1/6] msi: Kill msi_lookup_irq Paul Mackerras 2007-01-28 22:18 ` Eric W. Biederman 2007-01-28 20:23 ` [PATCH 0/6] MSI portability cleanups Benjamin Herrenschmidt 2007-01-28 20:47 ` Jeff Garzik 2007-01-28 21:20 ` Eric W. Biederman 2007-01-28 21:26 ` Ingo Molnar 2007-01-28 22:09 ` Benjamin Herrenschmidt 2007-01-28 23:26 ` Eric W. Biederman 2007-01-28 23:37 ` David Miller 2007-01-29 5:18 ` Eric W. Biederman 2007-01-29 5:25 ` David Miller 2007-01-29 5:58 ` Eric W. Biederman 2007-01-29 6:05 ` Benjamin Herrenschmidt 2007-01-29 8:28 ` Eric W. Biederman 2007-01-29 9:03 ` Eric W. Biederman 2007-01-29 10:11 ` Michael Ellerman 2007-01-29 20:32 ` Benjamin Herrenschmidt 2007-01-29 23:29 ` Paul Mackerras 2007-01-29 23:40 ` Benjamin Herrenschmidt 2007-01-29 20:22 ` Benjamin Herrenschmidt 2007-01-29 23:05 ` Paul Mackerras 2007-01-30 19:32 ` Segher Boessenkool 2007-01-29 1:33 ` Benjamin Herrenschmidt 2007-02-01 4:29 ` Greg KH 2007-01-28 23:44 ` David Miller 2007-01-28 22:11 ` Eric W. Biederman 2007-01-28 23:42 ` David Miller 2007-01-28 21:34 ` Eric W. Biederman
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).