LKML Archive on lore.kernel.org
From: ebiederm@xmission.com (Eric W. Biederman)
To: Andrew Morton <akpm@osdl.org>
Cc: linux-kernel@vger.kernel.org,
Zwane Mwaikambo <zwane@infradead.org>,
Ashok Raj <ashok.raj@intel.com>, Ingo Molnar <mingo@elte.hu>,
"Lu, Yinghai" <yinghai.lu@amd.com>,
Natalie Protasevich <protasnb@gmail.com>, Andi Kleen <ak@suse.de>,
"Siddha, Suresh B" <suresh.b.siddha@intel.com>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH] x86_64 irq: Document what works and why on ioapics.
Date: Fri, 23 Feb 2007 05:01:38 -0700
Message-ID: <m1abz5t74t.fsf_-_@ebiederm.dsl.xmission.com>
In-Reply-To: <m1hctdt7ub.fsf_-_@ebiederm.dsl.xmission.com> (Eric W. Biederman's message of "Fri, 23 Feb 2007 04:46:20 -0700")
After writing this up and sending out the email it occurred to me this
information should be kept someplace a little more permanent, so the
next person who cares won't have to get a huge pile of test machines
and test to understand what doesn't work.
A bunch of this is in my other changelog entries in the patches I
just posted but not all of it.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
Documentation/x86_64/IO-APIC-what-works.txt | 109 +++++++++++++++++++++++++++
1 files changed, 109 insertions(+), 0 deletions(-)
create mode 100644 Documentation/x86_64/IO-APIC-what-works.txt
diff --git a/Documentation/x86_64/IO-APIC-what-works.txt b/Documentation/x86_64/IO-APIC-what-works.txt
new file mode 100644
index 0000000..40fa61f
--- /dev/null
+++ b/Documentation/x86_64/IO-APIC-what-works.txt
@@ -0,0 +1,109 @@
+23 Feb 2007
+
+Ok. This is just an email to summarize my findings after investigating
+the ioapic programming.
+
+The ioapics on the E75xx chipset do have issues if you attempt to
+reprogram them outside of the irq handler. On several occasions I
+caused the state machine to get stuck such that an individual
+ioapic entry was no longer capable of delivering interrupts. I
+suspect the remote IRR bit was stuck on, such that switching the
+irq to edge triggered and back to level triggered would not clear
+it, but I did not confirm this. I just know that I was switching
+the irq between level and edge triggered with the irq masked
+and the irq did not fire.
+
+
+The ioapics on the AMD 8xxx chipset do have issues if you attempt
+to reprogram them outside of the irq handler. I wound up with
+remote IRR set and never clearing. But by temporarily switching
+the irq to edge triggered while it was masked I could clear
+this condition.
+
+I could not hit verifiable bugs in the ioapics on the Nforce4
+chipset. Amazingly, that is one part of that chipset I can't find
+issues with.
+
+
+
+I did find an algorithm that will work successfully for migrating
+IRQs in process context if you have an ioapic that follows pci
+ordering rules. In particular, the properties the algorithm depends
+on are that reads guarantee outstanding writes are flushed, and that
+in this context irqs in flight are considered writes. I have
+assumed that to devices outside of the cpu asic the cpu and the local
+apic appear as the same device.
+
+The algorithm was:
+- Be running with interrupts enabled in process context.
+- Mask the irq in the ioapic.
+- Read the ioapic to flush outstanding irqs to the local apic.
+- Read the local apic to flush outstanding irqs to be sent to the cpu.
+
+- Now that all of the irqs have been delivered and the irq is masked
+ that irq is finally quiescent.
+
+- With the irq quiescent it is safe to reprogram the interrupt
+ controller and the irq reception data structures.
+
+There were a lot more details but that was the essence.
+
+What I discovered was that, except on the nforce chipset, masking the
+ioapic and then issuing a read did not behave as if the interrupts
+were flushed to the local apic.
+
+I did not look closely enough to tell if local apics suffered from
+this issue. With local apics at least a read was necessary before you
+could guarantee the local apic would deliver pending irqs. A
+workaround on the local apics is to simply issue a low priority
+interrupt as an IPI and wait for it to be processed. This guarantees
+that all higher priority interrupts have been flushed from the apic,
+and that the local apic has processed interrupts.
+
+For ioapics, because they cannot be stimulated to send an irq from
+the cpu side, no similar workaround was possible.
+
+
+
+** Conclusions.
+
+* IRQs must be reprogrammed in interrupt context.
+
+The result of this investigation is that I am convinced we need
+to perform the irq migration activities in interrupt context, although
+I am not convinced it is completely safe. I suspect multiple irqs
+firing closely enough to each other may hit the same issues as
+migrating irqs from process context. However, the odds are on our
+side when we are in irq context.
+
+The reasoning for this is simply that:
+- Before we reprogram a level triggered irq, its remote irr bit
+ must be cleared by the irq being acknowledged; only then can it
+ be safely reprogrammed.
+
+- There is no generally effective way short of receiving an additional
+ irq to ensure that the irq handler has run. Polling the ioapic's
+ remote irr bit does not work.
+
+
+* The CPU hotplug code is currently very buggy.
+
+Irq migration in the cpu hotplug case is a serious problem. If we can
+only safely migrate irqs from interrupt context and we cannot control
+when those interrupts fire, then we cannot bound the amount of time it
+will take to migrate the irqs away from a cpu. The cpu hotplug code
+currently calls chip->set_affinity directly, which is wrong, as it
+does not take the necessary locks, and it does not attempt to delay
+execution until we are in interrupt context.
+
+* Only an additional irq can signal the completion of an irq movement.
+
+The attempt to rebuild the irq migration code from first principles
+did bear some fruit. I asked the question: "When is it safe to tear
+down the data structures for irq movement?". The only answer I have
+is once I have received an irq that provably arrived after the irq
+was reprogrammed. This is because the only way I can reliably
+synchronize with irq delivery from an apic is to receive an additional irq.
+
+Currently this is a problem both for cpu hotplug on x86_64 and i386
+and for general irq migration on x86_64.
--
1.5.0.g53756
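
The conclusions in the document above (reprogram only in interrupt
context, and tear down the old state only after an irq has arrived on
the new route) can be illustrated with a small toy model. This is only
a sketch under stated assumptions, not the kernel's implementation:
struct irq_state, request_migration() and handle_irq() are hypothetical
names invented for this illustration, and the "ioapic entry" is just a
pair of integers rather than real hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names are kernel APIs. */
struct irq_route {
	int cpu;     /* destination cpu */
	int vector;  /* vector on that cpu */
};

struct irq_state {
	struct irq_route cur;     /* route held by the simulated ioapic entry */
	struct irq_route old;     /* previous route, kept until proven unused */
	struct irq_route wanted;  /* route requested from process context */
	bool move_pending;        /* requested, entry not yet reprogrammed */
	bool move_in_progress;    /* reprogrammed, old route not yet torn down */
	bool old_route_live;      /* old vector is still allocated */
};

void irq_init(struct irq_state *s, int cpu, int vector)
{
	assert(s != NULL);
	s->cur.cpu = cpu;
	s->cur.vector = vector;
	s->old = s->cur;
	s->wanted = s->cur;
	s->move_pending = false;
	s->move_in_progress = false;
	s->old_route_live = false;
}

/* Process context: only record the request, never touch the entry. */
void request_migration(struct irq_state *s, int cpu, int vector)
{
	s->wanted.cpu = cpu;
	s->wanted.vector = vector;
	s->move_pending = true;
}

/*
 * Interrupt context: an irq arrived on (cpu, vector).
 * Returns false if no handler is set up for that route.
 */
bool handle_irq(struct irq_state *s, int cpu, int vector)
{
	bool on_cur = cpu == s->cur.cpu && vector == s->cur.vector;
	bool on_old = s->old_route_live &&
		      cpu == s->old.cpu && vector == s->old.vector;

	if (!on_cur && !on_old)
		return false;

	/* An irq on the new route is the only proof the move completed. */
	if (on_cur && s->move_in_progress) {
		s->old_route_live = false;
		s->move_in_progress = false;
	}

	/* We are inside the handler, so reprogramming happens here. */
	if (s->move_pending && !s->move_in_progress) {
		s->old = s->cur;
		s->old_route_live = true;
		s->cur = s->wanted;
		s->move_pending = false;
		s->move_in_progress = true;
	}
	return true;
}
```

In this model a request made from process context changes nothing
until the next irq fires, irqs that still arrive on the old route
after the reprogramming are accepted, and the old route is released
only by the first irq seen on the new route. It also shows the hotplug
problem described above: if the irq never fires, the move never
completes.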