LKML Archive on lore.kernel.org
* [PATCH] disable CPU side GART accesses
@ 2008-10-15 21:48 Bob Montgomery
  2008-10-15 23:40 ` Linus Torvalds
  2008-10-15 23:48 ` Ingo Molnar
  0 siblings, 2 replies; 17+ messages in thread
From: Bob Montgomery @ 2008-10-15 21:48 UTC (permalink / raw)
  To: linux-kernel; +Cc: vojtech, Linus Torvalds, chandru

This patch prevents improper access to the GART aperture from kdump
kernels running on AMD systems.

Symptoms of the problem include hangs, spurious restarts, and MCE
(Machine Check Exception) panics in some AMD Opteron systems that
enable the GART IOMMU and access /proc/vmcore or /dev/oldmem from a
kdump kernel.  Note that the GART IOMMU is not enabled on systems
with less than 4 GB of RAM, so the symptoms do not appear there.  This
problem has been reproduced on Family 10H Quad-Core AMD Opteron systems.

This patch changes the initialization of the GART to set the DISGARTCPU
bit in the GART Aperture Control Register (AMD64_GARTAPERTURECTL).
Setting the bit prevents requests from the CPUs from accessing the
GART.  In other words, CPU memory accesses to the aperture address
range will not cause the GART to perform an address translation.
The aperture area is currently unmapped at the kernel level with
set_memory_np() in gart_iommu_init() to prevent accesses from the CPU,
but that kernel-level unmapping is not in effect in the kexec'd kdump
kernel.  Disabling CPU-side accesses within the GART itself, a setting
that does persist through the kexec of the kdump kernel, prevents the
kdump kernel from interacting with the GART when it reads dump memory
areas that include the address range of the GART aperture.  Although
the patch can be applied to the kdump kernel, it is not exercised
there: the kdump kernel does not attempt to initialize the GART, since
it typically runs in less than 4 GB of memory.
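The register change described above is a small bit-manipulation step on
the value read from AMD64_GARTAPERTURECTL.  A user-space sketch of the
value written before and after the patch, assuming GARTEN = bit 0,
DISGARTCPU = bit 4 and DISGARTIO = bit 5 as defined in later kernels'
gart.h (check against the AMD BKDG for your family).  The helper names
are hypothetical; this is not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions in the GART Aperture Control Register. */
#define GARTEN      (1u << 0)   /* enable GART translation */
#define DISGARTCPU  (1u << 4)   /* ignore requests coming from the CPUs */
#define DISGARTIO   (1u << 5)   /* ignore requests coming from I/O */

/* Value written before the patch: translate both CPU and I/O requests.
 * All other bits of ctl are preserved unchanged. */
static uint32_t gart_ctl_old(uint32_t ctl)
{
    ctl |= GARTEN;
    ctl &= ~(DISGARTCPU | DISGARTIO);
    return ctl;
}

/* Value written with the patch: keep I/O translation, ignore the CPUs. */
static uint32_t gart_ctl_new(uint32_t ctl)
{
    ctl |= GARTEN;
    ctl |= DISGARTCPU;
    ctl &= ~DISGARTIO;
    return ctl;
}
```

Starting from 0, the old code writes 0x01 and the patched code writes
0x11; the only difference is the DISGARTCPU bit.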


Signed-off-by: Bob Montgomery <bob.montgomery@hp.com>


--- linux-2.6.27/include/asm-x86/gart.h	2008-10-13 16:36:34.000000000 -0600
+++ linux-2.6.27-fix/include/asm-x86/gart.h	2008-10-14 10:37:32.000000000 -0600
@@ -44,7 +44,8 @@ static inline void enable_gart_translati
         /* Enable GART translation for this hammer. */
         pci_read_config_dword(dev, AMD64_GARTAPERTURECTL, &ctl);
         ctl |= GARTEN;
-        ctl &= ~(DISGARTCPU | DISGARTIO);
+        ctl |= DISGARTCPU;
+        ctl &= ~(DISGARTIO);
         pci_write_config_dword(dev, AMD64_GARTAPERTURECTL, ctl);
 }
 




* Re: [PATCH] disable CPU side GART accesses
  2008-10-15 21:48 [PATCH] disable CPU side GART accesses Bob Montgomery
@ 2008-10-15 23:40 ` Linus Torvalds
  2008-10-16 19:17   ` Bob Montgomery
  2008-10-15 23:48 ` Ingo Molnar
  1 sibling, 1 reply; 17+ messages in thread
From: Linus Torvalds @ 2008-10-15 23:40 UTC (permalink / raw)
  To: Bob Montgomery; +Cc: linux-kernel, vojtech, chandru



On Wed, 15 Oct 2008, Bob Montgomery wrote:
> 
> This patch changes the initialization of the GART to set the DISGARTCPU
> bit in the GART Aperture Control Register (AMD64_GARTAPERTURECTL).
> Setting the bit prevents requests from the CPUs from accessing the
> GART.  In other words, CPU memory accesses to the aperture address
> range will not cause the GART to perform an address translation.
> The aperture area is currently being unmapped at the kernel level
> with set_memory_np() in gart_iommu_init to prevent accesses from the
> CPU [...]

Would this allow us to get rid of that particular hackup code sequence 
entirely? Or do we still need them for other chip versions etc?

Also, the whole iommu/gart thing seems to have a lot of people who have 
worked on it, are the right people cc'd? Pavel seems to have touched the 
code last, but it seems to be originally done by Andi and then with 
touches by DaveJ. 

I get the feeling that the people cc'd are kdump people, not iommu/gart 
people, which is a bit sad.

		Linus


* Re: [PATCH] disable CPU side GART accesses
  2008-10-15 21:48 [PATCH] disable CPU side GART accesses Bob Montgomery
  2008-10-15 23:40 ` Linus Torvalds
@ 2008-10-15 23:48 ` Ingo Molnar
  2008-10-16  0:22   ` Yinghai Lu
  2008-10-27 22:42   ` Bob Montgomery
  1 sibling, 2 replies; 17+ messages in thread
From: Ingo Molnar @ 2008-10-15 23:48 UTC (permalink / raw)
  To: Bob Montgomery
  Cc: linux-kernel, vojtech, Linus Torvalds, chandru, Joerg Roedel,
	FUJITA Tomonori, Yinghai Lu, Jesse Barnes, Pavel Machek


(Cc:-ed the GART folks.)

* Bob Montgomery <bob.montgomery@hp.com> wrote:

> This patch prevents improper access of the GART aperture from kdump 
> kernels running on AMD systems.
> 
> Symptoms of the problem include hangs, spurious restarts, and MCE 
> (Machine Check Exception) panics in some AMD Opteron systems that 
> enable the GART IOMMU and access /proc/vmcore or /dev/oldmem from a 
> kdump kernel.  Note that the GART IOMMU will not be enabled on systems 
> with less than 4 GB of RAM, so symptoms will not appear.  This problem 
> has been reproduced on Family 10H Quad-Core AMD Opteron systems.
> 
> This patch changes the initialization of the GART to set the 
> DISGARTCPU bit in the GART Aperture Control Register 
> (AMD64_GARTAPERTURECTL). Setting the bit prevents requests from the 
> CPUs from accessing the GART.  In other words, CPU memory accesses to 
> the aperture address range will not cause the GART to perform an 
> address translation. The aperture area is currently being unmapped at 
> the kernel level with set_memory_np() in gart_iommu_init to prevent 
> accesses from the CPU, but that kernel level unmapping is not in 
> effect in the kexec'd kdump kernel.  By disabling the CPU-side 
> accesses within the GART, which does persist through the kexec of the 
> kdump kernel, the kdump kernel is prevented from interacting with the 
> GART during accesses to the dump memory areas which include the 
> address range of the GART aperture.  Although the patch can be applied 
> to the kdump kernel, it is not exercised there because the kdump 
> kernel doesn't attempt to initialize the GART, since it typically runs 
> in less than 4 GB of memory.
> 
> Signed-off-by: Bob Montgomery <bob.montgomery@hp.com>
> 
> 
> --- linux-2.6.27/include/asm-x86/gart.h	2008-10-13 16:36:34.000000000 -0600
> +++ linux-2.6.27-fix/include/asm-x86/gart.h	2008-10-14 10:37:32.000000000 -0600
> @@ -44,7 +44,8 @@ static inline void enable_gart_translati
>          /* Enable GART translation for this hammer. */
>          pci_read_config_dword(dev, AMD64_GARTAPERTURECTL, &ctl);
>          ctl |= GARTEN;
> -        ctl &= ~(DISGARTCPU | DISGARTIO);
> +        ctl |= DISGARTCPU;
> +        ctl &= ~(DISGARTIO);
>          pci_write_config_dword(dev, AMD64_GARTAPERTURECTL, ctl);
>  }
>  


* Re: [PATCH] disable CPU side GART accesses
  2008-10-15 23:48 ` Ingo Molnar
@ 2008-10-16  0:22   ` Yinghai Lu
  2008-10-16 17:00     ` Bob Montgomery
  2008-10-27 22:42   ` Bob Montgomery
  1 sibling, 1 reply; 17+ messages in thread
From: Yinghai Lu @ 2008-10-16  0:22 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Bob Montgomery, linux-kernel, vojtech, Linus Torvalds, chandru,
	Joerg Roedel, FUJITA Tomonori, Jesse Barnes, Pavel Machek

Ingo Molnar wrote:
> (Cc:-ed the GART folks.)
> 
> * Bob Montgomery <bob.montgomery@hp.com> wrote:
> 
>> This patch prevents improper access of the GART aperture from kdump 
>> kernels running on AMD systems.
>>
>> Symptoms of the problem include hangs, spurious restarts, and MCE 
>> (Machine Check Exception) panics in some AMD Opteron systems that 
>> enable the GART IOMMU and access /proc/vmcore or /dev/oldmem from a 
>> kdump kernel.  Note that the GART IOMMU will not be enabled on systems 
>> with less than 4 GB of RAM, so symptoms will not appear.  This problem 
>> has been reproduced on Family 10H Quad-Core AMD Opteron systems.
>>
>> This patch changes the initialization of the GART to set the 
>> DISGARTCPU bit in the GART Aperture Control Register 
>> (AMD64_GARTAPERTURECTL). Setting the bit prevents requests from the 
>> CPUs from accessing the GART.  In other words, CPU memory accesses to 
>> the aperture address range will not cause the GART to perform an 
>> address translation. The aperture area is currently being unmapped at 
>> the kernel level with set_memory_np() in gart_iommu_init to prevent 
>> accesses from the CPU, but that kernel level unmapping is not in 
>> effect in the kexec'd kdump kernel.  By disabling the CPU-side 
>> accesses within the GART, which does persist through the kexec of the 
>> kdump kernel, the kdump kernel is prevented from interacting with the 
>> GART during accesses to the dump memory areas which include the 
>> address range of the GART aperture.  Although the patch can be applied 
>> to the kdump kernel, it is not exercised there because the kdump 
>> kernel doesn't attempt to initialize the GART, since it typically runs 
>> in less than 4 GB of memory.

What about the part of the aperture that is not used by the IOMMU in the GART?

        /*
         * Unmap the IOMMU part of the GART. The alias of the page is
         * always mapped with cache enabled and there is no full cache
         * coherency across the GART remapping. The unmapping avoids
         * automatic prefetches from the CPU allocating cache lines in
         * there. All CPU accesses are done via the direct mapping to
         * the backing memory. The GART address is only used by PCI
         * devices.
         */
        set_memory_np((unsigned long)__va(iommu_bus_base),
                                iommu_size >> PAGE_SHIFT);

The code only marks the IOMMU window not-present (np).
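The point here is that set_memory_np() covers only the IOMMU window,
which need not be the whole aperture.  A sketch of that split, assuming
(from memory of the 2.6.27-era pci-gart_64.c, so treat it as an
assumption) that the IOMMU window occupies the top iommu_size bytes of
the aperture and anything below it is left for AGP.  The struct and
function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12   /* 4 KB pages */

struct gart_layout {
    uint64_t iommu_bus_base;  /* first byte of the IOMMU window */
    uint64_t iommu_pages;     /* page count passed to set_memory_np() */
    uint64_t still_mapped;    /* aperture bytes below the window, left mapped */
};

/* Assumption: the IOMMU window is the top iommu_size bytes of the
 * aperture.  Caller must ensure iommu_size <= aper_size. */
static struct gart_layout gart_split(uint64_t aper_base, uint64_t aper_size,
                                     uint64_t iommu_size)
{
    struct gart_layout l;
    uint64_t iommu_start = aper_size - iommu_size;

    l.iommu_bus_base = aper_base + iommu_start;
    l.iommu_pages = iommu_size >> PAGE_SHIFT;
    l.still_mapped = iommu_start;
    return l;
}
```

With a 64 MB aperture at 0x20000000 and a 32 MB IOMMU window, 8192
pages starting at 0x22000000 would be unmapped, and the lower 32 MB of
the aperture would stay mapped.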

Also, the following patch should already fix the problem with kexec/kdump; it has been in mainline since 2.6.25-rc1.

YH

commit aaf230424204864e2833dcc1da23e2cb0b9f39cd
Author: Yinghai Lu <Yinghai.Lu@Sun.COM>
Date:   Wed Jan 30 13:33:09 2008 +0100

    x86: disable the GART early, 64-bit

    For K8 systems with 4G RAM and memory hole remapping enabled, or with
    more than 4G RAM installed.

    When kexec is used to boot a second kernel and the first kernel does
    not include gart_shutdown, the second kernel could place its aperture
    at a different position than the first kernel did, and could use the
    old aperture hole as RAM even though it is still covered by the GART
    set up by the first kernel.  This happens especially when kexec-ing
    2.6.24 with sparsemem enabled from a previous kernel (from RHEL 5 or
    SLES 10): the new kernel uses the aperture set up by the first
    kernel's GART for vmemmap, and after the new kernel sets up its own
    GART, that position becomes real RAM again, so the _mapcount values
    already set there are lost.

    Bad page state in process 'swapper'
    page:ffffe2000e600020 flags:0x0000000000000000 mapping:0000000000000000 mapcount:1 count:0
    Trying to fix it up, but a reboot is needed
    Backtrace:
    Pid: 0, comm: swapper Not tainted 2.6.24-rc7-smp-gcdf71a10-dirty #13

    Call Trace:
     [<ffffffff8026401f>] bad_page+0x63/0x8d
     [<ffffffff80264169>] __free_pages_ok+0x7c/0x2a5
     [<ffffffff80ba75d1>] free_all_bootmem_core+0xd0/0x198
     [<ffffffff80ba3a42>] numa_free_all_bootmem+0x3b/0x76
     [<ffffffff80ba3461>] mem_init+0x3b/0x152
     [<ffffffff80b959d3>] start_kernel+0x236/0x2c2
     [<ffffffff80b9511a>] _sinittext+0x11a/0x121

    and
     [ffffe2000e600000-ffffe2000e7fffff] PMD ->ffff81001c200000 on node 0
    phys addr is : 0x1c200000

    RHEL 5.1 kernel -53 said:
    PCI-DMA: aperture base @ 1c000000 size 65536 KB

    new kernel said:
    Mapping aperture over 65536 KB of RAM @ 3c000000

    So try to disable that GART if possible.
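The addresses in that log can be checked directly.  The old aperture
covers [0x1c000000, 0x1c000000 + 65536 KB) = [0x1c000000, 0x20000000),
which contains the vmemmap page at physical 0x1c200000, while the new
kernel's aperture at 0x3c000000 does not.  A minimal half-open range
check (a hypothetical helper, not kernel code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if phys lies inside the half-open range [base, base + size). */
static bool in_range(uint64_t phys, uint64_t base, uint64_t size)
{
    return phys >= base && phys - base < size;
}
```

Written as phys - base < size, the check cannot overflow even when
base + size would exceed 64 bits.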




* Re: [PATCH] disable CPU side GART accesses
  2008-10-16  0:22   ` Yinghai Lu
@ 2008-10-16 17:00     ` Bob Montgomery
  2008-10-16 17:43       ` Yinghai Lu
  0 siblings, 1 reply; 17+ messages in thread
From: Bob Montgomery @ 2008-10-16 17:00 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Ingo Molnar, linux-kernel, vojtech, Linus Torvalds, chandru,
	Joerg Roedel, FUJITA Tomonori, Jesse Barnes, Pavel Machek

On Thu, 2008-10-16 at 00:22 +0000, Yinghai Lu wrote:
> Ingo Molnar wrote:
> > (Cc:-ed the GART folks.)
> >
> > * Bob Montgomery <bob.montgomery@hp.com> wrote:
> >
> >> This patch prevents improper access of the GART aperture from kdump
> >> kernels running on AMD systems.
> >>

> >> This patch changes the initialization of the GART to set the
> >> DISGARTCPU bit in the GART Aperture Control Register
> >> (AMD64_GARTAPERTURECTL). Setting the bit prevents requests from the
> >> CPUs from accessing the GART.  In other words, CPU memory accesses to
> >> the aperture address range will not cause the GART to perform an
> >> address translation. The aperture area is currently being unmapped at
> >> the kernel level with set_memory_np() in gart_iommu_init to prevent
> >> accesses from the CPU, but that kernel level unmapping is not in
> >> effect in the kexec'd kdump kernel.  By disabling the CPU-side
> >> accesses within the GART, which does persist through the kexec of the
> >> kdump kernel, the kdump kernel is prevented from interacting with the
> >> GART during accesses to the dump memory areas which include the
> >> address range of the GART aperture. 


> >> Although the patch can be applied
> >> to the kdump kernel, it is not exercised there because the kdump
> >> kernel doesn't attempt to initialize the GART, since it typically runs
> >> in less than 4 GB of memory.

> 
> how about area is not used by IOMMU in GART?

> the code only set np to the iommu window.

I think you are not seeing the difference between kexec and kdump.  The
kdump kernel runs out of a pre-allocated area of memory that was taken
away from the original kernel, for example:
  01000000-08ffffff : Crash kernel

The kdump kernel does not try to reuse any of the original kernel's
memory; it only wants to copy it to a dump file.  The kdump kernel is
running in a small area of memory, so during initialization it ignores
the GART, since it doesn't need IOMMU translation to do IO to its memory
in the Crash kernel area.
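For reference, the reserved region in that /proc/iomem example is
0x08ffffff - 0x01000000 + 1 = 0x08000000 bytes, i.e. 128 MB.  A small
user-space sketch of that calculation; the helper name is hypothetical
and it assumes the usual "start-end : name" line format with inclusive
hex bounds.

```c
#include <assert.h>
#include <stdio.h>

/* Parse a line such as "01000000-08ffffff : Crash kernel" and return
 * the region size in bytes, or 0 if the line does not match. */
static unsigned long long iomem_region_size(const char *line)
{
    unsigned long long start, end;
    char name[64];

    if (sscanf(line, " %llx-%llx : %63[^\n]", &start, &end, name) != 3)
        return 0;
    if (end < start)
        return 0;
    return end - start + 1;   /* /proc/iomem bounds are inclusive */
}
```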

The problem occurs when the copy operation reads from the GART aperture
(iommu window) and wakes up the GART translation hardware.  This patch
stops that by telling the GART to ignore addresses that come from the
CPU and to only translate addresses from the IO side. 
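A simplified model of the behavior described above: the GART only
translates requests that fall inside the aperture, and the DISGARTCPU
and DISGARTIO bits select which request sources it ignores.  This is a
sketch assuming the same bit positions as later kernels' gart.h; the
enum and function names are hypothetical, and the model leaves out
everything else the hardware does (GATT lookup, caching).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GARTEN      (1u << 0)
#define DISGARTCPU  (1u << 4)
#define DISGARTIO   (1u << 5)

enum req_source { REQ_FROM_CPU, REQ_FROM_IO };

/* Does the GART translate a request from src to addr? */
static bool gart_translates(uint32_t ctl, enum req_source src, uint64_t addr,
                            uint64_t aper_base, uint64_t aper_size)
{
    if (!(ctl & GARTEN))
        return false;                   /* translation disabled entirely */
    if (addr < aper_base || addr - aper_base >= aper_size)
        return false;                   /* outside the aperture */
    if (src == REQ_FROM_CPU)
        return !(ctl & DISGARTCPU);     /* host (CPU) side */
    return !(ctl & DISGARTIO);          /* device (I/O) side */
}
```

With DISGARTCPU set, a kdump read of an aperture address is a CPU
request and is not translated, while device DMA through the aperture is
still translated.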
> 
> also following patch should fix the problem with kexec/kdump already. that patch is in mainline from 2.6.25-rc1.
> 

This problem was confirmed and then fixed by my patch on a 2.6.27 kernel
and independently on a 2.6.27-rc8 kernel by Chandru.  So it seems that
your patch does not fix this kdump-related problem.

Bob Montgomery




* Re: [PATCH] disable CPU side GART accesses
  2008-10-16 17:00     ` Bob Montgomery
@ 2008-10-16 17:43       ` Yinghai Lu
  2008-10-16 19:26         ` Bob Montgomery
  0 siblings, 1 reply; 17+ messages in thread
From: Yinghai Lu @ 2008-10-16 17:43 UTC (permalink / raw)
  To: bob.montgomery
  Cc: Ingo Molnar, linux-kernel, vojtech, Linus Torvalds, chandru,
	Joerg Roedel, FUJITA Tomonori, Jesse Barnes, Pavel Machek

On Thu, Oct 16, 2008 at 10:00 AM, Bob Montgomery <bob.montgomery@hp.com> wrote:
>
> The problem occurs when the copy operation reads from the GART aperture
> (iommu window) and wakes up the GART translation hardware.  This patch
> stops that by telling the GART to ignore addresses that come from the
> CPU and to only translate addresses from the IO side.

Why does kdump need to copy that area? It is supposed to be
reserved in the e820 table by the BIOS or the first kernel.

YH


* Re: [PATCH] disable CPU side GART accesses
  2008-10-15 23:40 ` Linus Torvalds
@ 2008-10-16 19:17   ` Bob Montgomery
  0 siblings, 0 replies; 17+ messages in thread
From: Bob Montgomery @ 2008-10-16 19:17 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, vojtech, chandru, Joerg Roedel, FUJITA Tomonori,
	Yinghai Lu, Jesse Barnes, Pavel Machek

On Wed, 2008-10-15 at 23:40 +0000, Linus Torvalds wrote:
> 
> On Wed, 15 Oct 2008, Bob Montgomery wrote:
> >
> > This patch changes the initialization of the GART to set the DISGARTCPU
> > bit in the GART Aperture Control Register (AMD64_GARTAPERTURECTL).
> > Setting the bit prevents requests from the CPUs from accessing the
> > GART.  In other words, CPU memory accesses to the aperture address
> > range will not cause the GART to perform an address translation.
> > The aperture area is currently being unmapped at the kernel level
> > with set_memory_np() in gart_iommu_init to prevent accesses from the
> > CPU [...]
> 
> Would this allow us to get rid of that particular hackup code sequence
> entirely? Or do we still need them for other chip versions etc?

Short answer: I don't know.  Here's some of what I don't know enough
about:

The GART aperture is typically overlaid on a real memory area, so it
effectively wastes the 64MB (or whatever size) of RAM underneath it.
If you disable CPU-side access in the GART itself, the kernel should
once again "see" that RAM and could presumably use it.  But it wouldn't
be general-purpose RAM, because it couldn't be used for DMA (any
accesses from the I/O side would be redirected by the GART to somewhere
else).  Would that make it overly hacky?

It appears to be possible for a BIOS to set up a valid aperture that
does not overlay real memory.   Mine never does, so I get dmesgs like:

[    0.000999] Node 0: aperture @ 20000000 size 32 MB
[    0.000999] Aperture pointing to e820 RAM. Ignoring.
[    0.000999] Your BIOS doesn't leave a aperture memory hole
[    0.000999] Please enable the IOMMU option in the BIOS setup
[    0.000999] This costs you 64 MB of RAM
[    0.000999] Mapping aperture over 65536 KB of RAM @ 20000000

But if it did set up the aperture over a hole, would the current code
still call set_memory_np() for an address range that isn't RAM in the
e820 map?  Would that be a problem or a no-op?
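The "Aperture pointing to e820 RAM. Ignoring." message above appears to
come from a check of the BIOS aperture against the e820 RAM entries.  A
rough sketch of that kind of overlap test, using a hypothetical struct
and a made-up example map (not the kernel's e820 code or any real
machine's data); type 1 is the standard e820 type for usable RAM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define E820_RAM 1   /* e820 type for usable RAM */

struct e820_entry_sketch {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/* Hypothetical example map: RAM below 640 KB, RAM from 1 MB to 2 GB,
 * and a reserved (type 2) block at 0xe0000000. */
static const struct e820_entry_sketch example_map[] = {
    { 0x00000000, 0x0009fc00, E820_RAM },
    { 0x00100000, 0x7ff00000, E820_RAM },
    { 0xe0000000, 0x10000000, 2 },
};

/* True if [base, base + size) overlaps any RAM entry in map[0..n). */
static bool aperture_overlaps_ram(const struct e820_entry_sketch *map,
                                  size_t n, uint64_t base, uint64_t size)
{
    for (size_t i = 0; i < n; i++) {
        if (map[i].type != E820_RAM)
            continue;
        if (base < map[i].addr + map[i].size && map[i].addr < base + size)
            return true;
    }
    return false;
}
```

With this map, a 32 MB aperture at 0x20000000 lands in RAM (as in the
dmesg above), while one placed at 0xa0000000 or over the reserved block
at 0xe0000000 does not.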

> 
> I get the feeling that the people cc'd are kdump people, not iommu/gart
> people, which is a bit sad.

Noted. Thanks to Ingo for the cc's.   Chandru was cc'd because he was
the only other person I knew who had seen the problem, and he tested the
fix first on 2.6.27-rc8.

> 
>                 Linus

Bob Montgomery



* Re: [PATCH] disable CPU side GART accesses
  2008-10-16 17:43       ` Yinghai Lu
@ 2008-10-16 19:26         ` Bob Montgomery
  0 siblings, 0 replies; 17+ messages in thread
From: Bob Montgomery @ 2008-10-16 19:26 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Ingo Molnar, linux-kernel, vojtech, Linus Torvalds, chandru,
	Joerg Roedel, FUJITA Tomonori, Jesse Barnes, Pavel Machek

On Thu, 2008-10-16 at 17:43 +0000, Yinghai Lu wrote:
> On Thu, Oct 16, 2008 at 10:00 AM, Bob Montgomery <bob.montgomery@hp.com> wrote:
> >
> > The problem occurs when the copy operation reads from the GART aperture
> > (iommu window) and wakes up the GART translation hardware.  This patch
> > stops that by telling the GART to ignore addresses that come from the
> > CPU and to only translate addresses from the IO side.
> 
> why kdump need to copy those area? those area is supposed to be
> reserved in e820 table by BIOS or first kernel.
> 
> YH

The crashdump analysis tools do not need a copy of this area.  But if a
user tool associated with the kdump operation manages to touch it
through /proc/vmcore or /dev/oldmem, it can crash the kdump kernel.
This seemed like a simple and logical way to prevent that danger.
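For contrast with the patch's approach: without it, every user-space
tool reading /proc/vmcore or /dev/oldmem would itself have to stay out
of the aperture, for example by splitting each region it copies around
the aperture range.  A hypothetical sketch of that bookkeeping (not
code from any real kdump tool), using half-open ranges:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct range { uint64_t start, end; };   /* half-open [start, end) */

/* Split r into at most two pieces that avoid hole.
 * Returns the number of pieces written to out. */
static size_t range_minus_hole(struct range r, struct range hole,
                               struct range out[2])
{
    size_t n = 0;

    if (hole.end <= r.start || hole.start >= r.end) {
        out[n++] = r;                    /* no overlap: keep all of r */
        return n;
    }
    if (r.start < hole.start)
        out[n++] = (struct range){ r.start, hole.start };
    if (hole.end < r.end)
        out[n++] = (struct range){ hole.end, r.end };
    return n;
}

/* Total bytes a tool would copy from r after skipping hole. */
static uint64_t bytes_to_copy(struct range r, struct range hole)
{
    struct range out[2];
    size_t n = range_minus_hole(r, hole, out);
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++)
        total += out[i].end - out[i].start;
    return total;
}
```

Skipping a 64 MB aperture at 0x20000000 out of the first 1 GB leaves
0x3c000000 bytes to copy.  Every tool would need this logic and the
correct aperture location, which is the landmine the patch removes.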

Bob Montgomery



* Re: [PATCH] disable CPU side GART accesses
  2008-10-15 23:48 ` Ingo Molnar
  2008-10-16  0:22   ` Yinghai Lu
@ 2008-10-27 22:42   ` Bob Montgomery
  2008-10-27 23:06     ` Yinghai Lu
  1 sibling, 1 reply; 17+ messages in thread
From: Bob Montgomery @ 2008-10-27 22:42 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, vojtech, Linus Torvalds, chandru, Joerg Roedel,
	FUJITA Tomonori, Yinghai Lu, Jesse Barnes, Pavel Machek,
	Andi Kleen

On Wed, 2008-10-15 at 23:48 +0000, Ingo Molnar wrote:
> (Cc:-ed the GART folks.)

Are there any objections to this patch?  The problem has been reproduced
in 2.6.18 and 2.6.27, so existing changes in the gart code have not
addressed it. 

The patch disarms a landmine left behind for the kdump kernel.  You
could leave it there and warn people not to step there, or you can
disarm it so it isn't a danger.

Thank you,
Bob Montgomery
    



* Re: [PATCH] disable CPU side GART accesses
  2008-10-27 22:42   ` Bob Montgomery
@ 2008-10-27 23:06     ` Yinghai Lu
  2008-10-29 20:52       ` Bob Montgomery
  0 siblings, 1 reply; 17+ messages in thread
From: Yinghai Lu @ 2008-10-27 23:06 UTC (permalink / raw)
  To: bob.montgomery
  Cc: Ingo Molnar, linux-kernel, vojtech, Linus Torvalds, chandru,
	Joerg Roedel, FUJITA Tomonori, Jesse Barnes, Pavel Machek,
	Andi Kleen

Bob Montgomery wrote:
> On Wed, 2008-10-15 at 23:48 +0000, Ingo Molnar wrote:
>> (Cc:-ed the GART folks.)
> 
> Are there any objections to this patch?  The problem has been reproduced
> in 2.6.18 and 2.6.27, so existing changes in the gart code have not
> addressed it. 

It would be better to have someone with an AMD system, 8 GB of RAM and an AGP adapter verify it.

YH


* Re: [PATCH] disable CPU side GART accesses
  2008-10-27 23:06     ` Yinghai Lu
@ 2008-10-29 20:52       ` Bob Montgomery
  2008-10-29 21:24         ` Dave Airlie
  0 siblings, 1 reply; 17+ messages in thread
From: Bob Montgomery @ 2008-10-29 20:52 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Ingo Molnar, linux-kernel, vojtech, Linus Torvalds, chandru,
	Joerg Roedel, FUJITA Tomonori, Jesse Barnes, Pavel Machek,
	Andi Kleen, Dave Jones

On Mon, 2008-10-27 at 23:06 +0000, Yinghai Lu wrote:
> Bob Montgomery wrote:
> > On Wed, 2008-10-15 at 23:48 +0000, Ingo Molnar wrote:
> >> (Cc:-ed the GART folks.)
> >
> > Are there any objections to this patch?  The problem has been reproduced
> > in 2.6.18 and 2.6.27, so existing changes in the gart code have not
> > addressed it.
> 
> better to have someone with amd system + 8g ram + agp adapter to verify it...
> 
> YH

Good idea.

I don't think HP made a system with Opteron and AGP, so I haven't found
a system here to test.  But I did see that there have been non-HP
motherboards from several years ago that advertised Opteron with AGP.
Is anyone running that type of system?

The other question is whether AGP graphics on Linux for AMD64 ever
expected host translation to work, since setting DisGartCpu turns off
host translation.

I saw this in the AGP 3.0 spec (hopeful, but inconclusive):

"8. The Core-logic is not required by the AGP specification to translate
   accesses directed to the AGP aperture by a host processor –
   termed host translation. Portable AGP3.0 software should not rely
   upon the existence of host translation."

Any advice from graphics experts?

Bob Montgomery









* Re: [PATCH] disable CPU side GART accesses
  2008-10-29 20:52       ` Bob Montgomery
@ 2008-10-29 21:24         ` Dave Airlie
  2008-10-29 21:32           ` Dave Jones
  0 siblings, 1 reply; 17+ messages in thread
From: Dave Airlie @ 2008-10-29 21:24 UTC (permalink / raw)
  To: bob.montgomery
  Cc: Yinghai Lu, Ingo Molnar, linux-kernel, vojtech, Linus Torvalds,
	chandru, Joerg Roedel, FUJITA Tomonori, Jesse Barnes,
	Pavel Machek, Andi Kleen, Dave Jones

On Thu, Oct 30, 2008 at 6:52 AM, Bob Montgomery <bob.montgomery@hp.com> wrote:
> On Mon, 2008-10-27 at 23:06 +0000, Yinghai Lu wrote:
>> Bob Montgomery wrote:
>> > On Wed, 2008-10-15 at 23:48 +0000, Ingo Molnar wrote:
>> >> (Cc:-ed the GART folks.)
>> >
>> > Are there any objections to this patch?  The problem has been reproduced
>> > in 2.6.18 and 2.6.27, so existing changes in the gart code have not
>> > addressed it.
>>
>> better to have someone with amd system + 8g ram + agp adapter to verify it...
>>
>> YH
>
> Good idea.
>
> I don't think HP made a system with opteron and agp, so I haven't found
> a system here to test.  But I did see that there have been non-HP
> motherboards from several years ago that advertised opteron with agp.
> Anyone running that type of system?
>
> The other question is whether AGP graphics on linux for amd64 ever
> expected host translation to work, since setting DisGartCpu is turning
> off host translation.

We have an option in the AGP drivers called cant_use_aperture.

This stops the CPU from using the aperture for most DRI things. I
can't confirm this won't regress working systems, though. The whole
AMD GART thing scares me, especially if some of the host chipsets also
have an AGP GART.

Dave.


* Re: [PATCH] disable CPU side GART accesses
  2008-10-29 21:24         ` Dave Airlie
@ 2008-10-29 21:32           ` Dave Jones
  2008-10-29 21:40             ` Dave Airlie
  0 siblings, 1 reply; 17+ messages in thread
From: Dave Jones @ 2008-10-29 21:32 UTC (permalink / raw)
  To: Dave Airlie
  Cc: bob.montgomery, Yinghai Lu, Ingo Molnar, linux-kernel, vojtech,
	Linus Torvalds, chandru, Joerg Roedel, FUJITA Tomonori,
	Jesse Barnes, Pavel Machek, Andi Kleen

On Thu, Oct 30, 2008 at 07:24:34AM +1000, Dave Airlie wrote:

 > This stops the CPU from using the aperture for most DRI things. I
 > can't confirm this won't regress working systems
 > though. The whole AMD GART thing scares me, esp if some of the host
 > chipsets also have an AGP GART.

The easy cop-out for those in the past has been 'don't support them'.
It's why we removed some K8 chipset PCI IDs from the VIA driver, for
example.  IIRC, if we leave them unprogrammed, they're essentially
irrelevant.

	Dave

-- 
http://www.codemonkey.org.uk


* Re: [PATCH] disable CPU side GART accesses
  2008-10-29 21:32           ` Dave Jones
@ 2008-10-29 21:40             ` Dave Airlie
  2008-11-03 23:36               ` Bob Montgomery
  0 siblings, 1 reply; 17+ messages in thread
From: Dave Airlie @ 2008-10-29 21:40 UTC (permalink / raw)
  To: Dave Jones, Dave Airlie, bob.montgomery, Yinghai Lu, Ingo Molnar,
	linux-kernel, vojtech, Linus Torvalds, chandru, Joerg Roedel,
	FUJITA Tomonori, Jesse Barnes, Pavel Machek, Andi Kleen

On Thu, Oct 30, 2008 at 7:32 AM, Dave Jones <davej@redhat.com> wrote:
> On Thu, Oct 30, 2008 at 07:24:34AM +1000, Dave Airlie wrote:
>
>  > This stops the CPU from using the aperture for most DRI things. I
>  > can't confirm this won't regress working systems
>  > though. The whole AMD GART thing scares me, esp if some of the host
>  > chipsets also have an AGP GART.
>
> The easy cop-out for those in the past has been 'dont support them'.
> It's why we removed some K8 chipset PCI IDs from the via driver for eg.
> iirc, if we leave them unprogrammed, they're essentially irrelevant.
>

I was more going the other way: why use the IOMMU for AGP when it has
other tasks to do, and we have a host chipset GART?

Granted, I've never had an AMD + AGP system, so I've never had reason to care about this.

Dave.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] disable CPU side GART accesses
  2008-10-29 21:40             ` Dave Airlie
@ 2008-11-03 23:36               ` Bob Montgomery
  2008-11-03 23:55                 ` Dave Airlie
  0 siblings, 1 reply; 17+ messages in thread
From: Bob Montgomery @ 2008-11-03 23:36 UTC (permalink / raw)
  To: Dave Airlie
  Cc: Dave Jones, Yinghai Lu, Ingo Molnar, linux-kernel, vojtech,
	Linus Torvalds, chandru, Joerg Roedel, FUJITA Tomonori,
	Jesse Barnes, Pavel Machek

On Wed, 2008-10-29 at 21:40 +0000, Dave Airlie wrote:
> On Thu, Oct 30, 2008 at 7:32 AM, Dave Jones <davej@redhat.com> wrote:
> > On Thu, Oct 30, 2008 at 07:24:34AM +1000, Dave Airlie wrote:
> >
> >  > This stops the CPU from using the aperture for most DRI things. I
> >  > can't confirm this won't regress working systems
> >  > though. The whole AMD GART thing scares me, esp if some of the host
> >  > chipsets also have an AGP GART.
> >
> > The easy cop-out for those in the past has been 'dont support them'.
> > It's why we removed some K8 chipset PCI IDs from the via driver for eg.
> > iirc, if we leave them unprogrammed, they're essentially irrelevant.
> >
> 
> I was more going the other way, why use the IOMMU for AGP when it has
> other tasks to
> do, and we have a host chipset GART.
> 
> Granted I've never had an AMD + AGP system to ever care about this.

We're specifically talking about AMD64, and we're not using an IOMMU for
AGP; we're using the AMD64 implementation of the GART as an IOMMU.
The (possible) danger is that some old AMD64 system could also (or
instead) use the GART for AGP and run into a problem, since my patch
disables CPU-side access to the aperture. That is fine when we're using
the GART as an IOMMU, but not necessarily when it serves as an AGP GART.
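Here is a minimal sketch of the register change being discussed. It assumes the GARTEN, DISGARTCPU and DISGARTIO bit positions shown, which should be checked against the BKDG for the processor family in question. The sketch works on a plain 32-bit value rather than PCI config space. In the kernel, the value would be read and written with pci_read_config_dword()/pci_write_config_dword() at AMD64_GARTAPERTURECTL. The helper name gart_ctl_iommu_only() is hypothetical. Keeping DISGARTIO clear, so device traffic is still translated, is also an assumption of this sketch.

```c
#include <stdint.h>

/*
 * Assumed bit positions in the AMD64 GART Aperture Control Register
 * (AMD64_GARTAPERTURECTL); verify against the processor documentation.
 */
#define GARTEN      (1u << 0)  /* GART translation enable */
#define DISGARTCPU  (1u << 4)  /* do not translate CPU requests to the aperture */
#define DISGARTIO   (1u << 5)  /* do not translate I/O (device) requests */

/*
 * Hypothetical helper: given the current register value, return the value
 * to write back so the GART keeps translating device (IOMMU) traffic but
 * no longer translates CPU accesses to the aperture range.
 * All other bits are passed through unchanged.
 */
static uint32_t gart_ctl_iommu_only(uint32_t ctl)
{
    ctl |= GARTEN;       /* keep translation enabled for DMA */
    ctl &= ~DISGARTIO;   /* devices still go through the GART */
    ctl |= DISGARTCPU;   /* CPU accesses bypass the GART */
    return ctl;
}
```

For example, gart_ctl_iommu_only(0x20) returns 0x11: DISGARTIO is cleared, and GARTEN and DISGARTCPU are set. Bits the helper does not touch pass through unchanged, so gart_ctl_iommu_only(0x6) returns 0x17.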

In drivers/gpu/drm/drm_memory.c:agp_remap(), there is this comment on
the part of the code that handles "cant_use_aperture":

/*
 * OK, we're mapping AGP space on a chipset/platform on which
 * memory accesses by the CPU do not get remapped by the GART.
 * We fix this by using the kernel's page-table instead (that's
 * probably faster anyhow...).
 */

So that's encouraging.  Now the question is this:  Can I just go into
amd64-agp.c and add ".cant_use_aperture=true" to the agp_bridge_driver
struct?  Who's brave enough to say that will just work? :-)

static const struct agp_bridge_driver amd_8151_driver = {
...
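To make the question concrete: with C99 designated initializers, adding the member is a one-line change. Any member left out of an initializer is zero-initialized, so drivers that never mention cant_use_aperture get false. The struct below is a hypothetical, trimmed stand-in, not the real struct agp_bridge_driver. Likewise, uses_page_table_path() is a made-up helper standing in for the check in the DRM mapping code. Whether the real amd_8151_driver works with the flag set is exactly the untested part.

```c
#include <stdbool.h>

/* Hypothetical, heavily trimmed stand-in for struct agp_bridge_driver;
 * the real structure has many more members (sizes, callbacks, ...). */
struct bridge_driver_sketch {
    const char *name;        /* hypothetical member, for illustration */
    bool cant_use_aperture;
};

/* The proposed change: one extra designated initializer. */
static const struct bridge_driver_sketch amd_8151_sketch = {
    .name              = "amd-8151 (sketch)",
    .cant_use_aperture = true,
};

/* A driver that never names the member: it is zero-initialized (false). */
static const struct bridge_driver_sketch other_bridge_sketch = {
    .name = "other (sketch)",
};

/* Hypothetical helper: take the page-table path only when the bridge
 * says CPU accesses to the aperture are not remapped by the GART. */
static bool uses_page_table_path(const struct bridge_driver_sketch *drv)
{
    return drv->cant_use_aperture;
}
```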

The "cant_use_aperture" paths have possibly never been tested on
AMD64 AGP systems, but they are in use in these drivers:

alpha-agp.c:    .cant_use_aperture      = true,
hp-agp.c:       .cant_use_aperture      = true,
i460-agp.c:     .cant_use_aperture      = true,
parisc-agp.c:   .cant_use_aperture      = true,
sgi-agp.c:      .cant_use_aperture = true,
uninorth-agp.c: .cant_use_aperture      = true,
uninorth-agp.c: .cant_use_aperture      = true,
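For reference, this is roughly what the page-table path appears to do once cant_use_aperture is set. It finds the bound AGP memory block that covers the requested aperture range and collects that block's backing pages. agp_remap() then maps those pages with vmap(). The sketch below is a user-space model of that lookup under simplified assumptions: 4 KiB pages, physical addresses as plain integers, and a flat array of blocks. struct agp_block and collect_backing_pages() are hypothetical names, and the vmap() step is not modelled.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL  /* assumed page size */

/* Hypothetical model of one bound AGP memory block: it occupies `npages`
 * pages of the aperture starting at aperture offset `bound`, and is
 * backed by the physical pages listed in `pages`. */
struct agp_block {
    unsigned long bound;
    size_t npages;
    const uint64_t *pages;
};

/*
 * Instead of letting the CPU access the aperture and relying on the GART
 * to translate, look up the backing pages for [offset, offset + size) so
 * they could be mapped through ordinary page tables.
 * Returns the number of pages written to `out`, or 0 if no single block
 * covers the range or `out` is too small.
 */
static size_t collect_backing_pages(const struct agp_block *blocks, size_t nblocks,
                                    unsigned long offset, unsigned long size,
                                    uint64_t *out, size_t out_len)
{
    size_t num_pages = (size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
    unsigned long span = num_pages * SKETCH_PAGE_SIZE;

    if (num_pages == 0 || num_pages > out_len)
        return 0;

    for (size_t b = 0; b < nblocks; b++) {
        const struct agp_block *blk = &blocks[b];
        unsigned long end = blk->bound + blk->npages * SKETCH_PAGE_SIZE;

        /* The whole requested range must lie inside this block. */
        if (blk->bound <= offset && offset + span <= end) {
            size_t first = (offset - blk->bound) / SKETCH_PAGE_SIZE;
            for (size_t i = 0; i < num_pages; i++)
                out[i] = blk->pages[first + i];
            return num_pages;
        }
    }
    return 0;
}
```

Consider a block bound at aperture offset 0x2000 and backed by pages 0x100000, 0x7000 and 0x3000. A request for offset 0x3000, size 0x2000 yields the two pages 0x7000 and 0x3000. A request that runs past the end of the block yields 0.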

Thanks for any more enlightenment,
Bob Montgomery

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] disable CPU side GART accesses
  2008-11-03 23:36               ` Bob Montgomery
@ 2008-11-03 23:55                 ` Dave Airlie
  2008-11-19 22:12                   ` Bob Montgomery
  0 siblings, 1 reply; 17+ messages in thread
From: Dave Airlie @ 2008-11-03 23:55 UTC (permalink / raw)
  To: bob.montgomery
  Cc: Dave Jones, Yinghai Lu, Ingo Molnar, linux-kernel, vojtech,
	Linus Torvalds, chandru, Joerg Roedel, FUJITA Tomonori,
	Jesse Barnes, Pavel Machek

On Tue, Nov 4, 2008 at 9:36 AM, Bob Montgomery <bob.montgomery@hp.com> wrote:
> On Wed, 2008-10-29 at 21:40 +0000, Dave Airlie wrote:
>> On Thu, Oct 30, 2008 at 7:32 AM, Dave Jones <davej@redhat.com> wrote:
>> > On Thu, Oct 30, 2008 at 07:24:34AM +1000, Dave Airlie wrote:
>> >
>> >  > This stops the CPU from using the aperture for most DRI things. I
>> >  > can't confirm this won't regress working systems
>> >  > though. The whole AMD GART thing scares me, esp if some of the host
>> >  > chipsets also have an AGP GART.
>> >
>> > The easy cop-out for those in the past has been 'dont support them'.
>> > It's why we removed some K8 chipset PCI IDs from the via driver for eg.
>> > iirc, if we leave them unprogrammed, they're essentially irrelevant.
>> >
>>
>> I was more going the other way, why use the IOMMU for AGP when it has
>> other tasks to
>> do, and we have a host chipset GART.
>>
>> Granted I've never had an AMD + AGP system to ever care about this.
>
> We're specifically talking about AMD64, and we're not using an IOMMU for
> AGP, we're using the AMD64 implementation of the GART for an IOMMU.
> The (possible) danger is that some old AMD64 system could also (or
> instead) try using the GART for AGP and run into a problem since my
> patch wants to disable CPU side access to the aperture, which is fine
> when we're using it as an IOMMU.
>
> In drivers/gpu/drm/drm_memory.c:agp_remap(), there are these comments
> about the part of the code that deals with "cant_use_aperture":
>
> /*
>  * OK, we're mapping AGP space on a chipset/platform on which
>  * memory accesses by the CPU do not get remapped by the GART.
>  * We fix this by using the kernel's page-table instead (that's
>  * probably faster anyhow...).
>  */
>
> So that's encouraging.  Now the question is this:  Can I just go into
> amd64-agp.c and add ".cant_use_aperture=true" to the agp_bridge_driver
> struct?  Who's brave enough to say that will just work? :-)

I have serious doubts about including such a patch without testing on
a large range of AMD64 systems with a large range of distro/X servers.

Dave.

>
> static const struct agp_bridge_driver amd_8151_driver = {
> ...
>
> The "cant_use_aperture" paths have possibly
> never been tested on amd64 agp systems, but
> are in use on these systems:
>
> alpha-agp.c:    .cant_use_aperture      = true,
> hp-agp.c:       .cant_use_aperture      = true,
> i460-agp.c:     .cant_use_aperture      = true,
> parisc-agp.c:   .cant_use_aperture      = true,
> sgi-agp.c:      .cant_use_aperture = true,
> uninorth-agp.c: .cant_use_aperture      = true,
> uninorth-agp.c: .cant_use_aperture      = true,
>
> Thanks for any more enlightenment,
> Bob Montgomery

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] disable CPU side GART accesses
  2008-11-03 23:55                 ` Dave Airlie
@ 2008-11-19 22:12                   ` Bob Montgomery
  0 siblings, 0 replies; 17+ messages in thread
From: Bob Montgomery @ 2008-11-19 22:12 UTC (permalink / raw)
  To: Dave Airlie
  Cc: Dave Jones, Yinghai Lu, Ingo Molnar, linux-kernel, vojtech,
	Linus Torvalds, chandru, Joerg Roedel, FUJITA Tomonori,
	Jesse Barnes, Pavel Machek

On Mon, 2008-11-03 at 23:55 +0000, Dave Airlie wrote:
> On Tue, Nov 4, 2008 at 9:36 AM, Bob Montgomery <bob.montgomery@hp.com> wrote:

> > We're specifically talking about AMD64, and we're not using an IOMMU for
> > AGP, we're using the AMD64 implementation of the GART for an IOMMU.
> > The (possible) danger is that some old AMD64 system could also (or
> > instead) try using the GART for AGP and run into a problem since my
> > patch wants to disable CPU side access to the aperture, which is fine
> > when we're using it as an IOMMU.
> >
> > In drivers/gpu/drm/drm_memory.c:agp_remap(), there are these comments
> > about the part of the code that deals with "cant_use_aperture":
> >
> > /*
> >  * OK, we're mapping AGP space on a chipset/platform on which
> >  * memory accesses by the CPU do not get remapped by the GART.
> >  * We fix this by using the kernel's page-table instead (that's
> >  * probably faster anyhow...).
> >  */
> >
> > So that's encouraging.  Now the question is this:  Can I just go into
> > amd64-agp.c and add ".cant_use_aperture=true" to the agp_bridge_driver
> > struct?  Who's brave enough to say that will just work? :-)
> 
> I have serious doubts about including such a patch without testing on
> a large range of AMD64 systems
> with a large range of distro/X servers.
> 
> Dave.

Well, since I hate it when kernel discussion threads just end with no
resolution...

I don't have access to a large range of AMD64 systems that use AGP
graphics.  In fact, I can't find any around here.  So testing my way to
resolving this potential problem in these drivers is probably not going
to work.

I've seen references to systems that had Opterons, AGP graphics, and
could hold more than 4GB of RAM, but I don't know how many are out
there. Since I can't do a bunch of distro/X server testing, I'm not
sure how to proceed in getting a simple fix for this problem onto the
real systems that exist now.

Bob Montgomery

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread

Thread overview: 17+ messages
2008-10-15 21:48 [PATCH] disable CPU side GART accesses Bob Montgomery
2008-10-15 23:40 ` Linus Torvalds
2008-10-16 19:17   ` Bob Montgomery
2008-10-15 23:48 ` Ingo Molnar
2008-10-16  0:22   ` Yinghai Lu
2008-10-16 17:00     ` Bob Montgomery
2008-10-16 17:43       ` Yinghai Lu
2008-10-16 19:26         ` Bob Montgomery
2008-10-27 22:42   ` Bob Montgomery
2008-10-27 23:06     ` Yinghai Lu
2008-10-29 20:52       ` Bob Montgomery
2008-10-29 21:24         ` Dave Airlie
2008-10-29 21:32           ` Dave Jones
2008-10-29 21:40             ` Dave Airlie
2008-11-03 23:36               ` Bob Montgomery
2008-11-03 23:55                 ` Dave Airlie
2008-11-19 22:12                   ` Bob Montgomery