From: Jan-Bernd Themann <>
Cc: Dave Hansen <>,
	Thomas Klein <>,
	"Themann, Jan-Bernd" <>,
	netdev <>, apw <>,
	linux-kernel <>,
	Thomas Klein <>,
	Christoph Raisch <>,
	Badari Pulavarty <>, Greg KH <>
Subject: Re: [PATCH] drivers/base: export gpl (un)register_memory_notifier
Date: Wed, 13 Feb 2008 16:17:57 +0100
Message-ID: <>
In-Reply-To: <>

Hi Dave,

On Monday 11 February 2008 17:47, Dave Hansen wrote:
> Also, just ripping down and completely re-doing the entire mass of cards
> every time a 16MB area of memory is added or removed seems like an
> awfully big sledgehammer to me.  I would *HATE* to see anybody else
> using this driver as an example to work off of?  Can't you just keep
> track of which areas the driver is actually *USING* and only worry about
> changing mappings if that intersects with an area having hotplug done on
> it?

To provide a basis for discussing the eHEA memory add / remove concept:

Explanation of the current eHEA memory add / remove concept:

Constraints imposed by HW / FW:
- eHEA has its own MMU
- eHEA Memory Regions (MRs) are used by the eHEA MMU to translate virtual
  addresses to absolute addresses (like DMA mapped memory on a PCI bus)
- The number of MRs is limited (not enough to have one MR per packet)
- Registration of MRs is comparatively slow, as it is done via a slow
  firmware call
- The maximum size of an MR is the amount of memory available under Linux
- An MR covers a contiguous virtual memory block (no holes)

Because of this, there is just one big MR that covers the entire kernel
memory. We also need a mapping table from kernel addresses to this
contiguous "virtual memory IO space" (here called ehea_bmap).
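
To make the role of the mapping table concrete, here is a minimal userspace
sketch of such a lookup. It is not the driver code: the struct layout, the
names (bmap_translate, BMAP_INVALID) and the sizes are assumptions chosen for
illustration. The only thing taken from the description above is the idea of
one table entry per 16MB chunk that maps a kernel address into the single
contiguous MR.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values, not taken from the driver. */
#define SECTION_SHIFT 24                          /* 16MB per section */
#define SECTION_MASK  ((UINT64_C(1) << SECTION_SHIFT) - 1)
#define NR_SECTIONS   8
#define BMAP_INVALID  UINT32_MAX                  /* marks a hole */

/* One entry per kernel memory section: its slot inside the single MR. */
struct bmap {
    uint32_t slot[NR_SECTIONS];
};

/*
 * Translate a kernel address into an offset inside the MR.
 * Returns 0 on success, -1 on a translation miss (hole or out of range).
 */
int bmap_translate(const struct bmap *m, uint64_t kaddr, uint64_t *mr_off)
{
    uint64_t section = kaddr >> SECTION_SHIFT;

    if (section >= NR_SECTIONS || m->slot[section] == BMAP_INVALID)
        return -1;

    /* Slot number selects the 16MB window, low bits stay unchanged. */
    *mr_off = ((uint64_t)m->slot[section] << SECTION_SHIFT) |
              (kaddr & SECTION_MASK);
    return 0;
}
```

With sections 0, 1 and 3 present, section 3 lands in slot 2 of the MR, so the
MR stays contiguous even though the kernel layout has a hole at section 2.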

- When memory is added to / removed from the LPAR (and Linux), the MR has to
  be updated. This can only be done by destroying and recreating the MR.
  There is no H_CALL to modify the MR size. To find holes in the Linux kernel
  memory layout we have to iterate over the memory sections when recreating
  an ehea_bmap (otherwise the MR would be bigger than the available memory,
  causing the registration to fail)

- DLPAR userspace tools, kernel, driver, firmware and HMC are involved in that
  process on System p
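
The rebuild step described above (iterate over the sections, skip holes, pack
what is left into a contiguous range) can be sketched as follows. This is a
userspace model under stated assumptions: present[] stands in for whatever
check the kernel offers for "is this section valid", and bmap_rebuild and
nr_used are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_SECTIONS  8            /* illustrative, not the kernel value */
#define BMAP_INVALID UINT32_MAX   /* marks a hole */

struct bmap {
    uint32_t slot[NR_SECTIONS];   /* section -> slot inside the MR */
    uint32_t nr_used;             /* number of sections the MR must cover */
};

/*
 * Rebuild the table from scratch. Present sections get consecutive slots,
 * holes get BMAP_INVALID, so the resulting MR size (nr_used sections)
 * never exceeds the memory that is actually available.
 */
void bmap_rebuild(struct bmap *m, const bool present[NR_SECTIONS])
{
    uint32_t next = 0;

    for (unsigned int s = 0; s < NR_SECTIONS; s++)
        m->slot[s] = present[s] ? next++ : BMAP_INVALID;

    m->nr_used = next;
    assert(m->nr_used <= NR_SECTIONS);
}
```

In this model the MR would then be registered with a size of nr_used sections
rather than NR_SECTIONS, which is the point of detecting the holes first.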

Memory add: version without an external memory notifier call
- new memory used in a transfer_xmit will result in an "ehea_bmap
  translation miss", which triggers a rebuild and re-registration
  of the ehea_bmap based on the current kernel memory setup.
- advantage: the number of MR rebuilds is reduced significantly compared to
  a rebuild for each 16MB chunk of memory added.
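
A small model of why the miss-driven variant needs fewer rebuilds: adding
memory only changes the kernel's view, and the first transmit that touches
unmapped memory picks up the whole current layout in one go. All names here
(lazy_map, memory_add, xmit_lookup) are made up for the sketch, and the
rebuild is reduced to copying a presence array.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_SECTIONS 8   /* illustrative, not the kernel value */

struct lazy_map {
    bool present[NR_SECTIONS];  /* what the kernel currently has */
    bool mapped[NR_SECTIONS];   /* what the MR currently covers */
    unsigned int rebuilds;      /* number of (slow) MR re-registrations */
};

/* Memory add: nothing is rebuilt here, only the kernel state changes. */
void memory_add(struct lazy_map *m, unsigned int section)
{
    assert(section < NR_SECTIONS);
    m->present[section] = true;
}

/*
 * Transmit path: on a translation miss, rebuild once from the current
 * kernel layout. Returns true if the section is usable afterwards.
 */
bool xmit_lookup(struct lazy_map *m, unsigned int section)
{
    assert(section < NR_SECTIONS);

    if (!m->mapped[section]) {
        for (unsigned int s = 0; s < NR_SECTIONS; s++)
            m->mapped[s] = m->present[s];
        m->rebuilds++;
    }
    return m->mapped[section];
}
```

Adding four sections and then transmitting from any of them costs a single
rebuild in this model, where a per-chunk notifier would have caused four.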

Memory add: version with an external notifier call:
- We still need an ehea_bmap (whatever structure it has)

Memory remove with notifier:
- We have to rebuild the ehea_bmap immediately to remove the pages that are
  no longer available. Without doing that, the firmware (pHYP) cannot remove
  that memory from the LPAR. As we don't know whether, or how many, additional
  sections are to be removed before the DLPAR user space tool tells the
  firmware to remove the memory, we cannot defer the rebuild.
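
The asymmetry between add and remove can be sketched as an event handler:
an add may be picked up lazily, a remove forces a rebuild right away. This is
a userspace model only. The event codes, mr_state and mem_event_cb are
hypothetical and do not mirror the kernel's notifier interface; in the kernel
the handler would be hooked up through the memory notifier discussed in this
thread.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_SECTIONS 8   /* illustrative, not the kernel value */

/* Hypothetical event codes for this sketch. */
enum mem_event { EV_ONLINE, EV_GOING_OFFLINE };

struct mr_state {
    bool present[NR_SECTIONS];  /* what the kernel currently has */
    bool mapped[NR_SECTIONS];   /* what the MR currently covers */
    unsigned int rebuilds;      /* number of MR re-registrations */
};

/* Stand-in for "destroy and recreate the MR from the current layout". */
static void mr_rebuild(struct mr_state *st)
{
    for (unsigned int s = 0; s < NR_SECTIONS; s++)
        st->mapped[s] = st->present[s];
    st->rebuilds++;
}

void mem_event_cb(struct mr_state *st, enum mem_event ev, unsigned int section)
{
    assert(section < NR_SECTIONS);

    switch (ev) {
    case EV_ONLINE:
        /* Can wait: the next translation miss would pick this up. */
        st->present[section] = true;
        break;
    case EV_GOING_OFFLINE:
        /* Cannot wait: the mapping must go before firmware takes the memory. */
        st->present[section] = false;
        mr_rebuild(st);
        break;
    }
}
```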

Our current understanding of the Memory Hotplug System is (please correct me
if I'm wrong):

- depends on sparse mem
- only whole memory sections are added / removed
- for each section a memory resource is registered

From the driver side we need:
- some kind of memory notification mechanism.
  For memory add we can live without any external memory notification
  event. For memory remove we do need an external trigger (see the
  explanation under "Memory remove with notifier" above).
- a way to iterate over all kernel pages and a way to detect holes in the
  kernel memory layout in order to build up our own ehea_bmap.

Memory notification trigger:
- These triggers exist; an exported "register_memory_notifier" /
  "unregister_memory_notifier" would work in this scheme

Functions to use while building ehea_bmap + MRs:
- Either use the functions that the memory hotplug system itself uses, i.e.
  the section defines + functions (section_nr_to_pfn, ...)
- Or use functions in kernel/resource.c that are currently not exported, like
  walk_memory_resource (where we would still need the maximum possible number
  of pages NR_MEM_SECTIONS)
- Maybe some kind of new interface?

What would you suggest?

Jan-Bernd & Christoph
