LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH] ARM: mm: Regarding section when dealing with meminfo
@ 2011-01-20  9:45 KyongHo Cho
  2011-01-20 14:28 ` Minchan Kim
  2011-01-20 17:20 ` Dave Hansen
  0 siblings, 2 replies; 19+ messages in thread
From: KyongHo Cho @ 2011-01-20  9:45 UTC (permalink / raw)
  To: linux-arm-kernel, linux-mm
  Cc: linux-kernel, linux-samsung-soc, Kukjin Kim, Ilho Lee,
	KeyYoung Park, KyongHo Cho

Sparsemem allows that a bank of memory spans over several adjacent
sections if the start address and the end address of the bank
belong to different sections.
When gathering statictics of physical memory in mem_init() and
show_mem(), this possiblity was not considered.

This patch guarantees that simple increasing the pointer to page
descriptors does not exceed the boundary of a section.

Signed-off-by: KyongHo Cho <pullip.cho@samsung.com>
---
 arch/arm/mm/init.c |   74 +++++++++++++++++++++++++++++++++++----------------
 1 files changed, 51 insertions(+), 23 deletions(-)

diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 57c4c5c..6ccecbe 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -93,24 +93,38 @@ void show_mem(void)
 
 		pfn1 = bank_pfn_start(bank);
 		pfn2 = bank_pfn_end(bank);
-
+#ifndef CONFIG_SPARSEMEM
 		page = pfn_to_page(pfn1);
 		end  = pfn_to_page(pfn2 - 1) + 1;
-
+#else
+		pfn2--;
 		do {
-			total++;
-			if (PageReserved(page))
-				reserved++;
-			else if (PageSwapCache(page))
-				cached++;
-			else if (PageSlab(page))
-				slab++;
-			else if (!page_count(page))
-				free++;
-			else
-				shared += page_count(page) - 1;
-			page++;
-		} while (page < end);
+			page = pfn_to_page(pfn1);
+			if (pfn_to_section_nr(pfn1) < pfn_to_section_nr(pfn2)) {
+				pfn1 += PAGES_PER_SECTION;
+				pfn1 &= PAGE_SECTION_MASK;
+			} else {
+				pfn1 = pfn2;
+			}
+			end = pfn_to_page(pfn1) + 1;
+#endif
+			do {
+				total++;
+				if (PageReserved(page))
+					reserved++;
+				else if (PageSwapCache(page))
+					cached++;
+				else if (PageSlab(page))
+					slab++;
+				else if (!page_count(page))
+					free++;
+				else
+					shared += page_count(page) - 1;
+				page++;
+			} while (page < end);
+#ifdef CONFIG_SPARSEMEM
+		} while (pfn1 < pfn2);
+#endif
 	}
 
 	printk("%d pages of RAM\n", total);
@@ -470,17 +484,31 @@ void __init mem_init(void)
 
 		pfn1 = bank_pfn_start(bank);
 		pfn2 = bank_pfn_end(bank);
-
+#ifndef CONFIG_SPARSEMEM
 		page = pfn_to_page(pfn1);
 		end  = pfn_to_page(pfn2 - 1) + 1;
-
+#else
+		pfn2--;
 		do {
-			if (PageReserved(page))
-				reserved_pages++;
-			else if (!page_count(page))
-				free_pages++;
-			page++;
-		} while (page < end);
+			page = pfn_to_page(pfn1);
+			if (pfn_to_section_nr(pfn1) < pfn_to_section_nr(pfn2)) {
+				pfn1 += PAGES_PER_SECTION;
+				pfn1 &= PAGE_SECTION_MASK;
+			} else {
+				pfn1 = pfn2;
+			}
+			end = pfn_to_page(pfn1) + 1;
+#endif
+			do {
+				if (PageReserved(page))
+					reserved_pages++;
+				else if (!page_count(page))
+					free_pages++;
+				page++;
+			} while (page < end);
+#ifdef CONFIG_SPARSEMEM
+		} while (pfn1 < pfn2);
+#endif
 	}
 
 	/*
-- 
1.6.2.5


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] ARM: mm: Regarding section when dealing with meminfo
  2011-01-20  9:45 [PATCH] ARM: mm: Regarding section when dealing with meminfo KyongHo Cho
@ 2011-01-20 14:28 ` Minchan Kim
  2011-01-20 17:25   ` Dave Hansen
       [not found]   ` <AANLkTinXAiShaf1f69ufVHg7KPaY5j=jmOTtK71GNNp5@mail.gmail.com>
  2011-01-20 17:20 ` Dave Hansen
  1 sibling, 2 replies; 19+ messages in thread
From: Minchan Kim @ 2011-01-20 14:28 UTC (permalink / raw)
  To: KyongHo Cho
  Cc: linux-arm-kernel, linux-mm, linux-kernel, linux-samsung-soc,
	Kukjin Kim, Ilho Lee, KeyYoung Park

On Thu, Jan 20, 2011 at 06:45:39PM +0900, KyongHo Cho wrote:
> Sparsemem allows that a bank of memory spans over several adjacent
> sections if the start address and the end address of the bank
> belong to different sections.
> When gathering statictics of physical memory in mem_init() and
> show_mem(), this possiblity was not considered.

Please write down the result if we doesn't consider this patch.
I can understand what happens but for making good description and review,
merging easily, it would be better to write down the result without 
the patch explicitly.

> 
> This patch guarantees that simple increasing the pointer to page
> descriptors does not exceed the boundary of a section.
> 
> Signed-off-by: KyongHo Cho <pullip.cho@samsung.com>
> ---
>  arch/arm/mm/init.c |   74 +++++++++++++++++++++++++++++++++++----------------
>  1 files changed, 51 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
> index 57c4c5c..6ccecbe 100644
> --- a/arch/arm/mm/init.c
> +++ b/arch/arm/mm/init.c
> @@ -93,24 +93,38 @@ void show_mem(void)
>  
>  		pfn1 = bank_pfn_start(bank);
>  		pfn2 = bank_pfn_end(bank);
> -
> +#ifndef CONFIG_SPARSEMEM
>  		page = pfn_to_page(pfn1);
>  		end  = pfn_to_page(pfn2 - 1) + 1;
> -
> +#else
> +		pfn2--;
>  		do {
> -			total++;
> -			if (PageReserved(page))
> -				reserved++;
> -			else if (PageSwapCache(page))
> -				cached++;
> -			else if (PageSlab(page))
> -				slab++;
> -			else if (!page_count(page))
> -				free++;
> -			else
> -				shared += page_count(page) - 1;
> -			page++;
> -		} while (page < end);
> +			page = pfn_to_page(pfn1);
> +			if (pfn_to_section_nr(pfn1) < pfn_to_section_nr(pfn2)) {
> +				pfn1 += PAGES_PER_SECTION;
> +				pfn1 &= PAGE_SECTION_MASK;
> +			} else {
> +				pfn1 = pfn2;
> +			}
> +			end = pfn_to_page(pfn1) + 1;
> +#endif
> +			do {
> +				total++;
> +				if (PageReserved(page))
> +					reserved++;
> +				else if (PageSwapCache(page))
> +					cached++;
> +				else if (PageSlab(page))
> +					slab++;
> +				else if (!page_count(page))
> +					free++;
> +				else
> +					shared += page_count(page) - 1;
> +				page++;
> +			} while (page < end);
> +#ifdef CONFIG_SPARSEMEM
> +		} while (pfn1 < pfn2);
> +#endif
>  	}
>  
>  	printk("%d pages of RAM\n", total);
> @@ -470,17 +484,31 @@ void __init mem_init(void)
>  
>  		pfn1 = bank_pfn_start(bank);
>  		pfn2 = bank_pfn_end(bank);
> -
> +#ifndef CONFIG_SPARSEMEM
>  		page = pfn_to_page(pfn1);
>  		end  = pfn_to_page(pfn2 - 1) + 1;
> -
> +#else
> +		pfn2--;
>  		do {
> -			if (PageReserved(page))
> -				reserved_pages++;
> -			else if (!page_count(page))
> -				free_pages++;
> -			page++;
> -		} while (page < end);
> +			page = pfn_to_page(pfn1);
> +			if (pfn_to_section_nr(pfn1) < pfn_to_section_nr(pfn2)) {
> +				pfn1 += PAGES_PER_SECTION;
> +				pfn1 &= PAGE_SECTION_MASK;
> +			} else {
> +				pfn1 = pfn2;
> +			}
> +			end = pfn_to_page(pfn1) + 1;
> +#endif
> +			do {
> +				if (PageReserved(page))
> +					reserved_pages++;
> +				else if (!page_count(page))
> +					free_pages++;
> +				page++;
> +			} while (page < end);
> +#ifdef CONFIG_SPARSEMEM
> +		} while (pfn1 < pfn2);
> +#endif
>  	}

Hmm.. new ifndef magic makes code readability bad.
Couldn't we do it by simple pfn iterator not page and pfn_valid check?

>  
>  	/*
> -- 
> 1.6.2.5
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] ARM: mm: Regarding section when dealing with meminfo
  2011-01-20  9:45 [PATCH] ARM: mm: Regarding section when dealing with meminfo KyongHo Cho
  2011-01-20 14:28 ` Minchan Kim
@ 2011-01-20 17:20 ` Dave Hansen
       [not found]   ` <AANLkTi=nsAOtLPK75Wy5Rm8pfWob8xTP5259DyYuxR9J@mail.gmail.com>
  2011-01-20 18:01   ` Russell King - ARM Linux
  1 sibling, 2 replies; 19+ messages in thread
From: Dave Hansen @ 2011-01-20 17:20 UTC (permalink / raw)
  To: KyongHo Cho
  Cc: linux-arm-kernel, linux-mm, linux-kernel, linux-samsung-soc,
	Kukjin Kim, Ilho Lee, KeyYoung Park

On Thu, 2011-01-20 at 18:45 +0900, KyongHo Cho wrote:
> Sparsemem allows that a bank of memory spans over several adjacent
> sections if the start address and the end address of the bank
> belong to different sections.
> When gathering statictics of physical memory in mem_init() and
> show_mem(), this possiblity was not considered.
> 
> This patch guarantees that simple increasing the pointer to page
> descriptors does not exceed the boundary of a section
...
> diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
> index 57c4c5c..6ccecbe 100644
> --- a/arch/arm/mm/init.c
> +++ b/arch/arm/mm/init.c
> @@ -93,24 +93,38 @@ void show_mem(void)
> 
>  		pfn1 = bank_pfn_start(bank);
>  		pfn2 = bank_pfn_end(bank);
> -
> +#ifndef CONFIG_SPARSEMEM
>  		page = pfn_to_page(pfn1);
>  		end  = pfn_to_page(pfn2 - 1) + 1;
> -
> +#else
> +		pfn2--;
>  		do {
> -			total++;
> -			if (PageReserved(page))
> -				reserved++;
> -			else if (PageSwapCache(page))
> -				cached++;
> -			else if (PageSlab(page))
> -				slab++;
> -			else if (!page_count(page))
> -				free++;
> -			else
> -				shared += page_count(page) - 1;
> -			page++;
> -		} while (page < end);
> +			page = pfn_to_page(pfn1);
> +			if (pfn_to_section_nr(pfn1) < pfn_to_section_nr(pfn2)) {
> +				pfn1 += PAGES_PER_SECTION;
> +				pfn1 &= PAGE_SECTION_MASK;
> +			} else {
> +				pfn1 = pfn2;
> +			}
> +			end = pfn_to_page(pfn1) + 1;
> +#endif

This problem actually exists without sparsemem, too.  Discontigmem (at
least) does it as well.

The x86 version of show_mem() actually manages to do this without any
#ifdefs, and works for a ton of configuration options.  It uses
pfn_valid() to tell whether it can touch a given pfn.

Long-term, it might be a good idea to convert arm's show_mem() over to
use pgdat's like everything else.  But, for now, you should just be able
to do something roughly like this:

-               page = pfn_to_page(pfn1);
-               end  = pfn_to_page(pfn2 - 1) + 1;
-
-               do {
+		for (pfn = pfn1; pfn < pfn2; pfn++) {
+			if (!pfn_valid(pfn))
+				continue;
+			page = pfn_to_page(pfn);
+
                        total++;
                        if (PageReserved(page))
                                reserved++;
                        else if (PageSwapCache(page))
                                cached++;
                        else if (PageSlab(page))
                                slab++;
                        else if (!page_count(page))
                                free++;
                        else
                                shared += page_count(page) - 1;
                        page++;
-               } while (page < end);
+		}

That should work for sparsemem, or any other crazy memory models that we
come up with.  pfn_to_page() is pretty quick, especially when doing it
in a tight loop like that.

-- Dave


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] ARM: mm: Regarding section when dealing with meminfo
  2011-01-20 14:28 ` Minchan Kim
@ 2011-01-20 17:25   ` Dave Hansen
       [not found]   ` <AANLkTinXAiShaf1f69ufVHg7KPaY5j=jmOTtK71GNNp5@mail.gmail.com>
  1 sibling, 0 replies; 19+ messages in thread
From: Dave Hansen @ 2011-01-20 17:25 UTC (permalink / raw)
  To: Minchan Kim
  Cc: KyongHo Cho, linux-arm-kernel, linux-mm, linux-kernel,
	linux-samsung-soc, Kukjin Kim, Ilho Lee, KeyYoung Park

On Thu, 2011-01-20 at 23:28 +0900, Minchan Kim wrote:
> On Thu, Jan 20, 2011 at 06:45:39PM +0900, KyongHo Cho wrote:
> > Sparsemem allows that a bank of memory spans over several adjacent
> > sections if the start address and the end address of the bank
> > belong to different sections.
> > When gathering statictics of physical memory in mem_init() and
> > show_mem(), this possiblity was not considered.
> 
> Please write down the result if we doesn't consider this patch.
> I can understand what happens but for making good description and review,
> merging easily, it would be better to write down the result without 
> the patch explicitly. 

You'll oops.  __section_mem_map_addr() in:

> #define __pfn_to_page(pfn)                              \
> ({      unsigned long __pfn = (pfn);                    \
>         struct mem_section *__sec = __pfn_to_section(__pfn);    \
>         __section_mem_map_addr(__sec) + __pfn;          \
> })

will return NULL, you'll add some fuzz on to it with __pfn, then you'll
oops when the arm show_mem() does PageReserved() and dereferences
page->flags.

Ether that, or with the sparsemem vmemmap variant, you'll get a
valid-looking pointer with no backing memory, and oops as well.

-- Dave


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] ARM: mm: Regarding section when dealing with meminfo
       [not found]   ` <AANLkTinXAiShaf1f69ufVHg7KPaY5j=jmOTtK71GNNp5@mail.gmail.com>
@ 2011-01-20 17:43     ` Minchan Kim
  2011-01-20 17:44       ` Minchan Kim
  0 siblings, 1 reply; 19+ messages in thread
From: Minchan Kim @ 2011-01-20 17:43 UTC (permalink / raw)
  To: KyongHo Cho, inux-arm-kernel, linux-mm, linux-kernel,
	linux-samsung-soc, Kukjin Kim, Ilho Lee, KeyYoung Park,
	KyongHo Cho

Restore Cced.

On Fri, Jan 21, 2011 at 2:24 AM, KyongHo Cho <pullip.linux@gmail.com> wrote:
> On Thu, Jan 20, 2011 at 11:28 PM, Minchan Kim <minchan.kim@gmail.com> wrote:
>> On Thu, Jan 20, 2011 at 06:45:39PM +0900, KyongHo Cho wrote:
>>> Sparsemem allows that a bank of memory spans over several adjacent
>>> sections if the start address and the end address of the bank
>>> belong to different sections.
>>> When gathering statictics of physical memory in mem_init() and
>>> show_mem(), this possiblity was not considered.
>>
>> Please write down the result if we doesn't consider this patch.
>> I can understand what happens but for making good description and review,
>> merging easily, it would be better to write down the result without
>> the patch explicitly.
>>
> As we know that each section has its own memmap and
> a contiguous chunk of physical memory that is represented by 'bank' in meminfo
> can be larger than the size of a section.
> "page++" in the current implementation can access invalid memory area.
> The size of the section is 256 MiB in ARM and the number of banks in
> meminfo is 8.
> This means that the maximum size of the physical memory cannot be grow than 2GiB
> to avoid this problem in the current implementation.
> Thus we need to fix the calculation of the last page descriptor in
> terms of sections.
>
> This patch determines the last page descriptor in a memmap with
> min(last_pfn_of_bank, last_pfn_of_current_section)
> If there remains physical memory not consumed, it calculates the last
> page descriptor
> with min(last_pfn_of_bank, last_pfn_of_next_section).
>
>>
>> Hmm.. new ifndef magic makes code readability bad.
>> Couldn't we do it by simple pfn iterator not page and pfn_valid check?
>>
> True.
> We need to consider the implementation again.
> I think the previous implementation gave the importance to the
> efficiency but to the readability.
>

Please consider readability and consistency with other architectures
if we can do. :)
Thanks.

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] ARM: mm: Regarding section when dealing with meminfo
  2011-01-20 17:43     ` Minchan Kim
@ 2011-01-20 17:44       ` Minchan Kim
  2011-01-20 17:52         ` KyongHo Cho
  0 siblings, 1 reply; 19+ messages in thread
From: Minchan Kim @ 2011-01-20 17:44 UTC (permalink / raw)
  To: KyongHo Cho, linux-arm-kernel, linux-mm, linux-kernel,
	linux-samsung-soc, Kukjin Kim, Ilho Lee, KeyYoung Park,
	KyongHo Cho

Fix linux-arm-kernel address.

On Fri, Jan 21, 2011 at 2:43 AM, Minchan Kim <minchan.kim@gmail.com> wrote:
> Restore Cced.
>
> On Fri, Jan 21, 2011 at 2:24 AM, KyongHo Cho <pullip.linux@gmail.com> wrote:
>> On Thu, Jan 20, 2011 at 11:28 PM, Minchan Kim <minchan.kim@gmail.com> wrote:
>>> On Thu, Jan 20, 2011 at 06:45:39PM +0900, KyongHo Cho wrote:
>>>> Sparsemem allows that a bank of memory spans over several adjacent
>>>> sections if the start address and the end address of the bank
>>>> belong to different sections.
>>>> When gathering statictics of physical memory in mem_init() and
>>>> show_mem(), this possiblity was not considered.
>>>
>>> Please write down the result if we doesn't consider this patch.
>>> I can understand what happens but for making good description and review,
>>> merging easily, it would be better to write down the result without
>>> the patch explicitly.
>>>
>> As we know that each section has its own memmap and
>> a contiguous chunk of physical memory that is represented by 'bank' in meminfo
>> can be larger than the size of a section.
>> "page++" in the current implementation can access invalid memory area.
>> The size of the section is 256 MiB in ARM and the number of banks in
>> meminfo is 8.
>> This means that the maximum size of the physical memory cannot be grow than 2GiB
>> to avoid this problem in the current implementation.
>> Thus we need to fix the calculation of the last page descriptor in
>> terms of sections.
>>
>> This patch determines the last page descriptor in a memmap with
>> min(last_pfn_of_bank, last_pfn_of_current_section)
>> If there remains physical memory not consumed, it calculates the last
>> page descriptor
>> with min(last_pfn_of_bank, last_pfn_of_next_section).
>>
>>>
>>> Hmm.. new ifndef magic makes code readability bad.
>>> Couldn't we do it by simple pfn iterator not page and pfn_valid check?
>>>
>> True.
>> We need to consider the implementation again.
>> I think the previous implementation gave the importance to the
>> efficiency but to the readability.
>>
>
> Please consider readability and consistency with other architectures
> if we can do. :)
> Thanks.
>
> --
> Kind regards,
> Minchan Kim
>



-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] ARM: mm: Regarding section when dealing with meminfo
       [not found]   ` <AANLkTi=nsAOtLPK75Wy5Rm8pfWob8xTP5259DyYuxR9J@mail.gmail.com>
@ 2011-01-20 17:48     ` KyongHo Cho
  2011-01-20 18:04     ` Dave Hansen
  1 sibling, 0 replies; 19+ messages in thread
From: KyongHo Cho @ 2011-01-20 17:48 UTC (permalink / raw)
  To: Linux ARM Kernel
  Cc: KyongHo Cho, linux-mm, linux-kernel, linux-samsung-soc,
	Kukjin Kim, Ilho Lee, KeyYoung Park, Minchan Kim

On Fri, Jan 21, 2011 at 2:20 AM, Dave Hansen <dave@linux.vnet.ibm.com> wrote:
> On Thu, 2011-01-20 at 18:45 +0900, KyongHo Cho wrote:
>> Sparsemem allows that a bank of memory spans over several adjacent
>> sections if the start address and the end address of the bank
>> belong to different sections.
>> When gathering statictics of physical memory in mem_init() and
>> show_mem(), this possiblity was not considered.
>>
>> This patch guarantees that simple increasing the pointer to page
>> descriptors does not exceed the boundary of a section
> ...
>> diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
>> index 57c4c5c..6ccecbe 100644
>> --- a/arch/arm/mm/init.c
>> +++ b/arch/arm/mm/init.c
>> @@ -93,24 +93,38 @@ void show_mem(void)
>>
>>               pfn1 = bank_pfn_start(bank);
>>               pfn2 = bank_pfn_end(bank);
>> -
>> +#ifndef CONFIG_SPARSEMEM
>>               page = pfn_to_page(pfn1);
>>               end  = pfn_to_page(pfn2 - 1) + 1;
>> -
>> +#else
>> +             pfn2--;
>>               do {
>> -                     total++;
>> -                     if (PageReserved(page))
>> -                             reserved++;
>> -                     else if (PageSwapCache(page))
>> -                             cached++;
>> -                     else if (PageSlab(page))
>> -                             slab++;
>> -                     else if (!page_count(page))
>> -                             free++;
>> -                     else
>> -                             shared += page_count(page) - 1;
>> -                     page++;
>> -             } while (page < end);
>> +                     page = pfn_to_page(pfn1);
>> +                     if (pfn_to_section_nr(pfn1) < pfn_to_section_nr(pfn2)) {
>> +                             pfn1 += PAGES_PER_SECTION;
>> +                             pfn1 &= PAGE_SECTION_MASK;
>> +                     } else {
>> +                             pfn1 = pfn2;
>> +                     }
>> +                     end = pfn_to_page(pfn1) + 1;
>> +#endif
>
> This problem actually exists without sparsemem, too.  Discontigmem (at
> least) does it as well.
>

Actually, as long as a bank in meminfo only resides in a pgdat, no
problem happens
because there is no restriction of size of area in a pgdat.
That's why I just considered about sparsemem.

> The x86 version of show_mem() actually manages to do this without any
> #ifdefs, and works for a ton of configuration options.  It uses
> pfn_valid() to tell whether it can touch a given pfn.
>
> Long-term, it might be a good idea to convert arm's show_mem() over to
> use pgdat's like everything else.  But, for now, you should just be able
> to do something roughly like this:
>
> -               page = pfn_to_page(pfn1);
> -               end  = pfn_to_page(pfn2 - 1) + 1;
> -
> -               do {
> +               for (pfn = pfn1; pfn < pfn2; pfn++) {
> +                       if (!pfn_valid(pfn))
> +                               continue;
> +                       page = pfn_to_page(pfn);
> +
>                        total++;
>                        if (PageReserved(page))
>                                reserved++;
>                        else if (PageSwapCache(page))
>                                cached++;
>                        else if (PageSlab(page))
>                                slab++;
>                        else if (!page_count(page))
>                                free++;
>                        else
>                                shared += page_count(page) - 1;
>                        page++;
> -               } while (page < end);
> +               }
>
> That should work for sparsemem, or any other crazy memory models that we
> come up with.  pfn_to_page() is pretty quick, especially when doing it
> in a tight loop like that.
>

That's true.
I worried that pfn_to_page() in sparsemem is a bit slower than that in flatmem.
Moreover, the previous one didn't use pfn_to_page() but page++ for the
performance.
Nevertheless, I also think that pfn_to_page() make the code neat.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] ARM: mm: Regarding section when dealing with meminfo
  2011-01-20 17:44       ` Minchan Kim
@ 2011-01-20 17:52         ` KyongHo Cho
  0 siblings, 0 replies; 19+ messages in thread
From: KyongHo Cho @ 2011-01-20 17:52 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-arm-kernel, linux-mm, linux-kernel, linux-samsung-soc,
	Kukjin Kim, Ilho Lee, KeyYoung Park, KyongHo Cho

On Fri, Jan 21, 2011 at 2:44 AM, Minchan Kim <minchan.kim@gmail.com> wrote:
> Fix linux-arm-kernel address.
>
Thank you!
Too late in the night:)

> On Fri, Jan 21, 2011 at 2:43 AM, Minchan Kim <minchan.kim@gmail.com> wrote:
>> Restore Cced.
>>
>> On Fri, Jan 21, 2011 at 2:24 AM, KyongHo Cho <pullip.linux@gmail.com> wrote:
>>> On Thu, Jan 20, 2011 at 11:28 PM, Minchan Kim <minchan.kim@gmail.com> wrote:
>>>> On Thu, Jan 20, 2011 at 06:45:39PM +0900, KyongHo Cho wrote:
>>>>> Sparsemem allows that a bank of memory spans over several adjacent
>>>>> sections if the start address and the end address of the bank
>>>>> belong to different sections.
>>>>> When gathering statictics of physical memory in mem_init() and
>>>>> show_mem(), this possiblity was not considered.
>>>>
>>>> Please write down the result if we doesn't consider this patch.
>>>> I can understand what happens but for making good description and review,
>>>> merging easily, it would be better to write down the result without
>>>> the patch explicitly.
>>>>
>>> As we know that each section has its own memmap and
>>> a contiguous chunk of physical memory that is represented by 'bank' in meminfo
>>> can be larger than the size of a section.
>>> "page++" in the current implementation can access invalid memory area.
>>> The size of the section is 256 MiB in ARM and the number of banks in
>>> meminfo is 8.
>>> This means that the maximum size of the physical memory cannot be grow than 2GiB
>>> to avoid this problem in the current implementation.
>>> Thus we need to fix the calculation of the last page descriptor in
>>> terms of sections.
>>>
>>> This patch determines the last page descriptor in a memmap with
>>> min(last_pfn_of_bank, last_pfn_of_current_section)
>>> If there remains physical memory not consumed, it calculates the last
>>> page descriptor
>>> with min(last_pfn_of_bank, last_pfn_of_next_section).
>>>
>>>>
>>>> Hmm.. new ifndef magic makes code readability bad.
>>>> Couldn't we do it by simple pfn iterator not page and pfn_valid check?
>>>>
>>> True.
>>> We need to consider the implementation again.
>>> I think the previous implementation gave the importance to the
>>> efficiency but to the readability.
>>>
>>
>> Please consider readability and consistency with other architectures
>> if we can do. :)
>> Thanks.
>>
>> --
>> Kind regards,
>> Minchan Kim
>>
>
>
>
> --
> Kind regards,
> Minchan Kim
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] ARM: mm: Regarding section when dealing with meminfo
  2011-01-20 17:20 ` Dave Hansen
       [not found]   ` <AANLkTi=nsAOtLPK75Wy5Rm8pfWob8xTP5259DyYuxR9J@mail.gmail.com>
@ 2011-01-20 18:01   ` Russell King - ARM Linux
  2011-01-20 18:11     ` Dave Hansen
  2011-01-21  2:12     ` KyongHo Cho
  1 sibling, 2 replies; 19+ messages in thread
From: Russell King - ARM Linux @ 2011-01-20 18:01 UTC (permalink / raw)
  To: Dave Hansen
  Cc: KyongHo Cho, Kukjin Kim, KeyYoung Park, linux-kernel, Ilho Lee,
	linux-mm, linux-samsung-soc, linux-arm-kernel

On Thu, Jan 20, 2011 at 09:20:47AM -0800, Dave Hansen wrote:
> This problem actually exists without sparsemem, too.  Discontigmem (at
> least) does it as well.

We don't expect banks to cross sparsemem boundaries, or the older
discontigmem nodes (esp. as we used to store the node number.)
Discontigmem support has been removed now so that doesn't apply
anymore.

> The x86 version of show_mem() actually manages to do this without any
> #ifdefs, and works for a ton of configuration options.  It uses
> pfn_valid() to tell whether it can touch a given pfn.

x86 memory layout tends to be very simple as it expects memory to
start at the beginning of every region described by a pgdat and extend
in one contiguous block.  I wish ARM was that simple.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] ARM: mm: Regarding section when dealing with meminfo
       [not found]   ` <AANLkTi=nsAOtLPK75Wy5Rm8pfWob8xTP5259DyYuxR9J@mail.gmail.com>
  2011-01-20 17:48     ` KyongHo Cho
@ 2011-01-20 18:04     ` Dave Hansen
  1 sibling, 0 replies; 19+ messages in thread
From: Dave Hansen @ 2011-01-20 18:04 UTC (permalink / raw)
  To: KyongHo Cho
  Cc: linux-arm-kernel, linux-mm, linux-kernel, linux-samsung-soc,
	Kukjin Kim, Ilho Lee, KeyYoung Park, KyongHo Cho, MinChan Kim,
	Andy Whitcroft

On Fri, 2011-01-21 at 02:38 +0900, KyongHo Cho wrote:
> Actually, as long as a bank in meminfo only resides in a pgdat, no
> problem happens
> because there is no restriction of size of area in a pgdat.
> That's why I just considered about sparsemem.

Ahh, so "banks" are always underneath a single pgdat, and a "bank" is
always contiguous?  That's handy.

> I worried that pfn_to_page() in sparsemem is a bit slower than that in flatmem.
> Moreover, the previous one didn't use pfn_to_page() but page++ for the
> performance.
> Nevertheless, I also think that pfn_to_page() make the code neat.

The sparsemem_vmemmap pfn_to_page() is just arithmetic.  The table-based
sparsemem requires lookups and is a _bit_ slower, but the tables have
very nice CPU cache properties and shouldn't miss the L1 very often in a
loop like that.

show_mem() isn't exactly a performance-critical path, either, right?
It's just an exception or error path.

If it turns out that doing pfn_to_page() *is* too slow, there are a
couple more alternatives.  pfn_to_section_nr() is just a bit shift and
is really cheap.  Should be just an instruction or two with either no
memory access, or just a load of the pfn from the stack.

We could make a generic function like this (Or I guess we could also
just make sure that pfn_to_section_nr() always returns 0 for
non-sparsemem configurations):

int pfns_same_section(unsigned long pfn1, unsigned long pfn2)
{
#ifdef CONFIG_SPARSEMEM
	return (pfn_to_section_nr(pfn1) == pfn_to_section_nr(pfn2));
#else
	return 1;
#endif
}

and use it in show_mem like so:

                do {
                        total++;
                        if (PageReserved(page))
                                reserved++;
                        else if (PageSwapCache(page))
                                cached++;
                        else if (PageSlab(page))
                                slab++;
                        else if (!page_count(page))
                                free++;
                        else
                                shared += page_count(page) - 1;
			pfn1++;
			/*
			 * Did we just cross a section boundary?
			 * If so, our pointer arithmetic is not
			 * valid, and we must re-run pfn_to_page()
			 */
			if (pfns_same_section(pfn1-1, pfn1)) {
	                        page++;
			} else {
				page = pfn_to_page(pfn1);
			}
                } while (page < end);

We can do basically the same thing, but instead checking to see if we
crossed a MAX_ORDER boundary.  That would keep us from having to refer
to sparsemem at all.  The buddy allocator relies on that guarantee, so
it's pretty set in stone.

-- Dave


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] ARM: mm: Regarding section when dealing with meminfo
  2011-01-20 18:01   ` Russell King - ARM Linux
@ 2011-01-20 18:11     ` Dave Hansen
  2011-01-23 18:05       ` Russell King - ARM Linux
  2011-01-21  2:12     ` KyongHo Cho
  1 sibling, 1 reply; 19+ messages in thread
From: Dave Hansen @ 2011-01-20 18:11 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: KyongHo Cho, Kukjin Kim, KeyYoung Park, linux-kernel, Ilho Lee,
	linux-mm, linux-samsung-soc, linux-arm-kernel

On Thu, 2011-01-20 at 18:01 +0000, Russell King - ARM Linux wrote:
> > The x86 version of show_mem() actually manages to do this without any
> > #ifdefs, and works for a ton of configuration options.  It uses
> > pfn_valid() to tell whether it can touch a given pfn.
> 
> x86 memory layout tends to be very simple as it expects memory to
> start at the beginning of every region described by a pgdat and extend
> in one contiguous block.  I wish ARM was that simple.

x86 memory layouts can be pretty funky and have been that way for a long
time.  That's why we *have* to handle holes in x86's show_mem().  My
laptop even has a ~1GB hole in its ZONE_DMA32:

[    0.000000] Zone PFN ranges:
[    0.000000]   DMA      0x00000010 -> 0x00001000
[    0.000000]   DMA32    0x00001000 -> 0x00100000
[    0.000000]   Normal   0x00100000 -> 0x0013c000

But:

Node 0, zone    DMA32
  pages free     82877
        min      12783
        low      15978
        high     19174
        scanned  0
        spanned  1044480
        present  765672

See how the present is ~1GB less than spanned?  That's because of an I/O
hole from ~3-4GB:

        # cat /proc/iomem  | grep RAM
        00010000-0009d7ff : System RAM
        00100000-bf6affff : System RAM
        100000000-13bffffff : System RAM

I guess if we were being really smart we'd just shrink the DMA32 zone
down and not even cover that area.  But, we don't.

Memory hotplug was the original reason that we put sparsemem in place,
but we also have plenty of configurations where there are holes in the
middle of zones and pgdats.  

-- Dave


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] ARM: mm: Regarding section when dealing with meminfo
  2011-01-20 18:01   ` Russell King - ARM Linux
  2011-01-20 18:11     ` Dave Hansen
@ 2011-01-21  2:12     ` KyongHo Cho
  2011-01-21 10:38       ` Russell King - ARM Linux
  1 sibling, 1 reply; 19+ messages in thread
From: KyongHo Cho @ 2011-01-21  2:12 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Dave Hansen, Kukjin Kim, KeyYoung Park, linux-kernel, Ilho Lee,
	linux-mm, linux-samsung-soc, linux-arm-kernel

On Fri, Jan 21, 2011 at 3:01 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Thu, Jan 20, 2011 at 09:20:47AM -0800, Dave Hansen wrote:
>> This problem actually exists without sparsemem, too.  Discontigmem (at
>> least) does it as well.
>
> We don't expect banks to cross sparsemem boundaries, or the older
> discontigmem nodes (esp. as we used to store the node number.)
> Discontigmem support has been removed now so that doesn't apply
> anymore.
>

Our system can have 3 GiB of RAM in maximum.
In the near future, ARM APs can have up to 1 TiB with LPAE.

Since the size of section is 256 mib and NR_BANKS is defined as 8,
no ARM system can have more RAM than 2GiB in the current implementation.
If you want banks in meminfo not to cross sparsemem boundaries,
we need to find another way of physical memory specification in the kernel.

>> The x86 version of show_mem() actually manages to do this without any
>> #ifdefs, and works for a ton of configuration options.  It uses
>> pfn_valid() to tell whether it can touch a given pfn.
>
> x86 memory layout tends to be very simple as it expects memory to
> start at the beginning of every region described by a pgdat and extend
> in one contiguous block.  I wish ARM was that simple.
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] ARM: mm: Regarding section when dealing with meminfo
  2011-01-21  2:12     ` KyongHo Cho
@ 2011-01-21 10:38       ` Russell King - ARM Linux
  2011-01-21 11:15         ` KyongHo Cho
  0 siblings, 1 reply; 19+ messages in thread
From: Russell King - ARM Linux @ 2011-01-21 10:38 UTC (permalink / raw)
  To: KyongHo Cho
  Cc: Dave Hansen, Kukjin Kim, KeyYoung Park, linux-kernel, Ilho Lee,
	linux-mm, linux-samsung-soc, linux-arm-kernel

On Fri, Jan 21, 2011 at 11:12:27AM +0900, KyongHo Cho wrote:
> Since the size of section is 256 mib and NR_BANKS is defined as 8,
> no ARM system can have more RAM than 2GiB in the current implementation.
> If you want banks in meminfo not to cross sparsemem boundaries,
> we need to find another way of physical memory specification in the kernel.

There is no problem with increasing NR_BANKS.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] ARM: mm: Regarding section when dealing with meminfo
  2011-01-21 10:38       ` Russell King - ARM Linux
@ 2011-01-21 11:15         ` KyongHo Cho
  0 siblings, 0 replies; 19+ messages in thread
From: KyongHo Cho @ 2011-01-21 11:15 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Dave Hansen, Kukjin Kim, KeyYoung Park, linux-kernel, Ilho Lee,
	linux-mm, linux-samsung-soc, linux-arm-kernel

On Fri, Jan 21, 2011 at 7:38 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Fri, Jan 21, 2011 at 11:12:27AM +0900, KyongHo Cho wrote:
>> Since the size of section is 256 mib and NR_BANKS is defined as 8,
>> no ARM system can have more RAM than 2GiB in the current implementation.
>> If you want banks in meminfo not to cross sparsemem boundaries,
>> we need to find another way of physical memory specification in the kernel.
>
> There is no problem with increasing NR_BANKS.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
I think it is not reasonable to split a contiguous physical memory
into several chunks.
9 banks are required to use 2 gib.
Even though you think it is no problem,
it becomes a problem when we want to give physical memory information
via booting command line but atag
because there is a restriction in number of characters in booting command line.

I don't understand why larger bank size than the section size is problem.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] ARM: mm: Regarding section when dealing with meminfo
  2011-01-20 18:11     ` Dave Hansen
@ 2011-01-23 18:05       ` Russell King - ARM Linux
  2011-01-24 16:52         ` Dave Hansen
  0 siblings, 1 reply; 19+ messages in thread
From: Russell King - ARM Linux @ 2011-01-23 18:05 UTC (permalink / raw)
  To: Dave Hansen
  Cc: KyongHo Cho, Kukjin Kim, KeyYoung Park, linux-kernel, Ilho Lee,
	linux-mm, linux-samsung-soc, linux-arm-kernel

On Thu, Jan 20, 2011 at 10:11:27AM -0800, Dave Hansen wrote:
> On Thu, 2011-01-20 at 18:01 +0000, Russell King - ARM Linux wrote:
> > > The x86 version of show_mem() actually manages to do this without any
> > > #ifdefs, and works for a ton of configuration options.  It uses
> > > pfn_valid() to tell whether it can touch a given pfn.
> > 
> > x86 memory layout tends to be very simple as it expects memory to
> > start at the beginning of every region described by a pgdat and extend
> > in one contiguous block.  I wish ARM was that simple.
> 
> x86 memory layouts can be pretty funky and have been that way for a long
> time.  That's why we *have* to handle holes in x86's show_mem().  My
> laptop even has a ~1GB hole in its ZONE_DMA32:

If x86 is soo funky, I suggest you try the x86 version of show_mem()
on an ARM platform with memory holes.  Make sure you try it with
sparsemem as well...

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] ARM: mm: Regarding section when dealing with meminfo
  2011-01-23 18:05       ` Russell King - ARM Linux
@ 2011-01-24 16:52         ` Dave Hansen
  2011-01-24 17:58           ` Russell King - ARM Linux
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Hansen @ 2011-01-24 16:52 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: KyongHo Cho, Kukjin Kim, KeyYoung Park, linux-kernel, Ilho Lee,
	linux-mm, linux-samsung-soc, linux-arm-kernel

On Sun, 2011-01-23 at 18:05 +0000, Russell King - ARM Linux wrote:
> On Thu, Jan 20, 2011 at 10:11:27AM -0800, Dave Hansen wrote:
> > On Thu, 2011-01-20 at 18:01 +0000, Russell King - ARM Linux wrote:
> > > > The x86 version of show_mem() actually manages to do this without any
> > > > #ifdefs, and works for a ton of configuration options.  It uses
> > > > pfn_valid() to tell whether it can touch a given pfn.
> > > 
> > > x86 memory layout tends to be very simple as it expects memory to
> > > start at the beginning of every region described by a pgdat and extend
> > > in one contiguous block.  I wish ARM was that simple.
> > 
> > x86 memory layouts can be pretty funky and have been that way for a long
> > time.  That's why we *have* to handle holes in x86's show_mem().  My
> > laptop even has a ~1GB hole in its ZONE_DMA32:
> 
> If x86 is soo funky, I suggest you try the x86 version of show_mem()
> on an ARM platform with memory holes.  Make sure you try it with
> sparsemem as well...

x86 uses the generic lib/ show_mem().  It works for any holes, as long
as they're expressed in one of the memory models so that pfn_valid()
notices them.

ARM looks like its pfn_valid() is backed up by searching the (ASM
arch-specific) memblocks.  That looks like it would be fairly slow
compared to the other pfn_valid() implementations and I can see why it's
being avoided in show_mem().

Maybe we should add either the MAX_ORDER or section_nr() trick to the
lib/ implementation.  I bet that would use pfn_valid() rarely enough to
meet any performance concerns.

-- Dave


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] ARM: mm: Regarding section when dealing with meminfo
  2011-01-24 16:52         ` Dave Hansen
@ 2011-01-24 17:58           ` Russell King - ARM Linux
  2011-01-24 18:47             ` Dave Hansen
  2011-01-25  0:33             ` KyongHo Cho
  0 siblings, 2 replies; 19+ messages in thread
From: Russell King - ARM Linux @ 2011-01-24 17:58 UTC (permalink / raw)
  To: Dave Hansen
  Cc: KyongHo Cho, Kukjin Kim, KeyYoung Park, linux-kernel, Ilho Lee,
	linux-mm, linux-samsung-soc, linux-arm-kernel

On Mon, Jan 24, 2011 at 08:52:17AM -0800, Dave Hansen wrote:
> On Sun, 2011-01-23 at 18:05 +0000, Russell King - ARM Linux wrote:
> > On Thu, Jan 20, 2011 at 10:11:27AM -0800, Dave Hansen wrote:
> > > On Thu, 2011-01-20 at 18:01 +0000, Russell King - ARM Linux wrote:
> > > > > The x86 version of show_mem() actually manages to do this without any
> > > > > #ifdefs, and works for a ton of configuration options.  It uses
> > > > > pfn_valid() to tell whether it can touch a given pfn.
> > > > 
> > > > x86 memory layout tends to be very simple as it expects memory to
> > > > start at the beginning of every region described by a pgdat and extend
> > > > in one contiguous block.  I wish ARM was that simple.
> > > 
> > > x86 memory layouts can be pretty funky and have been that way for a long
> > > time.  That's why we *have* to handle holes in x86's show_mem().  My
> > > laptop even has a ~1GB hole in its ZONE_DMA32:
> > 
> > If x86 is soo funky, I suggest you try the x86 version of show_mem()
> > on an ARM platform with memory holes.  Make sure you try it with
> > sparsemem as well...
> 
> x86 uses the generic lib/ show_mem().  It works for any holes, as long
> as they're expressed in one of the memory models so that pfn_valid()
> notices them.

I think that's what I said.

> ARM looks like its pfn_valid() is backed up by searching the (ASM
> arch-specific) memblocks.  That looks like it would be fairly slow
> compared to the other pfn_valid() implementations and I can see why it's
> being avoided in show_mem().

Wrong.  For flatmem, we have a pfn_valid() which is backed by doing a
one, two or maybe rarely three compare search of the memblocks.  Short
of having a bitmap of every page in the 4GB memory space, you can't
get more efficient than that.

For sparsemem, sparsemem provides its own pfn_valid() which is _far_
from what we require:

static inline int pfn_valid(unsigned long pfn)
{
        if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
                return 0;
        return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
}

> Maybe we should add either the MAX_ORDER or section_nr() trick to the
> lib/ implementation.  I bet that would use pfn_valid() rarely enough to
> meet any performance concerns.

No.  I think it's entirely possible that on some platforms we have holes
within sections.  Like I said, ours is more funky than x86.

The big problem we have on ARM is that the kernel sparsemem stuff doesn't
*actually* support sparse memory.  What it supports is fully populated
blocks of memory of fixed size (known at compile time) where some blocks
may be contiguous.  Blocks are assumed to be populated from physical
address zero.

It doesn't actually support partially populated blocks.

So, if your memory granule size is 4MB, and your memory starts at
0xc0000000 physical, you're stuck with 768 unused sparsemem blocks
at the beginning of memory.  If it's 1MB, then you have 3072 unused
sparsemem blocks.  Each mem_section structure is 8 bytes, so that
could be 24K of zeros.

What we actually need is infrastructure in the kernel which can properly
handle sparse memory efficiently without causing such wastage.  If your
platform has four memory chunks, eg at 0xc0000000, 0xc4000000, 0xc8000000,
and 0xcc000000, then you want to build the kernel to tell it "there may
be four chunks with a 64MB offset, each chunk may be partially populated."

It seems that Sparsemem can't do that efficiently.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] ARM: mm: Regarding section when dealing with meminfo
  2011-01-24 17:58           ` Russell King - ARM Linux
@ 2011-01-24 18:47             ` Dave Hansen
  2011-01-25  0:33             ` KyongHo Cho
  1 sibling, 0 replies; 19+ messages in thread
From: Dave Hansen @ 2011-01-24 18:47 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: KyongHo Cho, Kukjin Kim, KeyYoung Park, linux-kernel, Ilho Lee,
	linux-mm, linux-samsung-soc, linux-arm-kernel

On Mon, 2011-01-24 at 17:58 +0000, Russell King - ARM Linux wrote:
> Wrong.  For flatmem, we have a pfn_valid() which is backed by doing a
> one, two or maybe rarely three compare search of the memblocks.  Short
> of having a bitmap of every page in the 4GB memory space, you can't
> get more efficient than that.

Sweet.  So, we can just take the original patch that started this
conversation, add the requisite pfn_valid()s and pfn_to_page()s, and
skip the sparsemem #ifdefs.  Right?

-- Dave


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] ARM: mm: Regarding section when dealing with meminfo
  2011-01-24 17:58           ` Russell King - ARM Linux
  2011-01-24 18:47             ` Dave Hansen
@ 2011-01-25  0:33             ` KyongHo Cho
  1 sibling, 0 replies; 19+ messages in thread
From: KyongHo Cho @ 2011-01-25  0:33 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Dave Hansen, Kukjin Kim, KeyYoung Park, linux-kernel, Ilho Lee,
	linux-mm, linux-samsung-soc, linux-arm-kernel

On Tue, Jan 25, 2011 at 2:58 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Mon, Jan 24, 2011 at 08:52:17AM -0800, Dave Hansen wrote:
>> On Sun, 2011-01-23 at 18:05 +0000, Russell King - ARM Linux wrote:
>> > On Thu, Jan 20, 2011 at 10:11:27AM -0800, Dave Hansen wrote:
>> > > On Thu, 2011-01-20 at 18:01 +0000, Russell King - ARM Linux wrote:
>> > > > > The x86 version of show_mem() actually manages to do this without any
>> > > > > #ifdefs, and works for a ton of configuration options.  It uses
>> > > > > pfn_valid() to tell whether it can touch a given pfn.
>> > > >
>> > > > x86 memory layout tends to be very simple as it expects memory to
>> > > > start at the beginning of every region described by a pgdat and extend
>> > > > in one contiguous block.  I wish ARM was that simple.
>> > >
>> > > x86 memory layouts can be pretty funky and have been that way for a long
>> > > time.  That's why we *have* to handle holes in x86's show_mem().  My
>> > > laptop even has a ~1GB hole in its ZONE_DMA32:
>> >
>> > If x86 is soo funky, I suggest you try the x86 version of show_mem()
>> > on an ARM platform with memory holes.  Make sure you try it with
>> > sparsemem as well...
>>
>> x86 uses the generic lib/ show_mem().  It works for any holes, as long
>> as they're expressed in one of the memory models so that pfn_valid()
>> notices them.
>
> I think that's what I said.
>
>> ARM looks like its pfn_valid() is backed up by searching the (ASM
>> arch-specific) memblocks.  That looks like it would be fairly slow
>> compared to the other pfn_valid() implementations and I can see why it's
>> being avoided in show_mem().
>
> Wrong.  For flatmem, we have a pfn_valid() which is backed by doing a
> one, two or maybe rarely three compare search of the memblocks.  Short
> of having a bitmap of every page in the 4GB memory space, you can't
> get more efficient than that.
>
> For sparsemem, sparsemem provides its own pfn_valid() which is _far_
> from what we require:
>
> static inline int pfn_valid(unsigned long pfn)
> {
>        if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
>                return 0;
>        return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
> }
>
>> Maybe we should add either the MAX_ORDER or section_nr() trick to the
>> lib/ implementation.  I bet that would use pfn_valid() rarely enough to
>> meet any performance concerns.
>
> No.  I think it's entirely possible that on some platforms we have holes
> within sections.  Like I said, ours is more funky than x86.
>
I don't think the improving performance can result the possibility of
misbehavior.

> The big problem we have on ARM is that the kernel sparsemem stuff doesn't
> *actually* support sparse memory.  What it supports is fully populated
> blocks of memory of fixed size (known at compile time) where some blocks
> may be contiguous.  Blocks are assumed to be populated from physical
> address zero.
>
> It doesn't actually support partially populated blocks.
>
> So, if your memory granule size is 4MB, and your memory starts at
> 0xc0000000 physical, you're stuck with 768 unused sparsemem blocks
> at the beginning of memory.  If it's 1MB, then you have 3072 unused
> sparsemem blocks.  Each mem_section structure is 8 bytes, so that
> could be 24K of zeros.
>
> What we actually need is infrastructure in the kernel which can properly
> handle sparse memory efficiently without causing such wastage.  If your
> platform has four memory chunks, eg at 0xc0000000, 0xc4000000, 0xc8000000,
> and 0xcc000000, then you want to build the kernel to tell it "there may
> be four chunks with a 64MB offset, each chunk may be partially populated."
>
> It seems that Sparsemem can't do that efficiently.
>

If sparsemem is not the correct choice for ARM, why don't we go back
to the  flatmem? Still, the flatmem has problem on wastage of memory
because of memory holes. But it is more reliable for us, at least.
I think the idea of sparsemem is not bad for ARM, though. The
implementation is quite efficient. The problem is that we still
believe that sparsemem needs more verification to prove its
robustness.

Anyway, you told that we need to define NR_BANKS more that 8 to use
larger memory than 2gb without worry about misbehavior in mem_init()
and mem_show(). As i said before, I think that it is not reasonable to
create a number of memory chunks to avoid the problem. Nowhere in the
kernel code and descriptions informs that contiguous physical memory
chunks must not cross sparsemem section's boundaries.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2011-01-25  0:34 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-20  9:45 [PATCH] ARM: mm: Regarding section when dealing with meminfo KyongHo Cho
2011-01-20 14:28 ` Minchan Kim
2011-01-20 17:25   ` Dave Hansen
     [not found]   ` <AANLkTinXAiShaf1f69ufVHg7KPaY5j=jmOTtK71GNNp5@mail.gmail.com>
2011-01-20 17:43     ` Minchan Kim
2011-01-20 17:44       ` Minchan Kim
2011-01-20 17:52         ` KyongHo Cho
2011-01-20 17:20 ` Dave Hansen
     [not found]   ` <AANLkTi=nsAOtLPK75Wy5Rm8pfWob8xTP5259DyYuxR9J@mail.gmail.com>
2011-01-20 17:48     ` KyongHo Cho
2011-01-20 18:04     ` Dave Hansen
2011-01-20 18:01   ` Russell King - ARM Linux
2011-01-20 18:11     ` Dave Hansen
2011-01-23 18:05       ` Russell King - ARM Linux
2011-01-24 16:52         ` Dave Hansen
2011-01-24 17:58           ` Russell King - ARM Linux
2011-01-24 18:47             ` Dave Hansen
2011-01-25  0:33             ` KyongHo Cho
2011-01-21  2:12     ` KyongHo Cho
2011-01-21 10:38       ` Russell King - ARM Linux
2011-01-21 11:15         ` KyongHo Cho

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).