LKML Archive on lore.kernel.org
* Re: [PATCH] shrink hash sizes on small machines, take 2
From: Gerald Schaefer @ 2004-05-18 12:32 UTC
  To: linux-kernel; +Cc: mpm

On Sat, Apr 10, 2004, Matt Mackall wrote:
> Base hash sizes on available memory rather than total memory. An
> additional 50% above current used memory is considered reserved for
> the purposes of hash sizing to compensate for the hashes themselves
> and the remainder of kernel and userspace initialization.
> 
> Index: tiny/fs/dcache.c
> ===================================================================
> --- tiny.orig/fs/dcache.c	2004-03-25 13:36:09.000000000 -0600
> +++ tiny/fs/dcache.c	2004-04-10 18:14:42.000000000 -0500
> @@ -28,6 +28,7 @@
>  #include <asm/uaccess.h>
>  #include <linux/security.h>
>  #include <linux/seqlock.h>
> +#include <linux/swap.h>
>  
>  #define DCACHE_PARANOIA 1
>  /* #define DCACHE_DEBUG 1 */
> @@ -1619,13 +1620,21 @@
> 
>  void __init vfs_caches_init(unsigned long mempages)
>  {
> -	names_cachep = kmem_cache_create("names_cache", 
> -			PATH_MAX, 0, 
> +	unsigned long reserve;
> +
> +	/* Base hash sizes on available memory, with a reserve equal to
> +           150% of current kernel size */
> +
> +	reserve = (mempages - nr_free_pages()) * 3/2;
> +	mempages -= reserve;

This calculation doesn't sound right. The value of mempages is set to
num_physpages by the calling function start_kernel(), which also
includes pages covered by a potential "mem=" kernel parameter (which we
use for reserving memory to load DCSS segments on z/VM).

Whenever the number of reserved pages goes above some limit (for
whatever reason, not only with "mem="), this calculation produces a
negative value for mempages, which wraps around to a very large value
because mempages is an unsigned long.
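
For illustration, here is the same arithmetic in a minimal standalone C
program (the page counts are invented for the example; in the kernel
they would come from num_physpages and nr_free_pages()):

#include <stdio.h>

int main(void)
{
	/* Hypothetical boot state: most pages already in use */
	unsigned long mempages = 4096;	/* e.g. 16MB in 4K pages */
	unsigned long nr_free  = 1024;	/* only a quarter still free */

	unsigned long reserve = (mempages - nr_free) * 3 / 2;	/* 4608 */

	/* 4096 - 4608 wraps around: mempages becomes a huge value */
	mempages -= reserve;
	printf("mempages after reserve: %lu\n", mempages);
	return 0;
}

On a 64-bit machine this prints 18446744073709551104 rather than a
sensible page count, and every hash sized from it is then based on
garbage.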

I am not on the mailing list, so please put me on cc if you answer.

--
Gerald


* Re: [PATCH] shrink hash sizes on small machines, take 2
From: Gerald Schaefer @ 2004-05-19 17:40 UTC
  To: Matt Mackall; +Cc: akpm, arnd, schwidefsky, linux-kernel

On Tuesday 18 May 2004 19:42, you wrote:
> num_physpages should represent the memory available to the kernel for
> normal use and should not reflect any other reserved memory. Otherwise
> it'll unduly influence the hash sizes both with and without this
> patch.
This is a very good point. Arnd and I have been looking at our "mem="
hack, and it does look a little bit fishy...
There are possibly more things affected in the s390 kernel by the
"mem=" parameter, not only num_physpages (max_low_pfn, etc.), so we
will have to take a closer look and see what Martin thinks about it.

> Index: mm/fs/dcache.c
> ===================================================================
> --- mm.orig/fs/dcache.c       2004-05-18 12:29:28.000000000 -0500
> +++ mm/fs/dcache.c    2004-05-18 12:37:42.000000000 -0500
> @@ -1625,7 +1625,7 @@
>       /* Base hash sizes on available memory, with a reserve equal to
>             150% of current kernel size */
>  
> -     reserve = (mempages - nr_free_pages()) * 3/2;
> +     reserve = min((mempages - nr_free_pages()) * 3/2, mempages - 1);
>       mempages -= reserve;
>  
>       names_cachep = kmem_cache_create("names_cache",
This patch is OK for us; in our case (with "mem=") it would more or
less restore the previous situation (without the calculation, but
still with a potential memory problem on our side).
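
For comparison, the clamped calculation from the patch above as the
same kind of standalone sketch (min_ul() stands in for the kernel's
min() macro, and the page counts are again invented):

#include <stdio.h>

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;	/* stands in for the kernel's min() */
}

int main(void)
{
	unsigned long mempages = 4096, nr_free = 1024;

	/* reserve is clamped so mempages can never drop below 1 */
	unsigned long reserve = min_ul((mempages - nr_free) * 3 / 2,
				       mempages - 1);
	mempages -= reserve;
	printf("mempages after clamped reserve: %lu\n", mempages);	/* 1 */
	return 0;
}

With the overflowing numbers from my previous mail, reserve is clamped
to mempages - 1, so the hashes are sized from a single remaining page
instead of from a wrapped-around value.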

--
Gerald


* Re: [PATCH] shrink hash sizes on small machines, take 2
From: Marcelo Tosatti @ 2004-04-15 15:16 UTC
  To: Matt Mackall; +Cc: Andrew Morton, arjanv, linux-kernel

On Sat, Apr 10, 2004 at 06:27:07PM -0500, Matt Mackall wrote:
> The following attempts to cleanly address the low end of the problem;
> something like my calc_hash_order or Marcelo's approach should be used
> to attack the upper end of the problem.
> 
> 8<
> 
> Shrink hashes on small systems
> 
> Base hash sizes on available memory rather than total memory. An
> additional 50% above current used memory is considered reserved for
> the purposes of hash sizing to compensate for the hashes themselves
> and the remainder of kernel and userspace initialization.

Hi Matt, 

As far as I remember from my tests, booting with 8MB yields 0-order
(one page) dentry/inode hash tables, and 16MB yields a 1-order dentry
hash and a 0-order inode hash.

So we can't go lower than one page on <8MB anyway (and we don't). What
is the problem you are seeing?

Does your patch change the 16MB case to a 0-order dentry hash table?
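
To make the order terminology concrete (a standalone sketch, not the
kernel's sizing code; 4K pages are assumed): an order-N allocation is
2^N contiguous pages, so the hash table sizes that come up in this
thread map to orders like this:

#include <stdio.h>

#define PAGE_SIZE 4096UL

/* Smallest order N such that (PAGE_SIZE << N) >= bytes */
static unsigned int order_for_bytes(unsigned long bytes)
{
	unsigned int order = 0;

	while ((PAGE_SIZE << order) < bytes)
		order++;
	return order;
}

int main(void)
{
	/* Sizes mentioned in this thread */
	unsigned long sizes[] = { 4096, 128 * 1024, 512 * 1024,
				  1024 * 1024, 8 * 1024 * 1024 };

	for (unsigned int i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++)
		printf("%8lu bytes -> order %u\n",
		       sizes[i], order_for_bytes(sizes[i]));
	return 0;
}

So a one-page table is order 0, while the 1MB and 8MB tables mentioned
below are order-8 and order-11 allocations.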

On the higher end, we still need to figure out whether the "huge" hash
tables (1MB dentry/512K inode hash on a 4GB box, up to an 8MB dentry
hash on a 16GB box) are really worth it.

Arjan seems to be clipping the dentry hash to 128K in RH's kernels. I
couldn't find much of a difference in dbench performance between a
1MB, 512K, or 128K dhash. Is someone willing to help with SDET or
different tests?

Thanks!


* [PATCH] shrink hash sizes on small machines, take 2
From: Matt Mackall @ 2004-04-10 23:27 UTC
  To: Andrew Morton, linux-kernel

The following attempts to cleanly address the low end of the problem;
something like my calc_hash_order or Marcelo's approach should be used
to attack the upper end of the problem.

8<

Shrink hashes on small systems

Base hash sizes on available memory rather than total memory. An
additional 50% above current used memory is considered reserved for
the purposes of hash sizing to compensate for the hashes themselves
and the remainder of kernel and userspace initialization.

Index: tiny/fs/dcache.c
===================================================================
--- tiny.orig/fs/dcache.c	2004-03-25 13:36:09.000000000 -0600
+++ tiny/fs/dcache.c	2004-04-10 18:14:42.000000000 -0500
@@ -28,6 +28,7 @@
 #include <asm/uaccess.h>
 #include <linux/security.h>
 #include <linux/seqlock.h>
+#include <linux/swap.h>
 
 #define DCACHE_PARANOIA 1
 /* #define DCACHE_DEBUG 1 */
@@ -1619,13 +1620,21 @@
 
 void __init vfs_caches_init(unsigned long mempages)
 {
-	names_cachep = kmem_cache_create("names_cache", 
-			PATH_MAX, 0, 
+	unsigned long reserve;
+
+	/* Base hash sizes on available memory, with a reserve equal to
+           150% of current kernel size */
+
+	reserve = (mempages - nr_free_pages()) * 3/2;
+	mempages -= reserve;
+
+	names_cachep = kmem_cache_create("names_cache",
+			PATH_MAX, 0,
 			SLAB_HWCACHE_ALIGN, NULL, NULL);
 	if (!names_cachep)
 		panic("Cannot create names SLAB cache");
 
-	filp_cachep = kmem_cache_create("filp", 
+	filp_cachep = kmem_cache_create("filp",
 			sizeof(struct file), 0,
 			SLAB_HWCACHE_ALIGN, filp_ctor, filp_dtor);
 	if(!filp_cachep)
@@ -1633,7 +1642,7 @@
 
 	dcache_init(mempages);
 	inode_init(mempages);
-	files_init(mempages); 
+	files_init(mempages);
 	mnt_init(mempages);
 	bdev_cache_init();
 	chrdev_init();

-- 
Matt Mackall : http://www.selenic.com : Linux development and consulting

