LKML Archive on lore.kernel.org
* [PATCH][BUGFIX][RFC] fix soft lock up at NFS mount by making limitation of dentry_unused
@ 2008-03-06  4:41 Kentaro Makita
  2008-03-06  5:54 ` David Chinner
  0 siblings, 1 reply; 6+ messages in thread
From: Kentaro Makita @ 2008-03-06  4:41 UTC (permalink / raw)
  To: linux-kernel

[Summary]
  Limit the size of dentry_unused to avoid soft lockups when mounting NFS
 or remounting any filesystem.

[Descriptions]
- background
 dentry_unused is a list of dentries that are not in use. It works
 as a cache for lookups of files that no longer exist. dentry_unused
 grows when directories or files are removed. Because there is no
 limit, this list can become *very* long when there is no memory
 pressure.

- what's the problem
 When prune_dcache() is called, it scans the *entire* dentry_unused
 list linearly under spin_lock(). This scan is very expensive when
 there are many entries. For example, prune_dcache() is called when
 mounting NFS. In our test, with 100,000,000 unused dentries, mounting
 NFS took about 1 minute and almost all user programs hung during it.

  100,000,000 is a realistic number on large systems.

 This problem has already happened on our system.
 Therefore, we need a limit on dentry_unused.

- How to fix
 Limit the number of unused dentries to a suitable value.

 The threshold works as follows:
 dentry_unused_ratio: the default value is 10000 (%). When the number
 of unused dentries reaches 10000% of the number of in-use dentries,
 5% of them are freed.

 I feel we need more tests to determine a reasonable value for any
 system, so please test.

 This patch is based on linux-2.6.25-rc4.

- Test Results

 Results on a 24GB box with excessive unused dentries.

Without the patch:
# cat /proc/sys/fs/dentry-state
103327453       103313783       45      0       0
# time mount -t nfs 192.168.0.2:/export /mnt
real    1m4.698s
user    0m0.000s
sys     1m4.672s

 With this patch:
# cat /proc/sys/fs/dentry-state
118681  117225  45      0       0       0
# time mount -t nfs 192.168.0.2:/export /mnt
real    0m0.103s
user    0m0.004s
sys     0m0.076s

Tested on Intel Itanium 2 9050 (dual-core) x12, 24GB RAM, kernel 2.6.25-rc4.
I found no performance regression in my tests.


Best Regards,
Kentaro Makita

Signed-off-by: Kentaro Makita <k-makita@np.css.fujitsu.com>
---
 fs/dcache.c |    7 +++++++
 1 files changed, 7 insertions(+)
diff -rupN -X linux-2.6.25-rc4/Documentation/dontdiff linux-2.6.25-rc4/fs/dcache.c linux-2.6.25-rc4mod/fs/dcache.c
--- linux-2.6.25-rc4/fs/dcache.c	2008-03-05 13:33:54.000000000 +0900
+++ linux-2.6.25-rc4mod/fs/dcache.c	2008-03-05 16:47:18.000000000 +0900
@@ -42,6 +42,8 @@ __cacheline_aligned_in_smp DEFINE_SEQLOC

 EXPORT_SYMBOL(dcache_lock);

+/* threshold to limit dentry_unused */
+unsigned int dentry_unused_ratio = 10000;
 static struct kmem_cache *dentry_cache __read_mostly;

 #define DNAME_INLINE_LEN (sizeof(struct dentry)-offsetof(struct dentry,d_iname))
@@ -61,6 +63,7 @@ static unsigned int d_hash_mask __read_m
 static unsigned int d_hash_shift __read_mostly;
 static struct hlist_head *dentry_hashtable __read_mostly;
 static LIST_HEAD(dentry_unused);
+static void prune_dcache(int count, struct super_block *sb);

 /* Statistics gathering. */
 struct dentry_stat_t dentry_stat = {
@@ -214,6 +217,10 @@ repeat:
   	}
  	spin_unlock(&dentry->d_lock);
 	spin_unlock(&dcache_lock);
+	/* Prune unused dentry over threshold level */
+	int nr_in_use = (dentry_stat.nr_dentry - dentry_stat.nr_unused);
+	if (dentry_stat.nr_dentry > nr_in_use * dentry_unused_ratio / 100)
+		prune_dcache(dentry_stat.nr_unused * 5 / 100 , NULL);
 	return;

 unhash_it:

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH][BUGFIX][RFC] fix soft lock up at NFS mount by making limitation of dentry_unused
  2008-03-06  4:41 [PATCH][BUGFIX][RFC] fix soft lock up at NFS mount by making limitation of dentry_unused Kentaro Makita
@ 2008-03-06  5:54 ` David Chinner
  2008-03-06  7:15   ` Kentaro Makita
  0 siblings, 1 reply; 6+ messages in thread
From: David Chinner @ 2008-03-06  5:54 UTC (permalink / raw)
  To: Kentaro Makita; +Cc: linux-kernel

On Thu, Mar 06, 2008 at 01:41:29PM +0900, Kentaro Makita wrote:
> [Summary]
>   Limit the size of dentry_unused to avoid soft lockups when mounting NFS
>  or remounting any filesystem.
> 
> [Descriptions]
> - background
>  dentry_unused is a list of dentries that are not in use. It works
>  as a cache for lookups of files that no longer exist. dentry_unused
>  grows when directories or files are removed. Because there is no
>  limit, this list can become *very* long when there is no memory
>  pressure.
> 
> - what's the problem
>  When prune_dcache() is called, it scans the *entire* dentry_unused
>  list linearly under spin_lock(). This scan is very expensive when
>  there are many entries. For example, prune_dcache() is called when
>  mounting NFS. In our test, with 100,000,000 unused dentries, mounting
>  NFS took about 1 minute and almost all user programs hung during it.
> 
>   100,000,000 is a realistic number on large systems.
> 
>  This problem has already happened on our system.
>  Therefore, we need a limit on dentry_unused.

No, we need a smarter free list structure. There have been several attempts
at this in the past. Two that I can recall off the top of my head:

	- per node unused LRUs
	- per superblock unused LRUs

I guess we need to revisit this again, because limiting the size of
the cache like this is not an option.

>  I feel we need more tests to determine a reasonable value for any
>  system, so please test.
.....
> Tested on Intel Itanium 2 9050 (dual-core) x12, 24GB RAM, kernel 2.6.25-rc4
> I found no performance regression in my tests.

Try something that relies on leaving the working set on the unused
list, like NFS server benchmarks that have a working set of tens of
millions of files....

> Signed-off-by: Kentaro Makita <k-makita@np.css.fujitsu.com>
> ---
>  fs/dcache.c |    7 +++++++
>  1 files changed, 7 insertions(+)
> diff -rupN -X linux-2.6.25-rc4/Documentation/dontdiff linux-2.6.25-rc4/fs/dcache.c linux-2.6.25-rc4mod/fs/dcache.c
> --- linux-2.6.25-rc4/fs/dcache.c	2008-03-05 13:33:54.000000000 +0900
> +++ linux-2.6.25-rc4mod/fs/dcache.c	2008-03-05 16:47:18.000000000 +0900
> @@ -42,6 +42,8 @@ __cacheline_aligned_in_smp DEFINE_SEQLOC
> 
>  EXPORT_SYMBOL(dcache_lock);
> 
> +/* threshold to limit dentry_unused */
> +unsigned int dentry_unused_ratio = 10000;
>  static struct kmem_cache *dentry_cache __read_mostly;
> 
>  #define DNAME_INLINE_LEN (sizeof(struct dentry)-offsetof(struct dentry,d_iname))
> @@ -61,6 +63,7 @@ static unsigned int d_hash_mask __read_m
>  static unsigned int d_hash_shift __read_mostly;
>  static struct hlist_head *dentry_hashtable __read_mostly;
>  static LIST_HEAD(dentry_unused);
> +static void prune_dcache(int count, struct super_block *sb);
> 
>  /* Statistics gathering. */
>  struct dentry_stat_t dentry_stat = {
> @@ -214,6 +217,10 @@ repeat:
>    	}
>   	spin_unlock(&dentry->d_lock);
>  	spin_unlock(&dcache_lock);
> +	/* Prune unused dentry over threshold level */
> +	int nr_in_use = (dentry_stat.nr_dentry - dentry_stat.nr_unused);
> +	if (dentry_stat.nr_dentry > nr_in_use * dentry_unused_ratio / 100)
> +		prune_dcache(dentry_stat.nr_unused * 5 / 100 , NULL);

nr_in_use is going to overflow 32 bits with this calculation.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


* Re: [PATCH][BUGFIX][RFC] fix soft lock up at NFS mount by making limitation of dentry_unused
  2008-03-06  5:54 ` David Chinner
@ 2008-03-06  7:15   ` Kentaro Makita
  2008-03-08  8:33     ` KOSAKI Motohiro
  0 siblings, 1 reply; 6+ messages in thread
From: Kentaro Makita @ 2008-03-06  7:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: dgc

David Chinner wrote:
> On Thu, Mar 06, 2008 at 01:41:29PM +0900, Kentaro Makita wrote:
....
>>   100,000,000 is possible number on large systems.
>>
>>  This problem has already happened on our system.
>>  Therefore, we need a limitation of dentry_unused.
> 
> No, we need a smarter free list structure. There have been several attempts
> at this in the past. Two that I can recall off the top of my head:
> 
> 	- per node unused LRUs
> 	- per superblock unused LRUs

 I know there have been such attempts already, but they are not in mainline.
 I think this is not a smart way, but it is a simple way to avoid this problem.
> 
> I guess we need to revisit this again, because limiting the size of
> the cache like this is not an option.
> 
>>  I feel we need more tests to determine a reasonable value for any
>>  system, so please test.
> .....
>> Tested on Intel Itanium 2 9050 (dual-core) x12, 24GB RAM, kernel 2.6.25-rc4
>> I found no performance regression in my tests.
> 
> Try something that relies on leaving the working set on the unused
> list, like NFS server benchmarks that have a working set of tens of
> millions of files....
> 
 Okay, I'll try some benchmarks and report the results...
>> Signed-off-by: Kentaro Makita <k-makita@np.css.fujitsu.com>
>> ---
>>  fs/dcache.c |    7 +++++++
>>  1 files changed, 7 insertions(+)
>> diff -rupN -X linux-2.6.25-rc4/Documentation/dontdiff linux-2.6.25-rc4/fs/dcache.c linux-2.6.25-rc4mod/fs/dcache.c
>> --- linux-2.6.25-rc4/fs/dcache.c	2008-03-05 13:33:54.000000000 +0900
>> +++ linux-2.6.25-rc4mod/fs/dcache.c	2008-03-05 16:47:18.000000000 +0900
......
>> @@ -214,6 +217,10 @@ repeat:
>>    	}
>>   	spin_unlock(&dentry->d_lock);
>>  	spin_unlock(&dcache_lock);
>> +	/* Prune unused dentry over threshold level */
>> +	int nr_in_use = (dentry_stat.nr_dentry - dentry_stat.nr_unused);
>> +	if (dentry_stat.nr_dentry > nr_in_use * dentry_unused_ratio / 100)
>> +		prune_dcache(dentry_stat.nr_unused * 5 / 100 , NULL);
> 
> nr_in_use is going to overflow 32 bits with this calculation.
 Oh, that was a simple mistake. I have fixed it in this post.
> 
> Cheers,
> 
> Dave.
Best Regards,
Kentaro Makita

Signed-off-by: Kentaro Makita <k-makita@np.css.fujitsu.com>
---
 dcache.c |    7 +++++++
 1 files changed, 7 insertions(+)
diff -rupN -X linux-2.6.25-rc4/Documentation/dontdiff linux-2.6.25-rc4/fs/dcache.c linux-2.6.25-rc4mod/fs/dcache.c
--- linux-2.6.25-rc4/fs/dcache.c	2008-03-05 13:33:54.000000000 +0900
+++ linux-2.6.25-rc4mod/fs/dcache.c	2008-03-06 15:27:22.000000000 +0900
@@ -42,6 +42,8 @@ __cacheline_aligned_in_smp DEFINE_SEQLOC

 EXPORT_SYMBOL(dcache_lock);

+/* threshold to limit dentry_unused */
+unsigned int dentry_unused_ratio = 10000;
 static struct kmem_cache *dentry_cache __read_mostly;

 #define DNAME_INLINE_LEN (sizeof(struct dentry)-offsetof(struct dentry,d_iname))
@@ -61,6 +63,7 @@ static unsigned int d_hash_mask __read_m
 static unsigned int d_hash_shift __read_mostly;
 static struct hlist_head *dentry_hashtable __read_mostly;
 static LIST_HEAD(dentry_unused);
+static void prune_dcache(int count, struct super_block *sb);

 /* Statistics gathering. */
 struct dentry_stat_t dentry_stat = {
@@ -214,6 +217,10 @@ repeat:
   	}
  	spin_unlock(&dentry->d_lock);
 	spin_unlock(&dcache_lock);
+	/* Prune unused dentry over threshold level */
+	int nr_in_use = (dentry_stat.nr_dentry - dentry_stat.nr_unused);
+	if (dentry_stat.nr_dentry > nr_in_use * (dentry_unused_ratio / 100))
+		prune_dcache(dentry_stat.nr_unused * 5 / 100 , NULL);
 	return;

 unhash_it:


* Re: [PATCH][BUGFIX][RFC] fix soft lock up at NFS mount by making limitation of dentry_unused
  2008-03-06  7:15   ` Kentaro Makita
@ 2008-03-08  8:33     ` KOSAKI Motohiro
  2008-03-14  5:15       ` Kentaro Makita
  0 siblings, 1 reply; 6+ messages in thread
From: KOSAKI Motohiro @ 2008-03-08  8:33 UTC (permalink / raw)
  To: Kentaro Makita; +Cc: kosaki.motohiro, linux-kernel, dgc

Hi Makita-san,

In general, I agree with many people that a hang of more than one minute
should not be allowed.

> > No, we need a smarter free list structure. There have been several attempts
> > at this in the past. Two that I can recall off the top of my head:
> > 
> > 	- per node unused LRUs
> > 	- per superblock unused LRUs
> 
>  I know there is such attempt already, but they are not in main-line.
>  I think this is not a smart way, but it is a simple way to avoid this problem.

I think the two improvements are not mutually exclusive.
Your patch is nice, but we need David's patch too,
because the two patches have different purposes:

per-superblock LRU:       improves typical performance.
limit on the unused list: prevents overly long hangs.

A hang can still happen in the worst case even with a per-superblock LRU,
and the unused-list traversal does not get faster just because the list
is limited.

I hope for both.


> >> Tested on Intel Itanium 2 9050 (dual-core) x12, 24GB RAM, kernel 2.6.25-rc4
> >> I found no performance regression in my tests.
> > 
> > Try something that relies on leaving the working set on the unused
> > list, like NFS server benchmarks that have a working set of tens of
> > millions of files....
> > 
>  Okay, I'll try some benchmarks and report results...

Good luck.


>   	spin_unlock(&dentry->d_lock);
>  	spin_unlock(&dcache_lock);
> +	/* Prune unused dentry over threshold level */
> +	int nr_in_use = (dentry_stat.nr_dentry - dentry_stat.nr_unused);
> +	if (dentry_stat.nr_dentry > nr_in_use * (dentry_unused_ratio / 100))
> +		prune_dcache(dentry_stat.nr_unused * 5 / 100 , NULL);
>  	return;

Why don't you make dentry_unused_ratio adjustable via a sysctl interface?


- kosaki




* Re: [PATCH][BUGFIX][RFC] fix soft lock up at NFS mount by making limitation of dentry_unused
  2008-03-08  8:33     ` KOSAKI Motohiro
@ 2008-03-14  5:15       ` Kentaro Makita
  2008-03-14  6:43         ` David Chinner
  0 siblings, 1 reply; 6+ messages in thread
From: Kentaro Makita @ 2008-03-14  5:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: dgc

[-- Attachment #1: Type: text/plain, Size: 1202 bytes --]

Hi David
On Thu, 6 Mar 2008 16:54:16 +1100  David Chinner wrote:
>> No, we need a smarter free list structure. There have been several attempts
>> at this in the past. Two that I can recall off the top of my head:
>>
>> 	- per node unused LRUs
>> 	- per superblock unused LRUs
>> I guess we need to revisit this again, because limiting the size of
>> the cache like this is not an option.
I'm interested in your patches. I'll test the two approaches above if
there is a newer version based on the latest kernel.

>> Try something that relies on leaving the working set on the unused
>> list, like NFS server benchmarks that have a working set of tens of
>> millions of files....
>>
I tested the following, and I found no regressions except in one case.
 - kernbench-0.24 on local ext3 and nfs
 - dbench-3.04 on local ext3 and nfs
 - IOzone-3.291 on local ext3 and nfs
 - Basic file operations (create/delete/list/copy/move) on local ext3 and nfs

However, I found one performance regression with my patch in the following case:
 - On local ext3, removing 1,000,000 files in a directory takes about 18%
 more time (18m34.901s to 21m55.047s).

I'm trying to fix it and will post again.
Thank you for your suggestion.

Best Regards,
Kentaro Makita

[-- Attachment #2: regression test results --]
[-- Type: text/plain, Size: 5476 bytes --]

-------------------------------------------------------------------------------
Basic file operations :
w/o  patch on local ext3:
target \  operations   | create     |  delete    |  list    |  copy      |  move
-----------------------+----------------------------------------------------------------------------------
1000 dirs x 1000 files | 22m6.930s  | 0m32.682s  | 0m0.037s | 1m31.506s  | 0m2.154s
1000000 files          | 22m37.759s | 18m34.901s | 0m0.002s | 19m24.388s | 0m0.156s
(elapsed time : second(s))

with patch on local ext3:
target \   operations  | create     |  delete    |  list    |  copy       |  move
-----------------------+---------------------------------------------------------------------------------
1000 dirs x 1000 files | 21m54.470s | 0m32.040s  | 0m0.008s |  1m30.796s  | 0m2.943s
1000000 files          | 22m8.381s  | 21m55.047s | 0m0.020s |  21m25.779s | 0m0.052s
(elapsed time : second(s))

w/o  patch on nfs:
target \   operations   | create     |  delete     |  list     |  copy      |  move
------------------------+----------------------------------------------------------------------------------
1000000 files           | 140m7.649s | 293m46.285s | 0m0.098s  | 432m7.720s | 0m0.674s
(elapsed time : second(s))

with patch on nfs:
target \   operations   | create      |  delete     |  list    |  copy       |  move
------------------------+--------------------------------------------------------------------------------
1000000 files           | 141m53.534s | 290m17.669s | 0m0.040s | 440m51.964s | 0m0.361s
(elapsed time : second(s))

IOzone:
# ./iozone -Ra > logfile
on ext3:		
			   bytes / sec (Average)
                	  w/o patch	 with patch
Writer Report   	499,136   	502,536   	100.68%
Re-writer Report	1,774,772   	1,790,133   	100.87%
Reader Report   	3,761,592   	3,818,147   	101.50%
Re-reader Report	5,723,402   	6,020,088   	105.18%
Random Read Report	5,343,096   	5,588,652   	104.60%
Random Write Report	2,054,678   	2,102,237   	102.31%
Backward Read Report	3,628,740   	3,696,570   	101.87%
Record Rewrite Report	3,697,344   	3,760,118   	101.70%
Stride Read Report	4,899,821   	5,053,645   	103.14%
Fwrite Report   	493,434   	493,464   	100.01%
Re-fwrite Report	1,505,555   	1,516,702   	100.74%
Fread Report    	3,330,627   	3,363,825   	101.00%
Re-fread Report 	5,404,997   	5,572,977   	103.11%

on nfs:		
			   bytes / sec (Average)
                	  w/o patch	 with patch
Writer Report   	2,397,539   	2,495,369   	104.08%
Re-writer Report	2,534,827   	2,539,019   	100.17%
Reader Report   	3,692,377   	3,711,528   	100.52%
Re-reader Report	5,783,150   	5,745,256   	99.34%
Random Read Report	5,569,286   	5,663,204   	101.69%
Random Write Report	2,982,048   	2,988,895   	100.23%
Backward Read Report	3,694,922   	3,710,797   	100.43%
Record Rewrite Report	5,844,580   	5,873,414   	100.49%
Stride Read Report	5,043,812   	5,060,472   	100.33%
Fwrite Report   	1,769,812   	1,788,991   	101.08%
Re-fwrite Report	1,964,384   	1,978,361   	100.71%
Fread Report    	3,362,162   	3,293,340   	97.95%
Re-fread Report 	5,441,776   	5,441,807   	100.00%

kernbench-0.42:
# kernbench -M
w/o  patch on local ext3:
2.6.25-rc5
Average Half load -j 12 Run (std deviation):
Elapsed Time 105.354 (0.608383)
User Time 1072.59 (1.42999)
System Time 68.406 (0.540074)
Percent CPU 1082.4 (5.17687)
Context Switches 75067.2 (2425.63)
Sleeps 155188 (2167.44)

Average Optimal load -j 96 Run (std deviation):
Elapsed Time 69.028 (0.523374)
User Time 1106.83 (36.1126)
System Time 67.735 (0.82922)
Percent CPU 1416 (351.761)
Context Switches 105700 (32397.8)
Sleeps 161568 (7136.89)

with patch on local ext3:
2.6.25-rc5dentry
Average Half load -j 12 Run (std deviation):
Elapsed Time 104.962 (0.0630079)
User Time 1071.74 (0.374993)
System Time 68.578 (0.301032)
Percent CPU 1086 (0.707107)
Context Switches 77173.8 (513.063)
Sleeps 156710 (669.205)

Average Optimal load -j 96 Run (std deviation):
Elapsed Time 68.826 (0.942804)
User Time 1107.5 (37.7007)
System Time 67.901 (0.770086)
Percent CPU 1422.2 (354.748)
Context Switches 107092 (31559.1)
Sleeps 161884 (6220.1)

w/o  patch on nfs:
2.6.25-rc5
Average Half load -j 12 Run (std deviation):
Elapsed Time 237.71 (6.4713)
User Time 1087.07 (1.42099)
System Time 190.306 (0.941637)
Percent CPU 537.2 (15.0233)
Context Switches 358822 (8395.04)
Sleeps 4.46148e+06 (53959.4)

Average Optimal load -j 96 Run (std deviation):
Elapsed Time 286.312 (4.8972)
User Time 1127.59 (42.7355)
System Time 304.32 (120.184)
Percent CPU 545.5 (14.6382)
Context Switches 603299 (257858)
Sleeps 9.21507e+06 (5.01086e+06)

with patch on nfs:
2.6.25-rc5dentry
Average Half load -j 12 Run (std deviation):
Elapsed Time 257.704 (8.20142)
User Time 1087.19 (0.992084)
System Time 191.294 (1.11267)
Percent CPU 496 (15.5885)
Context Switches 356975 (14893.6)
Sleeps 4.42764e+06 (68507.4)

Average Optimal load -j 96 Run (std deviation):
Elapsed Time 293.448 (2.64979)
User Time 1127.5 (42.5004)
System Time 308.478 (123.531)
Percent CPU 519.3 (26.9281)
Context Switches 601352 (258290)
Sleeps 9.2956e+06 (5.13148e+06)

dbench-3.04:
(on local and nfs directories)
# dbench 100

w/o patch on local ext3:
Throughput 186.4 MB/sec 100 procs

with patch on local ext3:
Throughput 215.831 MB/sec 100 procs

w/o patch on nfs:
Throughput 3.13253 MB/sec 100 procs

with patch on nfs:
Throughput 3.37892 MB/sec 100 procs
-----------------------------------------------------------------------------------


* Re: [PATCH][BUGFIX][RFC] fix soft lock up at NFS mount by making limitation of dentry_unused
  2008-03-14  5:15       ` Kentaro Makita
@ 2008-03-14  6:43         ` David Chinner
  0 siblings, 0 replies; 6+ messages in thread
From: David Chinner @ 2008-03-14  6:43 UTC (permalink / raw)
  To: Kentaro Makita; +Cc: linux-kernel, dgc

On Fri, Mar 14, 2008 at 02:15:28PM +0900, Kentaro Makita wrote:
> Hi David
> On Thu, 6 Mar 2008 16:54:16 +1100  David Chinner wrote:
> >> No, we need a smarter free list structure. There have been several attempts
> >> at this in the past. Two that I can recall off the top of my head:
> >>
> >> 	- per node unused LRUs
> >> 	- per superblock unused LRUs
> >> I guess we need to revisit this again, because limiting the size of
> >> the cache like this is not an option.
> I 'm interesting in your patch. I 'll test two patches above if there
> is newer version based on latest kernel.
> 
> >> Try something that relies on leaving the working set on the unused
> >> list, like NFS server benchmarks that have a working set of tens of
> >> million of files....
> >>
> I tested the following, and I found no regressions except in one case.
>  - kernbench-0.24 on local ext3 and nfs
>  - dbench-3.04 on local ext3 and nfs
>  - IOzone-3.291 on local ext3 and nfs
>  - Basic file operations (create/delete/list/copy/move) on local ext3 and nfs

None of those really demonstrate the potential effects of your
proposed change. Even 1 million file sequential create and delete
will not stress it. It won't be until you need to hold that
million dentries in memory to prevent disk lookups while an
application generates significant memory pressure that you will
notice the difference. Without the dentries pinning the inodes,
they'll get reclaimed and need to be fetched from disk again....

FWIW - in trying to understand this a little more, I just checked my
idle test box just after boot and realised something:

$ cat /proc/sys/fs/dentry-state
12723   8709    45      0       0       0
$

That means 12723 allocated dentries, 8709 unused. That means ~4000 in use.

If the limiting test you are using is:

       if (dentry_stat.nr_dentry > nr_in_use * dentry_unused_ratio / 100)
               prune_dcache(dentry_stat.nr_unused * 5 / 100 , NULL);

We need to have (4000 * 10000) / 100 = 400,000 allocated unused, cached
dentries before they get pruned back. i.e. the working set of dentries I
can currently have is 400,000.

I've got 24GB RAM on this box, and often I want to cache 10,000,000 inodes.
Under this algorithm, I'll need to pin 100,000 dentries to allow the cache to
grow this large or tweak a knob. Therein lies the problem....

Effectively, the dentry_unused_ratio is saying that for every node in
the dentry tree, we allow (dentry_unused_ratio / 100) cached leaves
distributed throughout the tree. At dentry_unused_ratio = 10,000
that gives us 100 leaves per node in the tree.

i.e. if your directory hierarchy is deep, then you can cache lots and
lots of inodes because you pin lots of dentries as nodes in the
tree.  But if you have a flat directory structure, there will be
relatively few nodes pinned and you can't cache as many inodes.

IOWs, the size limiting aspect of this algorithm is biased in
exactly the wrong direction. It grows without bound on filesystem
traversal (and hence fails to prevent the condition you want to avoid)
yet prevents caching lots of file dentries if you have a shallow
directory structure (can affect normal application performance).

To prevent the first, you need to tweak the knob in one direction,
and to prevent the second, you need to tweak the knob in the
other direction. We try to avoid adding knobs that require ppl
to tweak them all the time to get optimal performance.

I think we're better off trying to fix the traversal issue....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

