LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH] gfs2: use __vmalloc GFP_NOFS for fs-related allocations.
@ 2015-02-02  3:59 green
  2015-02-02  5:37 ` Dave Chinner
  0 siblings, 1 reply; 10+ messages in thread
From: green @ 2015-02-02  3:59 UTC (permalink / raw)
  To: Steven Whitehouse; +Cc: cluster-devel, linux-kernel, Oleg Drokin

From: Oleg Drokin <green@linuxhacker.ru>

leaf_dealloc uses vzalloc as a fallback to kzalloc(GFP_NOFS), so
it clearly does not want any shrinker activity within the fs itself.
convert vzalloc into __vmalloc(GFP_NOFS|__GFP_ZERO) to better achieve
this goal.

Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
 fs/gfs2/dir.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c
index c5a34f0..6371192 100644
--- a/fs/gfs2/dir.c
+++ b/fs/gfs2/dir.c
@@ -1896,7 +1896,8 @@ static int leaf_dealloc(struct gfs2_inode *dip, u32 index, u32 len,
 
 	ht = kzalloc(size, GFP_NOFS | __GFP_NOWARN);
 	if (ht == NULL)
-		ht = vzalloc(size);
+		ht = __vmalloc(size, GFP_NOFS | __GFP_NOWARN | __GFP_ZERO,
+			       PAGE_KERNEL);
 	if (!ht)
 		return -ENOMEM;
 
-- 
2.1.0


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] gfs2: use __vmalloc GFP_NOFS for fs-related allocations.
  2015-02-02  3:59 [PATCH] gfs2: use __vmalloc GFP_NOFS for fs-related allocations green
@ 2015-02-02  5:37 ` Dave Chinner
  2015-02-02  6:57   ` Oleg Drokin
  0 siblings, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2015-02-02  5:37 UTC (permalink / raw)
  To: green; +Cc: Steven Whitehouse, cluster-devel, linux-kernel

On Sun, Feb 01, 2015 at 10:59:54PM -0500, green@linuxhacker.ru wrote:
> From: Oleg Drokin <green@linuxhacker.ru>
> 
> leaf_dealloc uses vzalloc as a fallback to kzalloc(GFP_NOFS), so
> it clearly does not want any shrinker activity within the fs itself.
> convert vzalloc into __vmalloc(GFP_NOFS|__GFP_ZERO) to better achieve
> this goal.
> 
> Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
> ---
>  fs/gfs2/dir.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c
> index c5a34f0..6371192 100644
> --- a/fs/gfs2/dir.c
> +++ b/fs/gfs2/dir.c
> @@ -1896,7 +1896,8 @@ static int leaf_dealloc(struct gfs2_inode *dip, u32 index, u32 len,
>  
>  	ht = kzalloc(size, GFP_NOFS | __GFP_NOWARN);
>  	if (ht == NULL)
> -		ht = vzalloc(size);
> +		ht = __vmalloc(size, GFP_NOFS | __GFP_NOWARN | __GFP_ZERO,
> +			       PAGE_KERNEL);

That, in the end, won't help as vmalloc still uses GFP_KERNEL
allocations deep down in the PTE allocation code. See the hacks in
the DM and XFS code to work around this. i.e. go look for callers of
memalloc_noio_save().  It's ugly and grotesque, but we've got no
other way to limit reclaim context because the MM devs won't pass
the vmalloc gfp context down the stack to the PTE allocations....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] gfs2: use __vmalloc GFP_NOFS for fs-related allocations.
  2015-02-02  5:37 ` Dave Chinner
@ 2015-02-02  6:57   ` Oleg Drokin
  2015-02-02  8:11     ` Dave Chinner
  0 siblings, 1 reply; 10+ messages in thread
From: Oleg Drokin @ 2015-02-02  6:57 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Steven Whitehouse, cluster-devel, Linux Kernel Mailing List, linux-mm

Hello!

On Feb 2, 2015, at 12:37 AM, Dave Chinner wrote:

> On Sun, Feb 01, 2015 at 10:59:54PM -0500, green@linuxhacker.ru wrote:
>> From: Oleg Drokin <green@linuxhacker.ru>
>> 
>> leaf_dealloc uses vzalloc as a fallback to kzalloc(GFP_NOFS), so
>> it clearly does not want any shrinker activity within the fs itself.
>> convert vzalloc into __vmalloc(GFP_NOFS|__GFP_ZERO) to better achieve
>> this goal.
>> 
>> Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
>> ---
>> fs/gfs2/dir.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>> 
>> diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c
>> index c5a34f0..6371192 100644
>> --- a/fs/gfs2/dir.c
>> +++ b/fs/gfs2/dir.c
>> @@ -1896,7 +1896,8 @@ static int leaf_dealloc(struct gfs2_inode *dip, u32 index, u32 len,
>> 
>> 	ht = kzalloc(size, GFP_NOFS | __GFP_NOWARN);
>> 	if (ht == NULL)
>> -		ht = vzalloc(size);
>> +		ht = __vmalloc(size, GFP_NOFS | __GFP_NOWARN | __GFP_ZERO,
>> +			       PAGE_KERNEL);
> That, in the end, won't help as vmalloc still uses GFP_KERNEL
> allocations deep down in the PTE allocation code. See the hacks in
> the DM and XFS code to work around this. i.e. go look for callers of
> memalloc_noio_save().  It's ugly and grotesque, but we've got no
> other way to limit reclaim context because the MM devs won't pass
> the vmalloc gfp context down the stack to the PTE allocations....

Hm, interesting.
So all the other code in the kernel that does this sort of thing (and there's quite a bit
outside of xfs and ocfs2) would not get the desired effect?

So, I did some digging in archives and found this thread from 2010 onward with various
patches and rants.
Not sure how I missed that before.

Should we have another run at this I wonder?

Bye,
    Oleg

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] gfs2: use __vmalloc GFP_NOFS for fs-related allocations.
  2015-02-02  6:57   ` Oleg Drokin
@ 2015-02-02  8:11     ` Dave Chinner
  2015-02-02 10:30       ` Steven Whitehouse
  0 siblings, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2015-02-02  8:11 UTC (permalink / raw)
  To: Oleg Drokin
  Cc: Steven Whitehouse, cluster-devel, Linux Kernel Mailing List, linux-mm

On Mon, Feb 02, 2015 at 01:57:23AM -0500, Oleg Drokin wrote:
> Hello!
> 
> On Feb 2, 2015, at 12:37 AM, Dave Chinner wrote:
> 
> > On Sun, Feb 01, 2015 at 10:59:54PM -0500, green@linuxhacker.ru wrote:
> >> From: Oleg Drokin <green@linuxhacker.ru>
> >> 
> >> leaf_dealloc uses vzalloc as a fallback to kzalloc(GFP_NOFS), so
> >> it clearly does not want any shrinker activity within the fs itself.
> >> convert vzalloc into __vmalloc(GFP_NOFS|__GFP_ZERO) to better achieve
> >> this goal.
> >> 
> >> Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
> >> ---
> >> fs/gfs2/dir.c | 3 ++-
> >> 1 file changed, 2 insertions(+), 1 deletion(-)
> >> 
> >> diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c
> >> index c5a34f0..6371192 100644
> >> --- a/fs/gfs2/dir.c
> >> +++ b/fs/gfs2/dir.c
> >> @@ -1896,7 +1896,8 @@ static int leaf_dealloc(struct gfs2_inode *dip, u32 index, u32 len,
> >> 
> >> 	ht = kzalloc(size, GFP_NOFS | __GFP_NOWARN);
> >> 	if (ht == NULL)
> >> -		ht = vzalloc(size);
> >> +		ht = __vmalloc(size, GFP_NOFS | __GFP_NOWARN | __GFP_ZERO,
> >> +			       PAGE_KERNEL);
> > That, in the end, won't help as vmalloc still uses GFP_KERNEL
> > allocations deep down in the PTE allocation code. See the hacks in
> > the DM and XFS code to work around this. i.e. go look for callers of
> > memalloc_noio_save().  It's ugly and grotesque, but we've got no
> > other way to limit reclaim context because the MM devs won't pass
> > the vmalloc gfp context down the stack to the PTE allocations....
> 
> Hm, interesting.
> So all the other code in the kernel that does this sort of thing (and there's quite a bit
> outside of xfs and ocfs2) would not get the desired effect?

No. I expect, however, that very few people would ever see a
deadlock as a result - it's a pretty rare sort of kernel case to hit
in most cases. XFS does make extensive use of vm_map_ram() in
GFP_NOFS context, however, when large directory block sizes are in
use, and we also have a history of lockdep throwing warnings under
memory pressure. In the end, the memalloc_noio_save() changes were
made to stop the frequent lockdep reports rather than actual
deadlocks.

> So, I did some digging in archives and found this thread from 2010 onward with various
> patches and rants.
> Not sure how I missed that before.
> 
> Should we have another run at this I wonder?

By all means, but I don't think you'll have any more luck than
anyone else in the past. We've still got the problem of attitude
("vmalloc is not for general use") and making it actually work is
seen as "encouraging undesirable behaviour". If you can change
attitudes towards vmalloc first, then you'll be much more likely to
make progress in getting these problems solved....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] gfs2: use __vmalloc GFP_NOFS for fs-related allocations.
  2015-02-02  8:11     ` Dave Chinner
@ 2015-02-02 10:30       ` Steven Whitehouse
  2015-02-03 22:33         ` Dave Chinner
  0 siblings, 1 reply; 10+ messages in thread
From: Steven Whitehouse @ 2015-02-02 10:30 UTC (permalink / raw)
  To: Dave Chinner, Oleg Drokin
  Cc: cluster-devel, Linux Kernel Mailing List, linux-mm

Hi,

On 02/02/15 08:11, Dave Chinner wrote:
> On Mon, Feb 02, 2015 at 01:57:23AM -0500, Oleg Drokin wrote:
>> Hello!
>>
>> On Feb 2, 2015, at 12:37 AM, Dave Chinner wrote:
>>
>>> On Sun, Feb 01, 2015 at 10:59:54PM -0500, green@linuxhacker.ru wrote:
>>>> From: Oleg Drokin <green@linuxhacker.ru>
>>>>
>>>> leaf_dealloc uses vzalloc as a fallback to kzalloc(GFP_NOFS), so
>>>> it clearly does not want any shrinker activity within the fs itself.
>>>> convert vzalloc into __vmalloc(GFP_NOFS|__GFP_ZERO) to better achieve
>>>> this goal.
>>>>
>>>> Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
>>>> ---
>>>> fs/gfs2/dir.c | 3 ++-
>>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c
>>>> index c5a34f0..6371192 100644
>>>> --- a/fs/gfs2/dir.c
>>>> +++ b/fs/gfs2/dir.c
>>>> @@ -1896,7 +1896,8 @@ static int leaf_dealloc(struct gfs2_inode *dip, u32 index, u32 len,
>>>>
>>>> 	ht = kzalloc(size, GFP_NOFS | __GFP_NOWARN);
>>>> 	if (ht == NULL)
>>>> -		ht = vzalloc(size);
>>>> +		ht = __vmalloc(size, GFP_NOFS | __GFP_NOWARN | __GFP_ZERO,
>>>> +			       PAGE_KERNEL);
>>> That, in the end, won't help as vmalloc still uses GFP_KERNEL
>>> allocations deep down in the PTE allocation code. See the hacks in
>>> the DM and XFS code to work around this. i.e. go look for callers of
>>> memalloc_noio_save().  It's ugly and grotesque, but we've got no
>>> other way to limit reclaim context because the MM devs won't pass
>>> the vmalloc gfp context down the stack to the PTE allocations....
>> Hm, interesting.
>> So all the other code in the kernel that does this sort of thing (and there's quite a bit
>> outside of xfs and ocfs2) would not get the desired effect?
> No. I expect, however, that very few people would ever see a
> deadlock as a result - it's a pretty rare sort of kernel case to hit
> in most cases. XFS does make extensive use of vm_map_ram() in
> GFP_NOFS context, however, when large directory block sizes are in
> use, and we also have a history of lockdep throwing warnings under
> memory pressure. In the end, the memalloc_noio_save() changes were
> made to stop the frequent lockdep reports rather than actual
> deadlocks.
Indeed, I think the patch is still an improvement however, so I'm happy 
to apply it while a better solution is found.

>> So, I did some digging in archives and found this thread from 2010 onward with various
>> patches and rants.
>> Not sure how I missed that before.
>>
>> Should we have another run at this I wonder?
> By all means, but I don't think you'll have any more luck than
> anyone else in the past. We've still got the problem of attitude
> ("vmalloc is not for general use") and making it actually work is
> seen as "encouraging undesirable behaviour". If you can change
> attitudes towards vmalloc first, then you'll be much more likely to
> make progress in getting these problems solved....
>

Well I don't know whether it has to be vmalloc that provides the 
solution here... if memory fragmentation could be controlled then 
kmalloc of larger contiguous chunks of memory could be done using that, 
which might be a better solution overall. But I do agree that we need to 
try and come to some kind of solution to this problem as it is one of 
those things that has been rumbling on for a long time without a proper 
solution.

I also wonder if vmalloc is still very slow? That was the case some time 
ago when I noticed a problem in directory access times in gfs2, which 
made us change to use kmalloc with a vmalloc fallback in the first place,

Steve.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] gfs2: use __vmalloc GFP_NOFS for fs-related allocations.
  2015-02-02 10:30       ` Steven Whitehouse
@ 2015-02-03 22:33         ` Dave Chinner
  2015-02-04  7:13           ` Oleg Drokin
  0 siblings, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2015-02-03 22:33 UTC (permalink / raw)
  To: Steven Whitehouse
  Cc: Oleg Drokin, cluster-devel, Linux Kernel Mailing List, linux-mm

On Mon, Feb 02, 2015 at 10:30:29AM +0000, Steven Whitehouse wrote:
> Hi,
> 
> On 02/02/15 08:11, Dave Chinner wrote:
> >On Mon, Feb 02, 2015 at 01:57:23AM -0500, Oleg Drokin wrote:
> >>Hello!
> >>
> >>On Feb 2, 2015, at 12:37 AM, Dave Chinner wrote:
> >>
> >>>On Sun, Feb 01, 2015 at 10:59:54PM -0500, green@linuxhacker.ru wrote:
> >>>>From: Oleg Drokin <green@linuxhacker.ru>
> >>>>
> >>>>leaf_dealloc uses vzalloc as a fallback to kzalloc(GFP_NOFS), so
> >>>>it clearly does not want any shrinker activity within the fs itself.
> >>>>convert vzalloc into __vmalloc(GFP_NOFS|__GFP_ZERO) to better achieve
> >>>>this goal.
....
> >>>>	ht = kzalloc(size, GFP_NOFS | __GFP_NOWARN);
> >>>>	if (ht == NULL)
> >>>>-		ht = vzalloc(size);
> >>>>+		ht = __vmalloc(size, GFP_NOFS | __GFP_NOWARN | __GFP_ZERO,
> >>>>+			       PAGE_KERNEL);
> >>>That, in the end, won't help as vmalloc still uses GFP_KERNEL
> >>>allocations deep down in the PTE allocation code. See the hacks in
> >>>the DM and XFS code to work around this. i.e. go look for callers of
> >>>memalloc_noio_save().  It's ugly and grotesque, but we've got no
> >>>other way to limit reclaim context because the MM devs won't pass
> >>>the vmalloc gfp context down the stack to the PTE allocations....
....
> >>So, I did some digging in archives and found this thread from
> >>2010 onward with various patches and rants.  Not sure how I
> >>missed that before.
> >>
> >>Should we have another run at this I wonder?
> >
> >By all means, but I don't think you'll have any more luck than
> >anyone else in the past. We've still got the problem of attitude
> >("vmalloc is not for general use") and making it actually work is
> >seen as "encouraging undesirable behaviour". If you can change
> >attitudes towards vmalloc first, then you'll be much more likely to
> >make progress in getting these problems solved....
> 
> Well I don't know whether it has to be vmalloc that provides the
> solution here... if memory fragmentation could be controlled then
> kmalloc of larger contiguous chunks of memory could be done using
> that, which might be a better solution overall.

Which has been said repeatedly for the past 15 years. And after all
this time kmalloc is still horribly unreliable for large contiguous
allocations. Hence we still have need for vmalloc for large
contiguous buffers because we have places where memory allocation
failure is simply not an option.

> But I do agree that
> we need to try and come to some kind of solution to this problem as
> it is one of those things that has been rumbling on for a long time
> without a proper solution.
> 
> I also wonder if vmalloc is still very slow? That was the case some
> time ago when I noticed a problem in directory access times in gfs2,
> which made us change to use kmalloc with a vmalloc fallback in the
> first place,

Another of the "myths" about vmalloc. The speed and scalability of
vmap/vmalloc is a long solved problem - Nick Piggin fixed the worst
of those problems 5-6 years ago - see the rewrite from 2008 that
started with commit db64fe0 ("mm: rewrite vmap layer")....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] gfs2: use __vmalloc GFP_NOFS for fs-related allocations.
  2015-02-03 22:33         ` Dave Chinner
@ 2015-02-04  7:13           ` Oleg Drokin
  2015-02-04  9:49             ` Steven Whitehouse
  2015-02-05 11:45             ` Dave Chinner
  0 siblings, 2 replies; 10+ messages in thread
From: Oleg Drokin @ 2015-02-04  7:13 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Steven Whitehouse, cluster-devel, Linux Kernel Mailing List, linux-mm

Hello!

On Feb 3, 2015, at 5:33 PM, Dave Chinner wrote:
>> I also wonder if vmalloc is still very slow? That was the case some
>> time ago when I noticed a problem in directory access times in gfs2,
>> which made us change to use kmalloc with a vmalloc fallback in the
>> first place,
> Another of the "myths" about vmalloc. The speed and scalability of
> vmap/vmalloc is a long solved problem - Nick Piggin fixed the worst
> of those problems 5-6 years ago - see the rewrite from 2008 that
> started with commit db64fe0 ("mm: rewrite vmap layer")....

This actually might be less true than one would hope. At least somewhat
recent studies by LLNL (https://jira.hpdd.intel.com/browse/LU-4008)
show that there's huge contention on vmlist_lock, so if you have vmalloc
intense workloads, you get penalized heavily. Granted, this is rhel6 kernel,
but that is still (albeit heavily modified) 2.6.32, which was released at
the end of 2009, way after 2008.
I see that vmlist_lock is gone now, but e.g. vmap_area_lock that is heavily
used is still in place.

So of course with that in place there's every incentive to not use vmalloc
if at all possible. But if used, one would still hopes it would be at least
safe to do even if somewhat slow.

Bye,
    Oleg

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] gfs2: use __vmalloc GFP_NOFS for fs-related allocations.
  2015-02-04  7:13           ` Oleg Drokin
@ 2015-02-04  9:49             ` Steven Whitehouse
  2015-02-05 20:11               ` Dave Chinner
  2015-02-05 11:45             ` Dave Chinner
  1 sibling, 1 reply; 10+ messages in thread
From: Steven Whitehouse @ 2015-02-04  9:49 UTC (permalink / raw)
  To: Oleg Drokin, Dave Chinner
  Cc: cluster-devel, Linux Kernel Mailing List, linux-mm

Hi,

On 04/02/15 07:13, Oleg Drokin wrote:
> Hello!
>
> On Feb 3, 2015, at 5:33 PM, Dave Chinner wrote:
>>> I also wonder if vmalloc is still very slow? That was the case some
>>> time ago when I noticed a problem in directory access times in gfs2,
>>> which made us change to use kmalloc with a vmalloc fallback in the
>>> first place,
>> Another of the "myths" about vmalloc. The speed and scalability of
>> vmap/vmalloc is a long solved problem - Nick Piggin fixed the worst
>> of those problems 5-6 years ago - see the rewrite from 2008 that
>> started with commit db64fe0 ("mm: rewrite vmap layer")....
> This actually might be less true than one would hope. At least somewhat
> recent studies by LLNL (https://jira.hpdd.intel.com/browse/LU-4008)
> show that there's huge contention on vmlist_lock, so if you have vmalloc
> intense workloads, you get penalized heavily. Granted, this is rhel6 kernel,
> but that is still (albeit heavily modified) 2.6.32, which was released at
> the end of 2009, way after 2008.
> I see that vmlist_lock is gone now, but e.g. vmap_area_lock that is heavily
> used is still in place.
>
> So of course with that in place there's every incentive to not use vmalloc
> if at all possible. But if used, one would still hopes it would be at least
> safe to do even if somewhat slow.
>
> Bye,
>      Oleg

I was thinking back to this thread:
https://lkml.org/lkml/2010/4/12/207

More recent than 2008, and although it resulted in a patch that 
apparently fixed the problem, I don't think it was ever applied on the 
basis that it was too risky and kmalloc was the proper solution 
anyway.... I've not tested recently, so it may have been fixed in the 
mean time,

Steve.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] gfs2: use __vmalloc GFP_NOFS for fs-related allocations.
  2015-02-04  7:13           ` Oleg Drokin
  2015-02-04  9:49             ` Steven Whitehouse
@ 2015-02-05 11:45             ` Dave Chinner
  1 sibling, 0 replies; 10+ messages in thread
From: Dave Chinner @ 2015-02-05 11:45 UTC (permalink / raw)
  To: Oleg Drokin
  Cc: Steven Whitehouse, cluster-devel, Linux Kernel Mailing List, linux-mm

On Wed, Feb 04, 2015 at 02:13:29AM -0500, Oleg Drokin wrote:
> Hello!
> 
> On Feb 3, 2015, at 5:33 PM, Dave Chinner wrote:
> >> I also wonder if vmalloc is still very slow? That was the case some
> >> time ago when I noticed a problem in directory access times in gfs2,
> >> which made us change to use kmalloc with a vmalloc fallback in the
> >> first place,
> > Another of the "myths" about vmalloc. The speed and scalability of
> > vmap/vmalloc is a long solved problem - Nick Piggin fixed the worst
> > of those problems 5-6 years ago - see the rewrite from 2008 that
> > started with commit db64fe0 ("mm: rewrite vmap layer")....
> 
> This actually might be less true than one would hope. At least somewhat
> recent studies by LLNL (https://jira.hpdd.intel.com/browse/LU-4008)
> show that there's huge contention on vmlist_lock, so if you have vmalloc

vmlist_lock and the list it protected went away in 3.10.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] gfs2: use __vmalloc GFP_NOFS for fs-related allocations.
  2015-02-04  9:49             ` Steven Whitehouse
@ 2015-02-05 20:11               ` Dave Chinner
  0 siblings, 0 replies; 10+ messages in thread
From: Dave Chinner @ 2015-02-05 20:11 UTC (permalink / raw)
  To: Steven Whitehouse
  Cc: Oleg Drokin, cluster-devel, Linux Kernel Mailing List, linux-mm

On Wed, Feb 04, 2015 at 09:49:50AM +0000, Steven Whitehouse wrote:
> Hi,
> 
> On 04/02/15 07:13, Oleg Drokin wrote:
> >Hello!
> >
> >On Feb 3, 2015, at 5:33 PM, Dave Chinner wrote:
> >>>I also wonder if vmalloc is still very slow? That was the case some
> >>>time ago when I noticed a problem in directory access times in gfs2,
> >>>which made us change to use kmalloc with a vmalloc fallback in the
> >>>first place,
> >>Another of the "myths" about vmalloc. The speed and scalability of
> >>vmap/vmalloc is a long solved problem - Nick Piggin fixed the worst
> >>of those problems 5-6 years ago - see the rewrite from 2008 that
> >>started with commit db64fe0 ("mm: rewrite vmap layer")....
> >This actually might be less true than one would hope. At least somewhat
> >recent studies by LLNL (https://jira.hpdd.intel.com/browse/LU-4008)
> >show that there's huge contention on vmlist_lock, so if you have vmalloc
> >intense workloads, you get penalized heavily. Granted, this is rhel6 kernel,
> >but that is still (albeit heavily modified) 2.6.32, which was released at
> >the end of 2009, way after 2008.
> >I see that vmlist_lock is gone now, but e.g. vmap_area_lock that is heavily
> >used is still in place.
> >
> >So of course with that in place there's every incentive to not use vmalloc
> >if at all possible. But if used, one would still hopes it would be at least
> >safe to do even if somewhat slow.
> >
> >Bye,
> >     Oleg
> 
> I was thinking back to this thread:
> https://lkml.org/lkml/2010/4/12/207
> 
> More recent than 2008, and although it resulted in a patch that
> apparently fixed the problem, I don't think it was ever applied on
> the basis that it was too risky and kmalloc was the proper solution
> anyway.... I've not tested recently, so it may have been fixed in
> the mean time,

IIUC, the problem was resolved with a different fix back in 2011 - a
lookaside cache that avoids the overhead of searching the entire
list on every vmalloc. 

commit 89699605fe7cfd8611900346f61cb6cbf179b10a
Author: Nick Piggin <npiggin@suse.de>
Date:   Tue Mar 22 16:30:36 2011 -0700

    mm: vmap area cache

    Provide a free area cache for the vmalloc virtual address allocator, based
    on the algorithm used by the user virtual memory allocator.

    This reduces the number of rbtree operations and linear traversals over
    the vmap extents in order to find a free area, by starting off at the last
    point that a free area was found.
....
    After this patch, the search will start from where it left off, giving
    closer to an amortized O(1).

    This is verified to solve regressions reported Steven in GFS2, and Avi in
    KVM.
....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2015-02-05 20:11 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-02  3:59 [PATCH] gfs2: use __vmalloc GFP_NOFS for fs-related allocations green
2015-02-02  5:37 ` Dave Chinner
2015-02-02  6:57   ` Oleg Drokin
2015-02-02  8:11     ` Dave Chinner
2015-02-02 10:30       ` Steven Whitehouse
2015-02-03 22:33         ` Dave Chinner
2015-02-04  7:13           ` Oleg Drokin
2015-02-04  9:49             ` Steven Whitehouse
2015-02-05 20:11               ` Dave Chinner
2015-02-05 11:45             ` Dave Chinner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).