LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH] fs/fat: calculate checksum in a loop instead of directly calculating
@ 2015-02-18 18:48 Alexander Kuleshov
  2015-02-18 19:25 ` OGAWA Hirofumi
  0 siblings, 1 reply; 7+ messages in thread
From: Alexander Kuleshov @ 2015-02-18 18:48 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: linux-kernel, Alexander Kuleshov

Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
---
 fs/fat/fat.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/fat/fat.h b/fs/fat/fat.h
index 64e295e..1a5080b 100644
--- a/fs/fat/fat.h
+++ b/fs/fat/fat.h
@@ -207,12 +207,12 @@ static inline void fat_save_attrs(struct inode *inode, u8 attrs)
 
 static inline unsigned char fat_checksum(const __u8 *name)
 {
+	u8 i;
 	unsigned char s = name[0];
-	s = (s<<7) + (s>>1) + name[1];	s = (s<<7) + (s>>1) + name[2];
-	s = (s<<7) + (s>>1) + name[3];	s = (s<<7) + (s>>1) + name[4];
-	s = (s<<7) + (s>>1) + name[5];	s = (s<<7) + (s>>1) + name[6];
-	s = (s<<7) + (s>>1) + name[7];	s = (s<<7) + (s>>1) + name[8];
-	s = (s<<7) + (s>>1) + name[9];	s = (s<<7) + (s>>1) + name[10];
+
+	for (i = 1; i < 11; i++)
+		s = (s << 7) + (s >> 1) + name[i];
+
 	return s;
 }
 
-- 
2.3.0.80.g18d0fec


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] fs/fat: calculate checksum in a loop instead of directly calculating
  2015-02-18 18:48 [PATCH] fs/fat: calculate checksum in a loop instead of directly calculating Alexander Kuleshov
@ 2015-02-18 19:25 ` OGAWA Hirofumi
  2015-02-18 19:46   ` Alexander Kuleshov
  0 siblings, 1 reply; 7+ messages in thread
From: OGAWA Hirofumi @ 2015-02-18 19:25 UTC (permalink / raw)
  To: Alexander Kuleshov; +Cc: linux-kernel

Alexander Kuleshov <kuleshovmail@gmail.com> writes:

>  static inline unsigned char fat_checksum(const __u8 *name)
>  {
> +	u8 i;
>  	unsigned char s = name[0];
> -	s = (s<<7) + (s>>1) + name[1];	s = (s<<7) + (s>>1) + name[2];
> -	s = (s<<7) + (s>>1) + name[3];	s = (s<<7) + (s>>1) + name[4];
> -	s = (s<<7) + (s>>1) + name[5];	s = (s<<7) + (s>>1) + name[6];
> -	s = (s<<7) + (s>>1) + name[7];	s = (s<<7) + (s>>1) + name[8];
> -	s = (s<<7) + (s>>1) + name[9];	s = (s<<7) + (s>>1) + name[10];
> +
> +	for (i = 1; i < 11; i++)
> +		s = (s << 7) + (s >> 1) + name[i];
> +
>  	return s;
>  }

When I wrote this, IIRC, there was measurable performance
difference. All major gcc versions are enough smart now to optimize this?
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] fs/fat: calculate checksum in a loop instead of directly calculating
  2015-02-18 19:25 ` OGAWA Hirofumi
@ 2015-02-18 19:46   ` Alexander Kuleshov
  2015-02-18 20:32     ` OGAWA Hirofumi
  0 siblings, 1 reply; 7+ messages in thread
From: Alexander Kuleshov @ 2015-02-18 19:46 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: linux-kernel

Hello,

I tested two simple program with this (gcc-4.9.1)

Results are following:

time ./test_with_loop

real    0m0.001s
user    0m0.000s
sys    0m0.001s

And

time ./test_direct_calculation:

real    0m0.002s
user    0m0.000s
sys    0m0.001s

As you can see result almsot the same. Also i compared assembly output
from two these programras and there are two notes:

1. Assembly output of the program with loop is half as much than
program with directly rotation, but ofcourse we can't take it for rule
here. Current version prepares stack and than just does:

    addq    $1, %rax
    movzbl    (%rax), %eax
    addl    %edx, %eax
    movb    %al, -1(%rbp)
    movzbl    -1(%rbp), %eax
    rorb    %al
    movl    %eax, %edx
    movq    -24(%rbp), %rax

11 times, but version with loop does the same but with cmp/jump.

Thank you.


2015-02-19 1:25 GMT+06:00 OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>:
> Alexander Kuleshov <kuleshovmail@gmail.com> writes:
>
>>  static inline unsigned char fat_checksum(const __u8 *name)
>>  {
>> +     u8 i;
>>       unsigned char s = name[0];
>> -     s = (s<<7) + (s>>1) + name[1];  s = (s<<7) + (s>>1) + name[2];
>> -     s = (s<<7) + (s>>1) + name[3];  s = (s<<7) + (s>>1) + name[4];
>> -     s = (s<<7) + (s>>1) + name[5];  s = (s<<7) + (s>>1) + name[6];
>> -     s = (s<<7) + (s>>1) + name[7];  s = (s<<7) + (s>>1) + name[8];
>> -     s = (s<<7) + (s>>1) + name[9];  s = (s<<7) + (s>>1) + name[10];
>> +
>> +     for (i = 1; i < 11; i++)
>> +             s = (s << 7) + (s >> 1) + name[i];
>> +
>>       return s;
>>  }
>
> When I wrote this, IIRC, there was measurable performance
> difference. All major gcc versions are enough smart now to optimize this?
> --
> OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] fs/fat: calculate checksum in a loop instead of directly calculating
  2015-02-18 19:46   ` Alexander Kuleshov
@ 2015-02-18 20:32     ` OGAWA Hirofumi
  2015-02-23 20:40       ` Heinrich Schuchardt
  0 siblings, 1 reply; 7+ messages in thread
From: OGAWA Hirofumi @ 2015-02-18 20:32 UTC (permalink / raw)
  To: Alexander Kuleshov; +Cc: linux-kernel

Alexander Kuleshov <kuleshovmail@gmail.com> writes:

> time ./test_with_loop
>
> real    0m0.001s
> user    0m0.000s
> sys    0m0.001s
>
> And
>
> time ./test_direct_calculation:
>
> real    0m0.002s
> user    0m0.000s
> sys    0m0.001s

Hm,

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

typedef unsigned char	__u8;
typedef unsigned char	u8;

#if 1
static inline unsigned char fat_checksum(const __u8 *name)
{
	unsigned char s = name[0];
	s = (s<<7) + (s>>1) + name[1];	s = (s<<7) + (s>>1) + name[2];
	s = (s<<7) + (s>>1) + name[3];	s = (s<<7) + (s>>1) + name[4];
	s = (s<<7) + (s>>1) + name[5];	s = (s<<7) + (s>>1) + name[6];
	s = (s<<7) + (s>>1) + name[7];	s = (s<<7) + (s>>1) + name[8];
	s = (s<<7) + (s>>1) + name[9];	s = (s<<7) + (s>>1) + name[10];
	return s;
}
#else
static inline unsigned char fat_checksum(const __u8 *name)
{
	unsigned char s = name[0];
	u8 i;
	for (i = 1; i < 11; i++)
		s = (s << 7) + (s >> 1) + name[i];
	return s;
}
#endif

static __attribute__ ((noinline)) int test(unsigned char *name)
{
	long i;
	for (i = 0; i < 100000000L; i++)
		name[i % 11] = fat_checksum(name);
	return name[0];
}

int main(int argc, char *argv[])
{
	printf("%u\n", test((unsigned char *)argv[1]));
	return 0;
}


  $ gcc --version
  gcc (Debian 4.9.1-19) 4.9.1
  Copyright (C) 2014 Free Software Foundation, Inc.
  This is free software; see the source for copying conditions.  There is NO
  warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

  $ gcc -O2 -o c.inline c.c
  # change #if 1 => #if 0
  $ gcc -O2 -o c.loop c.c

  $ time ./c.inline aaaaaaaaaaa
  14

  real	0m0.550s
  user	0m0.548s
  sys	0m0.000s
  $ time ./c.loop aaaaaaaaaaa
  14

  real	0m0.901s
  user	0m0.896s
  sys	0m0.004s

This is my environment only? (gcc (Debian 4.9.1-19) 4.9.1)
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] fs/fat: calculate checksum in a loop instead of directly calculating
  2015-02-18 20:32     ` OGAWA Hirofumi
@ 2015-02-23 20:40       ` Heinrich Schuchardt
  2015-02-24  2:42         ` OGAWA Hirofumi
  0 siblings, 1 reply; 7+ messages in thread
From: Heinrich Schuchardt @ 2015-02-23 20:40 UTC (permalink / raw)
  To: OGAWA Hirofumi, Alexander Kuleshov; +Cc: linux-kernel

On 18.02.2015 21:32, OGAWA Hirofumi wrote:
> Alexander Kuleshov <kuleshovmail@gmail.com> writes:
> 
>> time ./test_with_loop
>>
>> real    0m0.001s
>> user    0m0.000s
>> sys    0m0.001s
>>
>> And
>>
>> time ./test_direct_calculation:
>>
>> real    0m0.002s
>> user    0m0.000s
>> sys    0m0.001s
> 
> Hm,
> 
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> 
> typedef unsigned char	__u8;
> typedef unsigned char	u8;
> 
> #if 1
> static inline unsigned char fat_checksum(const __u8 *name)
> {
> 	unsigned char s = name[0];
> 	s = (s<<7) + (s>>1) + name[1];	s = (s<<7) + (s>>1) + name[2];
> 	s = (s<<7) + (s>>1) + name[3];	s = (s<<7) + (s>>1) + name[4];
> 	s = (s<<7) + (s>>1) + name[5];	s = (s<<7) + (s>>1) + name[6];
> 	s = (s<<7) + (s>>1) + name[7];	s = (s<<7) + (s>>1) + name[8];
> 	s = (s<<7) + (s>>1) + name[9];	s = (s<<7) + (s>>1) + name[10];
> 	return s;
> }
> #else
> static inline unsigned char fat_checksum(const __u8 *name)
> {
> 	unsigned char s = name[0];
> 	u8 i;
> 	for (i = 1; i < 11; i++)
> 		s = (s << 7) + (s >> 1) + name[i];
> 	return s;
> }
> #endif
> 
> static __attribute__ ((noinline)) int test(unsigned char *name)

You have to put
__attribute__((optimize("unroll-loops")))
here to unroll the loop inside the function:


static __attribute__ ((noinline))
__attribute__((optimize("unroll-loops")))
int test(unsigned char *name)

> {
> 	long i;
> 	for (i = 0; i < 100000000L; i++)
> 		name[i % 11] = fat_checksum(name);
> 	return name[0];
> }
> 
> int main(int argc, char *argv[])
> {
> 	printf("%u\n", test((unsigned char *)argv[1]));
> 	return 0;
> }
> 
> 
>   $ gcc --version
>   gcc (Debian 4.9.1-19) 4.9.1
>   Copyright (C) 2014 Free Software Foundation, Inc.
>   This is free software; see the source for copying conditions.  There is NO
>   warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> 
>   $ gcc -O2 -o c.inline c.c
>   # change #if 1 => #if 0
>   $ gcc -O2 -o c.loop c.c
> 
>   $ time ./c.inline aaaaaaaaaaa
>   14
> 
>   real	0m0.550s
>   user	0m0.548s
>   sys	0m0.000s
>   $ time ./c.loop aaaaaaaaaaa
>   14
> 
>   real	0m0.901s
>   user	0m0.896s
>   sys	0m0.004s
> 
> This is my environment only? (gcc (Debian 4.9.1-19) 4.9.1)
> 

$ time ./c.inline aaaaaaaaaaa
14

real    0m0.743s
user    0m0.740s
sys     0m0.000s

Without __attribute__((optimize("unroll-loops"))) :
$ time ./c.loop aaaaaaaaaaa
14

real    0m1.482s
user    0m1.472s
sys     0m0.004s

With __attribute__((optimize("unroll-loops"))) :

$ time ./c.loop aaaaaaaaaaa
14

real    0m0.742s
user    0m0.740s
sys     0m0.000s

Best regards

Heinrich Schuchardt

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] fs/fat: calculate checksum in a loop instead of directly calculating
  2015-02-23 20:40       ` Heinrich Schuchardt
@ 2015-02-24  2:42         ` OGAWA Hirofumi
  2015-02-24 19:22           ` Heinrich Schuchardt
  0 siblings, 1 reply; 7+ messages in thread
From: OGAWA Hirofumi @ 2015-02-24  2:42 UTC (permalink / raw)
  To: Heinrich Schuchardt; +Cc: Alexander Kuleshov, linux-kernel

Heinrich Schuchardt <xypron.glpk@gmx.de> writes:

> You have to put
> __attribute__((optimize("unroll-loops")))
> here to unroll the loop inside the function:
>
>
> static __attribute__ ((noinline))
> __attribute__((optimize("unroll-loops")))
> int test(unsigned char *name)
>
> $ time ./c.inline aaaaaaaaaaa
> 14
>
> real    0m0.743s
> user    0m0.740s
> sys     0m0.000s
>
> Without __attribute__((optimize("unroll-loops"))) :
> $ time ./c.loop aaaaaaaaaaa
> 14
>
> real    0m1.482s
> user    0m1.472s
> sys     0m0.004s
>
> With __attribute__((optimize("unroll-loops"))) :
>
> $ time ./c.loop aaaaaaaaaaa
> 14
>
> real    0m0.742s
> user    0m0.740s
> sys     0m0.000s

This attribute has to be added to the caller, not fat_checksum()?  I.e.,
we has to add it to all callers of fat_checksum()?

Well, this is interesting gcc optimize option though, maybe not worth to
introduce this to kernel only for fatfs.

Thanks.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] fs/fat: calculate checksum in a loop instead of directly calculating
  2015-02-24  2:42         ` OGAWA Hirofumi
@ 2015-02-24 19:22           ` Heinrich Schuchardt
  0 siblings, 0 replies; 7+ messages in thread
From: Heinrich Schuchardt @ 2015-02-24 19:22 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: Alexander Kuleshov, linux-kernel

On 24.02.2015 03:42, OGAWA Hirofumi wrote:
> Heinrich Schuchardt <xypron.glpk@gmx.de> writes:
> 
>> You have to put
>> __attribute__((optimize("unroll-loops")))
>> here to unroll the loop inside the function:
>>
>>
>> static __attribute__ ((noinline))
>> __attribute__((optimize("unroll-loops")))
>> int test(unsigned char *name)
>>
>> $ time ./c.inline aaaaaaaaaaa
>> 14
>>
>> real    0m0.743s
>> user    0m0.740s
>> sys     0m0.000s
>>
>> Without __attribute__((optimize("unroll-loops"))) :
>> $ time ./c.loop aaaaaaaaaaa
>> 14
>>
>> real    0m1.482s
>> user    0m1.472s
>> sys     0m0.004s
>>
>> With __attribute__((optimize("unroll-loops"))) :
>>
>> $ time ./c.loop aaaaaaaaaaa
>> 14
>>
>> real    0m0.742s
>> user    0m0.740s
>> sys     0m0.000s
> 
> This attribute has to be added to the caller, not fat_checksum()?  I.e.,
> we has to add it to all callers of fat_checksum()?

The attribute is applied to the declaration of the function in which you
need loop unrolling.
https://gcc.gnu.org/onlinedocs/gcc-4.9.1/gcc/Attribute-Syntax.html

> 
> Well, this is interesting gcc optimize option though, maybe not worth to
> introduce this to kernel only for fatfs.

There is an effort to compile the Linux kernel with Clang. We should
avoid new GCC specific items.

Best regards

Heinrich

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2015-02-24 19:22 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-18 18:48 [PATCH] fs/fat: calculate checksum in a loop instead of directly calculating Alexander Kuleshov
2015-02-18 19:25 ` OGAWA Hirofumi
2015-02-18 19:46   ` Alexander Kuleshov
2015-02-18 20:32     ` OGAWA Hirofumi
2015-02-23 20:40       ` Heinrich Schuchardt
2015-02-24  2:42         ` OGAWA Hirofumi
2015-02-24 19:22           ` Heinrich Schuchardt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).