LKML Archive on lore.kernel.org
* [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.
       [not found] <CGME20180321044137epcas5p221e7ee4a0b7464eaa00dad8320f0251d@epcas5p2.samsung.com>
@ 2018-03-21  4:40 ` Maninder Singh
       [not found]   ` <CGME20180321044149epcas5p12cba1afa5fc0f493a47fe20c4a7394bd@epcas5p1.samsung.com>
                     ` (3 more replies)
  0 siblings, 4 replies; 23+ messages in thread
From: Maninder Singh @ 2018-03-21  4:40 UTC (permalink / raw)
  To: herbert, davem, minchan, ngupta, sergey.senozhatsky.work,
	keescook, anton, ccross, tony.luck, akpm, colin.king
  Cc: linux-crypto, linux-kernel, pankaj.m, a.sahrawat, v.narang,
	Maninder Singh

(Added a cover letter to avoid too much text in the patch description.)

The LZ4 specification defines a 2-byte offset length for 64 KB of data.
But in the case of ZRAM we compress data per page, and on most
architectures PAGE_SIZE is 4 KB. So we can decide the offset length based
on the actual offset value. For this we reserve 1 bit to encode the offset
length (1 byte or 2 bytes). 2 bytes are required only if the offset is
greater than 127; otherwise 1 byte is enough.

With this new implementation the maximum offset value is 32 KB.

Thus we can save more memory for compressed data.
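The encoding described above can be sketched in standalone C. This is an
illustrative model of the scheme, not the patch's actual helpers: the
function names encode_offset/decode_offset are made up here; only DYN_BIT
and the >127 threshold come from the patch itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DYN_BIT 0x1	/* LSB set => 2-byte offset follows, as in the patch */

/* Store offset into buf; return the number of bytes used (1 or 2). */
static size_t encode_offset(uint8_t *buf, uint16_t offset)
{
	if (offset > 127) {
		/*
		 * Shift left to make room for the flag bit and set DYN_BIT;
		 * offsets up to 32767 still fit in the remaining 15 bits.
		 * Stored little-endian, like LZ4_writeLE16().
		 */
		uint16_t v = (uint16_t)((offset << 1) | DYN_BIT);

		buf[0] = (uint8_t)(v & 0xff);
		buf[1] = (uint8_t)(v >> 8);
		return 2;
	}
	buf[0] = (uint8_t)(offset << 1);	/* flag bit stays 0 */
	return 1;
}

/* Read an offset back; *used receives the number of bytes consumed. */
static uint16_t decode_offset(const uint8_t *buf, size_t *used)
{
	if (buf[0] & DYN_BIT) {
		*used = 2;
		return (uint16_t)(((unsigned int)buf[0] | ((unsigned int)buf[1] << 8)) >> 1);
	}
	*used = 1;
	return (uint16_t)(buf[0] >> 1);
}
```

So a small offset (the common case on 4 KB pages) costs 1 byte instead of
2, which is where the compressed-size win below comes from.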

Results with the new implementation:

Compressed size for the same input source
(LZ4_DYN < LZO < LZ4)

LZO
=======
orig_data_size: 78917632
compr_data_size: 15894668
mem_used_total: 17117184

LZ4
========
orig_data_size: 78917632
compr_data_size: 16310717
mem_used_total: 17592320

LZ4_DYN
=======
orig_data_size: 78917632
compr_data_size: 15520506
mem_used_total: 16748544

Performance was checked with the following tool:
https://github.com/sergey-senozhatsky/zram-perf-test
# ./fio-perf-o-meter.sh /tmp/test-fio-zram-lz4 /tmp/test-fio-zram-lz4_dyn
Processing /tmp/test-fio-zram-lz4
Processing /tmp/test-fio-zram-lz4_dyn
#jobs1
WRITE:          1101.7MB/s       1197.7MB/s
WRITE:          799829KB/s       900838KB/s
READ:           2670.2MB/s       2649.5MB/s
READ:           2027.8MB/s       2039.9MB/s
READ:           603703KB/s       597855KB/s
WRITE:          602943KB/s       597103KB/s
READ:           680438KB/s       707986KB/s
WRITE:          679582KB/s       707095KB/s
#jobs2
WRITE:          1993.2MB/s       2121.2MB/s
WRITE:          1654.1MB/s       1700.2MB/s
READ:           5038.2MB/s       4970.9MB/s
READ:           3930.1MB/s       3908.5MB/s
READ:           1113.2MB/s       1117.4MB/s
WRITE:          1111.8MB/s       1115.2MB/s
READ:           1255.8MB/s       1286.5MB/s
WRITE:          1254.2MB/s       1284.9MB/s
#jobs3
WRITE:          2875.6MB/s       3010.3MB/s
WRITE:          2394.4MB/s       2363.2MB/s
READ:           7384.7MB/s       7314.3MB/s
READ:           5389.5MB/s       5427.6MB/s
READ:           1570.8MB/s       1557.3MB/s
WRITE:          1568.8MB/s       1555.3MB/s
READ:           1848.5MB/s       1854.0MB/s
WRITE:          1846.2MB/s       1851.7MB/s
#jobs4
WRITE:          3720.3MB/s       3077.4MB/s
WRITE:          3027.4MB/s       3072.8MB/s
READ:           9694.7MB/s       9822.6MB/s
READ:           6606.5MB/s       6617.2MB/s
READ:           1941.6MB/s       1966.8MB/s
WRITE:          1939.1MB/s       1964.3MB/s
READ:           2405.3MB/s       2347.5MB/s
WRITE:          2402.3MB/s       2344.5MB/s
#jobs5
WRITE:          3335.6MB/s       3360.7MB/s
WRITE:          2670.2MB/s       2677.9MB/s
READ:           9455.3MB/s       8782.2MB/s
READ:           6534.8MB/s       6501.7MB/s
READ:           1848.9MB/s       1858.3MB/s
WRITE:          1846.6MB/s       1855.1MB/s
READ:           2232.4MB/s       2223.7MB/s
WRITE:          2229.6MB/s       2220.9MB/s
#jobs6
WRITE:          3896.5MB/s       3772.9MB/s
WRITE:          3171.1MB/s       3109.4MB/s
READ:           11060MB/s        11120MB/s
READ:           7375.8MB/s       7384.7MB/s
READ:           2132.5MB/s       2133.1MB/s
WRITE:          2129.8MB/s       2131.3MB/s
READ:           2608.4MB/s       2627.3MB/s
WRITE:          2605.7MB/s       2623.2MB/s
#jobs7
WRITE:          4129.4MB/s       4083.2MB/s
WRITE:          3364.5MB/s       3384.4MB/s
READ:           12088MB/s        11062MB/s
READ:           7868.3MB/s       7851.5MB/s
READ:           2277.8MB/s       2291.6MB/s
WRITE:          2274.9MB/s       2288.7MB/s
READ:           2798.5MB/s       2890.1MB/s
WRITE:          2794.1MB/s       2887.4MB/s
#jobs8
WRITE:          4623.3MB/s       4794.9MB/s
WRITE:          3749.3MB/s       3676.9MB/s
READ:           12337MB/s        14076MB/s
READ:           8320.1MB/s       8229.4MB/s
READ:           2496.9MB/s       2486.3MB/s
WRITE:          2493.8MB/s       2483.2MB/s
READ:           3340.4MB/s       3370.6MB/s
WRITE:          3336.2MB/s       3366.4MB/s
#jobs9
WRITE:          4427.6MB/s       4341.3MB/s
WRITE:          3542.6MB/s       3597.2MB/s
READ:           10094MB/s        9888.5MB/s
READ:           7863.5MB/s       8119.9MB/s
READ:           2357.1MB/s       2382.1MB/s
WRITE:          2354.1MB/s       2379.1MB/s
READ:           2828.8MB/s       2826.2MB/s
WRITE:          2825.3MB/s       2822.7MB/s
#jobs10
WRITE:          4463.9MB/s       4327.7MB/s
WRITE:          3637.7MB/s       3592.4MB/s
READ:           10020MB/s        11118MB/s
READ:           7837.8MB/s       8098.7MB/s
READ:           2459.6MB/s       2406.5MB/s
WRITE:          2456.5MB/s       2403.4MB/s
READ:           2804.2MB/s       2829.8MB/s
WRITE:          2800.7MB/s       2826.2MB/s
jobs1                              perfstat
stalled-cycles-frontend     20,23,52,25,317 (  54.32%)    19,29,10,49,608 (  54.50%)
instructions                44,62,30,88,401 (    1.20)    42,50,67,71,907 (    1.20)
branches                     7,12,44,77,233 ( 738.975)     6,64,52,15,491 ( 725.584)
branch-misses                   2,38,66,520 (   0.33%)        2,04,33,819 (   0.31%)
jobs2                              perfstat
stalled-cycles-frontend     42,82,90,69,149 (  56.63%)    41,58,70,01,387 (  56.01%)
instructions                85,33,18,31,411 (    1.13)    85,32,92,28,973 (    1.15)
branches                    13,35,34,99,713 ( 677.499)    13,34,97,00,453 ( 693.104)
branch-misses                   4,50,17,075 (   0.34%)        4,47,28,378 (   0.34%)
jobs3                              perfstat
stalled-cycles-frontend     66,01,57,23,062 (  57.10%)    65,86,74,97,814 (  57.30%)
instructions              1,28,18,27,80,041 (    1.11)  1,28,04,92,91,306 (    1.11)
branches                    20,06,14,16,000 ( 651.453)    20,02,85,32,864 ( 652.536)
branch-misses                   7,10,66,773 (   0.35%)        7,12,75,728 (   0.36%)
jobs4                              perfstat
stalled-cycles-frontend     91,98,71,83,315 (  58.09%)    93,70,91,50,920 (  58.66%)
instructions              1,70,82,79,66,403 (    1.08)  1,71,18,67,74,366 (    1.07)
branches                    26,73,53,03,398 ( 621.532)    26,80,89,38,054 ( 618.718)
branch-misses                   9,82,07,177 (   0.37%)        9,81,64,098 (   0.37%)
jobs5                              perfstat
stalled-cycles-frontend   1,47,29,71,29,605 (  63.59%)  1,47,91,01,92,835 (  63.86%)
instructions              2,18,90,41,63,988 (    0.95)  2,18,55,73,09,594 (    0.94)
branches                    34,64,46,32,880 ( 553.209)    34,55,08,02,781 ( 551.953)
branch-misses                  14,16,79,279 (   0.41%)       13,84,85,054 (   0.40%)
jobs6                              perfstat
stalled-cycles-frontend   2,02,92,92,98,242 (  66.70%)  2,05,33,49,39,627 (  67.01%)
instructions              2,65,13,90,22,217 (    0.87)  2,64,84,45,49,149 (    0.86)
branches                    42,11,54,07,400 ( 510.085)    42,03,58,57,789 ( 505.746)
branch-misses                  17,71,33,628 (   0.42%)       17,74,31,942 (   0.42%)
jobs7                              perfstat
stalled-cycles-frontend   2,79,22,74,37,283 (  70.23%)  2,80,02,50,89,154 (  70.48%)
instructions              3,11,90,38,02,741 (    0.78)  3,09,20,69,87,835 (    0.78)
branches                    49,71,39,90,321 ( 460.940)    49,10,44,23,983 ( 455.686)
branch-misses                  22,43,84,102 (   0.45%)       21,96,67,440 (   0.45%)
jobs8                              perfstat
stalled-cycles-frontend   3,59,62,09,66,766 (  73.38%)  3,58,04,85,16,351 (  73.37%)
instructions              3,43,83,05,02,841 (    0.70)  3,43,33,76,84,985 (    0.70)
branches                    54,02,15,25,784 ( 406.256)    53,91,13,38,774 ( 407.265)
branch-misses                  25,20,35,507 (   0.47%)       25,05,71,030 (   0.46%)
jobs9                              perfstat
stalled-cycles-frontend   4,15,33,64,48,628 (  73.76%)  4,22,88,52,47,923 (  74.16%)
instructions              3,90,79,09,16,552 (    0.69)  3,91,12,92,41,516 (    0.69)
branches                    61,66,87,76,271 ( 403.896)    61,73,58,17,174 ( 399.363)
branch-misses                  28,46,21,136 (   0.46%)       28,45,74,774 (   0.46%)
jobs10                             perfstat
stalled-cycles-frontend   4,74,43,71,32,846 (  74.30%)  4,66,34,70,59,452 (  73.82%)
instructions              4,35,23,51,39,076 (    0.68)  4,38,48,78,54,987 (    0.69)
branches                    68,72,17,08,212 ( 396.945)    69,48,52,50,280 ( 405.847)
branch-misses                  31,73,62,053 (   0.46%)       32,34,76,102 (   0.47%)
seconds elapsed        11.470858891     10.862984653
seconds elapsed        11.802220972     11.348959061
seconds elapsed        11.847204652     11.850297919
seconds elapsed        12.352068602     12.853222188
seconds elapsed        16.162715423     16.355883496
seconds elapsed        16.605502317     16.855938732
seconds elapsed        18.108333660     18.108347866
seconds elapsed        18.621296174     18.354183020
seconds elapsed        22.366502860     22.357632546
seconds elapsed        24.362417439     24.363003009

Maninder Singh, Vaneet Narang (1):
  lz4: Implement lz4 with dynamic offset (lz4_dyn).

 crypto/lz4.c               |   64 ++++++++++++++++++++++++++++++++-
 drivers/block/zram/zcomp.c |    4 ++
 fs/pstore/platform.c       |    2 +-
 include/linux/lz4.h        |   15 ++++++--
 lib/decompress_unlz4.c     |    2 +-
 lib/lz4/lz4_compress.c     |   84 +++++++++++++++++++++++++++++++++++--------
 lib/lz4/lz4_decompress.c   |   56 ++++++++++++++++++++---------
 lib/lz4/lz4defs.h          |   11 ++++++
 8 files changed, 197 insertions(+), 41 deletions(-)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 1/1] lz4: Implement lz4 with dynamic offset length.
       [not found]   ` <CGME20180321044149epcas5p12cba1afa5fc0f493a47fe20c4a7394bd@epcas5p1.samsung.com>
@ 2018-03-21  4:40     ` Maninder Singh
  2018-03-21  7:49       ` Sergey Senozhatsky
                         ` (6 more replies)
  0 siblings, 7 replies; 23+ messages in thread
From: Maninder Singh @ 2018-03-21  4:40 UTC (permalink / raw)
  To: herbert, davem, minchan, ngupta, sergey.senozhatsky.work,
	keescook, anton, ccross, tony.luck, akpm, colin.king
  Cc: linux-crypto, linux-kernel, pankaj.m, a.sahrawat, v.narang,
	Maninder Singh

The LZ4 specification defines a 2-byte offset length for 64 KB of data.
But in the case of ZRAM we compress data per page, and on most
architectures PAGE_SIZE is 4 KB. So we can decide the offset length based
on the actual offset value. For this we reserve 1 bit to encode the offset
length (1 byte or 2 bytes). 2 bytes are required only if the offset is
greater than 127; otherwise 1 byte is enough.

With this new implementation the maximum offset value is 32 KB.

Thus we can save more memory for compressed data.
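The 32 KB ceiling follows directly from the encoding: one of the 16 offset
bits is spent on the length flag, leaving 15 bits, i.e. a maximum match
distance of 2^15 - 1 = 32767 bytes. The definitions below mirror what this
patch adds to lib/lz4/lz4defs.h:

```c
#include <assert.h>

#define MAXD_LOG 16
/* Classic LZ4: full 16-bit offset field. */
#define MAX_DISTANCE		((1 << MAXD_LOG) - 1)
/* lz4_dyn: one bit reserved for the offset-length flag. */
#define MAX_DISTANCE_DYN	((1 << (MAXD_LOG - 1)) - 1)
```

This is also why zcomp.c only advertises "lz4_dyn" when PAGE_SIZE is below
32 KB: a page must never produce a match distance the 15-bit field cannot
hold.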

Results with the new implementation:
LZO
=======
orig_data_size: 78917632
compr_data_size: 15894668
mem_used_total: 17117184

LZ4
========
orig_data_size: 78917632
compr_data_size: 16310717
mem_used_total: 17592320

LZ4_DYN
=======
orig_data_size: 78917632
compr_data_size: 15520506
mem_used_total: 16748544

Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
Signed-off-by: Vaneet Narang <v.narang@samsung.com>
---
 crypto/lz4.c               |   64 ++++++++++++++++++++++++++++++++-
 drivers/block/zram/zcomp.c |    4 ++
 fs/pstore/platform.c       |    2 +-
 include/linux/lz4.h        |   15 ++++++--
 lib/decompress_unlz4.c     |    2 +-
 lib/lz4/lz4_compress.c     |   84 +++++++++++++++++++++++++++++++++++--------
 lib/lz4/lz4_decompress.c   |   56 ++++++++++++++++++++---------
 lib/lz4/lz4defs.h          |   11 ++++++
 8 files changed, 197 insertions(+), 41 deletions(-)

diff --git a/crypto/lz4.c b/crypto/lz4.c
index 2ce2660..f1a8a20 100644
--- a/crypto/lz4.c
+++ b/crypto/lz4.c
@@ -67,7 +67,20 @@ static int __lz4_compress_crypto(const u8 *src, unsigned int slen,
 				 u8 *dst, unsigned int *dlen, void *ctx)
 {
 	int out_len = LZ4_compress_default(src, dst,
-		slen, *dlen, ctx);
+		slen, *dlen, ctx, false);
+
+	if (!out_len)
+		return -EINVAL;
+
+	*dlen = out_len;
+	return 0;
+}
+
+static int __lz4_compress_crypto_dynamic(const u8 *src, unsigned int slen,
+				 u8 *dst, unsigned int *dlen, void *ctx)
+{
+	int out_len = LZ4_compress_default(src, dst,
+		slen, *dlen, ctx, true);
 
 	if (!out_len)
 		return -EINVAL;
@@ -91,10 +104,30 @@ static int lz4_compress_crypto(struct crypto_tfm *tfm, const u8 *src,
 	return __lz4_compress_crypto(src, slen, dst, dlen, ctx->lz4_comp_mem);
 }
 
+static int lz4_compress_crypto_dynamic(struct crypto_tfm *tfm, const u8 *src,
+			       unsigned int slen, u8 *dst, unsigned int *dlen)
+{
+	struct lz4_ctx *ctx = crypto_tfm_ctx(tfm);
+
+	return __lz4_compress_crypto_dynamic(src, slen, dst, dlen, ctx->lz4_comp_mem);
+}
+
 static int __lz4_decompress_crypto(const u8 *src, unsigned int slen,
 				   u8 *dst, unsigned int *dlen, void *ctx)
 {
-	int out_len = LZ4_decompress_safe(src, dst, slen, *dlen);
+	int out_len = LZ4_decompress_safe(src, dst, slen, *dlen, false);
+
+	if (out_len < 0)
+		return -EINVAL;
+
+	*dlen = out_len;
+	return 0;
+}
+
+static int __lz4_decompress_crypto_dynamic(const u8 *src, unsigned int slen,
+				   u8 *dst, unsigned int *dlen, void *ctx)
+{
+	int out_len = LZ4_decompress_safe(src, dst, slen, *dlen, true);
 
 	if (out_len < 0)
 		return -EINVAL;
@@ -117,6 +150,13 @@ static int lz4_decompress_crypto(struct crypto_tfm *tfm, const u8 *src,
 	return __lz4_decompress_crypto(src, slen, dst, dlen, NULL);
 }
 
+static int lz4_decompress_crypto_dynamic(struct crypto_tfm *tfm, const u8 *src,
+				 unsigned int slen, u8 *dst,
+				 unsigned int *dlen)
+{
+	return __lz4_decompress_crypto_dynamic(src, slen, dst, dlen, NULL);
+}
+
 static struct crypto_alg alg_lz4 = {
 	.cra_name		= "lz4",
 	.cra_flags		= CRYPTO_ALG_TYPE_COMPRESS,
@@ -130,6 +170,19 @@ static int lz4_decompress_crypto(struct crypto_tfm *tfm, const u8 *src,
 	.coa_decompress		= lz4_decompress_crypto } }
 };
 
+static struct crypto_alg alg_lz4_dyn = {
+	.cra_name		= "lz4_dyn",
+	.cra_flags		= CRYPTO_ALG_TYPE_COMPRESS,
+	.cra_ctxsize		= sizeof(struct lz4_ctx),
+	.cra_module		= THIS_MODULE,
+	.cra_list		= LIST_HEAD_INIT(alg_lz4_dyn.cra_list),
+	.cra_init		= lz4_init,
+	.cra_exit		= lz4_exit,
+	.cra_u			= { .compress = {
+	.coa_compress		= lz4_compress_crypto_dynamic,
+	.coa_decompress		= lz4_decompress_crypto_dynamic } }
+};
+
 static struct scomp_alg scomp = {
 	.alloc_ctx		= lz4_alloc_ctx,
 	.free_ctx		= lz4_free_ctx,
@@ -150,9 +203,16 @@ static int __init lz4_mod_init(void)
 	if (ret)
 		return ret;
 
+	ret = crypto_register_alg(&alg_lz4_dyn);
+	if (ret) {
+		crypto_unregister_alg(&alg_lz4);
+		return ret;
+	}
+
 	ret = crypto_register_scomp(&scomp);
 	if (ret) {
 		crypto_unregister_alg(&alg_lz4);
+		crypto_unregister_alg(&alg_lz4_dyn);
 		return ret;
 	}
 
diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
index 4ed0a78..5bc5aab 100644
--- a/drivers/block/zram/zcomp.c
+++ b/drivers/block/zram/zcomp.c
@@ -17,11 +17,15 @@
 #include <linux/crypto.h>
 
 #include "zcomp.h"
+#define KB	(1 << 10)
 
 static const char * const backends[] = {
 	"lzo",
 #if IS_ENABLED(CONFIG_CRYPTO_LZ4)
 	"lz4",
+#if (PAGE_SIZE < (32 * KB))
+	"lz4_dyn",
+#endif
 #endif
 #if IS_ENABLED(CONFIG_CRYPTO_LZ4HC)
 	"lz4hc",
diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c
index 6910321..2b03449 100644
--- a/fs/pstore/platform.c
+++ b/fs/pstore/platform.c
@@ -342,7 +342,7 @@ static int compress_lz4(const void *in, void *out, size_t inlen, size_t outlen)
 {
 	int ret;
 
-	ret = LZ4_compress_default(in, out, inlen, outlen, workspace);
+	ret = LZ4_compress_default(in, out, inlen, outlen, workspace, false);
 	if (!ret) {
 		pr_err("LZ4_compress_default error; compression failed!\n");
 		return -EIO;
diff --git a/include/linux/lz4.h b/include/linux/lz4.h
index 394e3d9..08bb95d 100644
--- a/include/linux/lz4.h
+++ b/include/linux/lz4.h
@@ -181,6 +181,9 @@ static inline int LZ4_compressBound(size_t isize)
  *	which must be already allocated
  * @wrkmem: address of the working memory.
  *	This requires 'workmem' of LZ4_MEM_COMPRESS.
+ * @dynOffset: true or false.
+ *  true selects a dynamic offset (1 or 2 bytes, based on the offset value),
+ *  false selects a normal offset (2 bytes for every offset).
  *
  * Compresses 'sourceSize' bytes from buffer 'source'
  * into already allocated 'dest' buffer of size 'maxOutputSize'.
@@ -195,7 +198,7 @@ static inline int LZ4_compressBound(size_t isize)
  *	(necessarily <= maxOutputSize) or 0 if compression fails
  */
 int LZ4_compress_default(const char *source, char *dest, int inputSize,
-	int maxOutputSize, void *wrkmem);
+	int maxOutputSize, void *wrkmem, bool dynOffset);
 
 /**
  * LZ4_compress_fast() - As LZ4_compress_default providing an acceleration param
@@ -207,6 +210,9 @@ int LZ4_compress_default(const char *source, char *dest, int inputSize,
  * @acceleration: acceleration factor
  * @wrkmem: address of the working memory.
  *	This requires 'workmem' of LZ4_MEM_COMPRESS.
+ * @dynOffset: true or false.
+ *  true selects a dynamic offset (1 or 2 bytes, based on the offset value),
+ *  false selects a normal offset (2 bytes for every offset).
  *
  * Same as LZ4_compress_default(), but allows to select an "acceleration"
  * factor. The larger the acceleration value, the faster the algorithm,
@@ -219,7 +225,7 @@ int LZ4_compress_default(const char *source, char *dest, int inputSize,
  *	(necessarily <= maxOutputSize) or 0 if compression fails
  */
 int LZ4_compress_fast(const char *source, char *dest, int inputSize,
-	int maxOutputSize, int acceleration, void *wrkmem);
+	int maxOutputSize, int acceleration, void *wrkmem, bool dynOffset);
 
 /**
  * LZ4_compress_destSize() - Compress as much data as possible
@@ -277,6 +283,9 @@ int LZ4_compress_destSize(const char *source, char *dest, int *sourceSizePtr,
  *	which must be already allocated
  * @compressedSize: is the precise full size of the compressed block
  * @maxDecompressedSize: is the size of 'dest' buffer
+ * @dynOffset: true or false.
+ *  true selects a dynamic offset (1 or 2 bytes, based on the offset value),
+ *  false selects a normal offset (2 bytes for every offset).
  *
  * Decompresses data fom 'source' into 'dest'.
  * If the source stream is detected malformed, the function will
@@ -290,7 +299,7 @@ int LZ4_compress_destSize(const char *source, char *dest, int *sourceSizePtr,
  *	or a negative result in case of error
  */
 int LZ4_decompress_safe(const char *source, char *dest, int compressedSize,
-	int maxDecompressedSize);
+	int maxDecompressedSize, bool dynOffset);
 
 /**
  * LZ4_decompress_safe_partial() - Decompress a block of size 'compressedSize'
diff --git a/lib/decompress_unlz4.c b/lib/decompress_unlz4.c
index 1b0baf3..8be2faa 100644
--- a/lib/decompress_unlz4.c
+++ b/lib/decompress_unlz4.c
@@ -158,7 +158,7 @@ STATIC inline int INIT unlz4(u8 *input, long in_len,
 #else
 		dest_len = uncomp_chunksize;
 
-		ret = LZ4_decompress_safe(inp, outp, chunksize, dest_len);
+		ret = LZ4_decompress_safe(inp, outp, chunksize, dest_len, false);
 		dest_len = ret;
 #endif
 		if (ret < 0) {
diff --git a/lib/lz4/lz4_compress.c b/lib/lz4/lz4_compress.c
index cc7b6d4..185c358 100644
--- a/lib/lz4/lz4_compress.c
+++ b/lib/lz4/lz4_compress.c
@@ -183,7 +183,8 @@ static FORCE_INLINE int LZ4_compress_generic(
 	const tableType_t tableType,
 	const dict_directive dict,
 	const dictIssue_directive dictIssue,
-	const U32 acceleration)
+	const U32 acceleration,
+	const Dynamic_Offset dynOffset)
 {
 	const BYTE *ip = (const BYTE *) source;
 	const BYTE *base;
@@ -199,6 +200,7 @@ static FORCE_INLINE int LZ4_compress_generic(
 
 	BYTE *op = (BYTE *) dest;
 	BYTE * const olimit = op + maxOutputSize;
+	int max_distance = dynOffset ? MAX_DISTANCE_DYN : MAX_DISTANCE;
 
 	U32 forwardH;
 	size_t refDelta = 0;
@@ -245,6 +247,7 @@ static FORCE_INLINE int LZ4_compress_generic(
 	for ( ; ; ) {
 		const BYTE *match;
 		BYTE *token;
+		int curr_offset;
 
 		/* Find a match */
 		{
@@ -285,7 +288,7 @@ static FORCE_INLINE int LZ4_compress_generic(
 					: 0)
 				|| ((tableType == byU16)
 					? 0
-					: (match + MAX_DISTANCE < ip))
+					: (match + max_distance < ip))
 				|| (LZ4_read32(match + refDelta)
 					!= LZ4_read32(ip)));
 		}
@@ -328,8 +331,26 @@ static FORCE_INLINE int LZ4_compress_generic(
 
 _next_match:
 		/* Encode Offset */
-		LZ4_writeLE16(op, (U16)(ip - match));
-		op += 2;
+		if (dynOffset) {
+			curr_offset = (U16)(ip - match);
+
+			/*
+			 * If the offset is greater than 127, we need 2 bytes
+			 * to store it. Otherwise 1 byte is enough.
+			 */
+			if (curr_offset > 127) {
+				curr_offset = (curr_offset << 1) | DYN_BIT;
+				LZ4_writeLE16(op, (U16)curr_offset);
+				op += 2;
+			} else {
+				curr_offset = curr_offset << 1;
+				*op = (BYTE)curr_offset;
+				op++;
+			}
+		} else {
+			LZ4_writeLE16(op, (U16)(ip - match));
+			op += 2;
+		}
 
 		/* Encode MatchLength */
 		{
@@ -480,39 +501,70 @@ static int LZ4_compress_fast_extState(
 			return LZ4_compress_generic(ctx, source,
 				dest, inputSize, 0,
 				noLimit, byU16, noDict,
-				noDictIssue, acceleration);
+				noDictIssue, acceleration, NoDynOffset);
 		else
 			return LZ4_compress_generic(ctx, source,
 				dest, inputSize, 0,
 				noLimit, tableType, noDict,
-				noDictIssue, acceleration);
+				noDictIssue, acceleration, NoDynOffset);
 	} else {
 		if (inputSize < LZ4_64Klimit)
 			return LZ4_compress_generic(ctx, source,
 				dest, inputSize,
 				maxOutputSize, limitedOutput, byU16, noDict,
-				noDictIssue, acceleration);
+				noDictIssue, acceleration, NoDynOffset);
 		else
 			return LZ4_compress_generic(ctx, source,
 				dest, inputSize,
 				maxOutputSize, limitedOutput, tableType, noDict,
-				noDictIssue, acceleration);
+				noDictIssue, acceleration, NoDynOffset);
 	}
 }
 
+static int LZ4_compress_fast_extState_dynamic(
+	void *state,
+	const char *source,
+	char *dest,
+	int inputSize,
+	int maxOutputSize,
+	int acceleration)
+{
+	LZ4_stream_t_internal *ctx = &((LZ4_stream_t *)state)->internal_donotuse;
+
+	LZ4_resetStream((LZ4_stream_t *)state);
+
+	if (acceleration < 1)
+		acceleration = LZ4_ACCELERATION_DEFAULT;
+
+	if (maxOutputSize >= LZ4_COMPRESSBOUND(inputSize))
+		return LZ4_compress_generic(ctx, source,
+			dest, inputSize, 0,
+			noLimit, byU16, noDict,
+			noDictIssue, acceleration, DynOffset);
+	else
+		return LZ4_compress_generic(ctx, source,
+			dest, inputSize,
+			maxOutputSize, limitedOutput, byU16, noDict,
+			noDictIssue, acceleration, DynOffset);
+}
+
 int LZ4_compress_fast(const char *source, char *dest, int inputSize,
-	int maxOutputSize, int acceleration, void *wrkmem)
+	int maxOutputSize, int acceleration, void *wrkmem, bool dynOffset)
 {
-	return LZ4_compress_fast_extState(wrkmem, source, dest, inputSize,
+	if (!dynOffset)
+		return LZ4_compress_fast_extState(wrkmem, source, dest, inputSize,
+			maxOutputSize, acceleration);
+
+	return LZ4_compress_fast_extState_dynamic(wrkmem, source, dest, inputSize,
 		maxOutputSize, acceleration);
 }
 EXPORT_SYMBOL(LZ4_compress_fast);
 
 int LZ4_compress_default(const char *source, char *dest, int inputSize,
-	int maxOutputSize, void *wrkmem)
+	int maxOutputSize, void *wrkmem, bool dynOffset)
 {
 	return LZ4_compress_fast(source, dest, inputSize,
-		maxOutputSize, LZ4_ACCELERATION_DEFAULT, wrkmem);
+		maxOutputSize, LZ4_ACCELERATION_DEFAULT, wrkmem, dynOffset);
 }
 EXPORT_SYMBOL(LZ4_compress_default);
 
@@ -900,12 +952,12 @@ int LZ4_compress_fast_continue(LZ4_stream_t *LZ4_stream, const char *source,
 			result = LZ4_compress_generic(
 				streamPtr, source, dest, inputSize,
 				maxOutputSize, limitedOutput, byU32,
-				withPrefix64k, dictSmall, acceleration);
+				withPrefix64k, dictSmall, acceleration, NoDynOffset);
 		} else {
 			result = LZ4_compress_generic(
 				streamPtr, source, dest, inputSize,
 				maxOutputSize, limitedOutput, byU32,
-				withPrefix64k, noDictIssue, acceleration);
+				withPrefix64k, noDictIssue, acceleration, NoDynOffset);
 		}
 		streamPtr->dictSize += (U32)inputSize;
 		streamPtr->currentOffset += (U32)inputSize;
@@ -921,12 +973,12 @@ int LZ4_compress_fast_continue(LZ4_stream_t *LZ4_stream, const char *source,
 			result = LZ4_compress_generic(
 				streamPtr, source, dest, inputSize,
 				maxOutputSize, limitedOutput, byU32,
-				usingExtDict, dictSmall, acceleration);
+				usingExtDict, dictSmall, acceleration, NoDynOffset);
 		} else {
 			result = LZ4_compress_generic(
 				streamPtr, source, dest, inputSize,
 				maxOutputSize, limitedOutput, byU32,
-				usingExtDict, noDictIssue, acceleration);
+				usingExtDict, noDictIssue, acceleration, NoDynOffset);
 		}
 		streamPtr->dictionary = (const BYTE *)source;
 		streamPtr->dictSize = (U32)inputSize;
diff --git a/lib/lz4/lz4_decompress.c b/lib/lz4/lz4_decompress.c
index 141734d..337a828 100644
--- a/lib/lz4/lz4_decompress.c
+++ b/lib/lz4/lz4_decompress.c
@@ -71,7 +71,9 @@ static FORCE_INLINE int LZ4_decompress_generic(
 	 /* only if dict == usingExtDict */
 	 const BYTE * const dictStart,
 	 /* note : = 0 if noDict */
-	 const size_t dictSize
+	 const size_t dictSize,
+	 /* dynOffset == 1 : use dynamic offset encoding */
+	 const Dynamic_Offset dynOffset
 	 )
 {
 	/* Local Variables */
@@ -141,8 +143,8 @@ static FORCE_INLINE int LZ4_decompress_generic(
 		/* copy literals */
 		cpy = op + length;
 		if (((endOnInput) && ((cpy > (partialDecoding ? oexit : oend - MFLIMIT))
-			|| (ip + length > iend - (2 + 1 + LASTLITERALS))))
-			|| ((!endOnInput) && (cpy > oend - WILDCOPYLENGTH))) {
+			|| (ip + length > iend - (2 + LASTLITERALS))))
+			|| ((!endOnInput) && (cpy > oend - WILDCOPYLENGTH - 1))) {
 			if (partialDecoding) {
 				if (cpy > oend) {
 					/*
@@ -188,13 +190,31 @@ static FORCE_INLINE int LZ4_decompress_generic(
 			break;
 		}
 
-		LZ4_wildCopy(op, ip, cpy);
+		if (dynOffset && length < 4)
+			LZ4_copy4(op, ip);
+		else
+			LZ4_wildCopy(op, ip, cpy);
+
 		ip += length;
 		op = cpy;
 
 		/* get offset */
-		offset = LZ4_readLE16(ip);
-		ip += 2;
+		if (dynOffset) {
+			/*
+			 * Check if DYN_BIT is set, means 2 Byte Offset,
+			 * else 1 Byte Offset.
+			 */
+			if (*ip & DYN_BIT) {
+				offset = LZ4_readLE16(ip) >> 1;
+				ip += 2;
+			} else {
+				offset = *ip >> 1;
+				ip += 1;
+			}
+		} else {
+			offset = LZ4_readLE16(ip);
+			ip += 2;
+		}
 		match = op - offset;
 
 		if ((checkOffset) && (unlikely(match < lowLimit))) {
@@ -335,11 +355,11 @@ static FORCE_INLINE int LZ4_decompress_generic(
 }
 
 int LZ4_decompress_safe(const char *source, char *dest,
-	int compressedSize, int maxDecompressedSize)
+	int compressedSize, int maxDecompressedSize, bool dynOffset)
 {
 	return LZ4_decompress_generic(source, dest, compressedSize,
 		maxDecompressedSize, endOnInputSize, full, 0,
-		noDict, (BYTE *)dest, NULL, 0);
+		noDict, (BYTE *)dest, NULL, 0, dynOffset);
 }
 
 int LZ4_decompress_safe_partial(const char *source, char *dest,
@@ -347,14 +367,14 @@ int LZ4_decompress_safe_partial(const char *source, char *dest,
 {
 	return LZ4_decompress_generic(source, dest, compressedSize,
 		maxDecompressedSize, endOnInputSize, partial,
-		targetOutputSize, noDict, (BYTE *)dest, NULL, 0);
+		targetOutputSize, noDict, (BYTE *)dest, NULL, 0, NoDynOffset);
 }
 
 int LZ4_decompress_fast(const char *source, char *dest, int originalSize)
 {
 	return LZ4_decompress_generic(source, dest, 0, originalSize,
 		endOnOutputSize, full, 0, withPrefix64k,
-		(BYTE *)(dest - 64 * KB), NULL, 64 * KB);
+		(BYTE *)(dest - 64 * KB), NULL, 64 * KB, NoDynOffset);
 }
 
 int LZ4_setStreamDecode(LZ4_streamDecode_t *LZ4_streamDecode,
@@ -392,7 +412,7 @@ int LZ4_decompress_safe_continue(LZ4_streamDecode_t *LZ4_streamDecode,
 			endOnInputSize, full, 0,
 			usingExtDict, lz4sd->prefixEnd - lz4sd->prefixSize,
 			lz4sd->externalDict,
-			lz4sd->extDictSize);
+			lz4sd->extDictSize, NoDynOffset);
 
 		if (result <= 0)
 			return result;
@@ -406,7 +426,7 @@ int LZ4_decompress_safe_continue(LZ4_streamDecode_t *LZ4_streamDecode,
 			compressedSize, maxOutputSize,
 			endOnInputSize, full, 0,
 			usingExtDict, (BYTE *)dest,
-			lz4sd->externalDict, lz4sd->extDictSize);
+			lz4sd->externalDict, lz4sd->extDictSize, NoDynOffset);
 		if (result <= 0)
 			return result;
 		lz4sd->prefixSize = result;
@@ -427,7 +447,7 @@ int LZ4_decompress_fast_continue(LZ4_streamDecode_t *LZ4_streamDecode,
 			endOnOutputSize, full, 0,
 			usingExtDict,
 			lz4sd->prefixEnd - lz4sd->prefixSize,
-			lz4sd->externalDict, lz4sd->extDictSize);
+			lz4sd->externalDict, lz4sd->extDictSize, NoDynOffset);
 
 		if (result <= 0)
 			return result;
@@ -440,7 +460,7 @@ int LZ4_decompress_fast_continue(LZ4_streamDecode_t *LZ4_streamDecode,
 		result = LZ4_decompress_generic(source, dest, 0, originalSize,
 			endOnOutputSize, full, 0,
 			usingExtDict, (BYTE *)dest,
-			lz4sd->externalDict, lz4sd->extDictSize);
+			lz4sd->externalDict, lz4sd->extDictSize, NoDynOffset);
 		if (result <= 0)
 			return result;
 		lz4sd->prefixSize = originalSize;
@@ -463,19 +483,19 @@ static FORCE_INLINE int LZ4_decompress_usingDict_generic(const char *source,
 	if (dictSize == 0)
 		return LZ4_decompress_generic(source, dest,
 			compressedSize, maxOutputSize, safe, full, 0,
-			noDict, (BYTE *)dest, NULL, 0);
+			noDict, (BYTE *)dest, NULL, 0, NoDynOffset);
 	if (dictStart + dictSize == dest) {
 		if (dictSize >= (int)(64 * KB - 1))
 			return LZ4_decompress_generic(source, dest,
 				compressedSize, maxOutputSize, safe, full, 0,
-				withPrefix64k, (BYTE *)dest - 64 * KB, NULL, 0);
+				withPrefix64k, (BYTE *)dest - 64 * KB, NULL, 0, NoDynOffset);
 		return LZ4_decompress_generic(source, dest, compressedSize,
 			maxOutputSize, safe, full, 0, noDict,
-			(BYTE *)dest - dictSize, NULL, 0);
+			(BYTE *)dest - dictSize, NULL, 0, NoDynOffset);
 	}
 	return LZ4_decompress_generic(source, dest, compressedSize,
 		maxOutputSize, safe, full, 0, usingExtDict,
-		(BYTE *)dest, (const BYTE *)dictStart, dictSize);
+		(BYTE *)dest, (const BYTE *)dictStart, dictSize, NoDynOffset);
 }
 
 int LZ4_decompress_safe_usingDict(const char *source, char *dest,
diff --git a/lib/lz4/lz4defs.h b/lib/lz4/lz4defs.h
index 00a0b58..9451a73 100644
--- a/lib/lz4/lz4defs.h
+++ b/lib/lz4/lz4defs.h
@@ -75,6 +75,7 @@
 #define WILDCOPYLENGTH 8
 #define LASTLITERALS 5
 #define MFLIMIT (WILDCOPYLENGTH + MINMATCH)
+#define DYN_BIT 0x1
 
 /* Increase this value ==> compression run slower on incompressible data */
 #define LZ4_SKIPTRIGGER 6
@@ -87,6 +88,7 @@
 
 #define MAXD_LOG 16
 #define MAX_DISTANCE ((1 << MAXD_LOG) - 1)
+#define MAX_DISTANCE_DYN ((1 << (MAXD_LOG - 1)) - 1)
 #define STEPSIZE sizeof(size_t)
 
 #define ML_BITS	4
@@ -147,6 +149,13 @@ static FORCE_INLINE void LZ4_copy8(void *dst, const void *src)
 #endif
 }
 
+static FORCE_INLINE void LZ4_copy4(void *dst, const void *src)
+{
+	U32 a = get_unaligned((const U32 *)src);
+
+	put_unaligned(a, (U32 *)dst);
+}
+
 /*
  * customized variant of memcpy,
  * which can overwrite up to 7 bytes beyond dstEnd
@@ -224,4 +233,6 @@ static FORCE_INLINE unsigned int LZ4_count(
 typedef enum { endOnOutputSize = 0, endOnInputSize = 1 } endCondition_directive;
 typedef enum { full = 0, partial = 1 } earlyEnd_directive;
 
+typedef enum { NoDynOffset = 0, DynOffset = 1 } Dynamic_Offset;
+
 #endif
-- 
1.7.1

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.
  2018-03-21  4:40 ` [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length Maninder Singh
       [not found]   ` <CGME20180321044149epcas5p12cba1afa5fc0f493a47fe20c4a7394bd@epcas5p1.samsung.com>
@ 2018-03-21  6:41   ` Sergey Senozhatsky
  2018-03-21  8:26   ` Sergey Senozhatsky
       [not found]   ` <CGME20180321044137epcas5p221e7ee4a0b7464eaa00dad8320f0251d@epcms5p3>
  3 siblings, 0 replies; 23+ messages in thread
From: Sergey Senozhatsky @ 2018-03-21  6:41 UTC (permalink / raw)
  To: Maninder Singh
  Cc: herbert, davem, minchan, ngupta, sergey.senozhatsky.work,
	keescook, anton, ccross, tony.luck, akpm, colin.king,
	linux-crypto, linux-kernel, pankaj.m, a.sahrawat, v.narang

On (03/21/18 10:10), Maninder Singh wrote:
> (Added cover letter to avoid much text in patch description)
> 
> LZ4 specification defines 2 byte offset length for 64 KB data.
> But in case of ZRAM we compress data per page and in most of
> architecture PAGE_SIZE is 4KB. So we can decide offset length based
> on actual offset value. For this we can reserve 1 bit to decide offset
> length (1 byte or 2 byte). 2 bytes required only if offset is greater than 127,
> else 1 byte is enough.

So what happens if I compress the data on a system with no dyn
offset and then send it over the network to a machine which has
dyn offset? Or, say, I have a USB stick with a compression enabled
FS, store files on a dyn offset enabled PC and then mount that USB
stick on a machine with no dyn offset support. And vice versa.

	-ss

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/1] lz4: Implement lz4 with dynamic offset length.
  2018-03-21  4:40     ` [PATCH 1/1] lz4: " Maninder Singh
@ 2018-03-21  7:49       ` Sergey Senozhatsky
  2018-03-21 19:59       ` Nick Terrell
                         ` (5 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: Sergey Senozhatsky @ 2018-03-21  7:49 UTC (permalink / raw)
  To: Maninder Singh
  Cc: herbert, davem, minchan, ngupta, sergey.senozhatsky.work,
	keescook, anton, ccross, tony.luck, akpm, colin.king,
	linux-crypto, linux-kernel, pankaj.m, a.sahrawat, v.narang

On (03/21/18 10:10), Maninder Singh wrote:
[..]
> +static struct crypto_alg alg_lz4_dyn = {
> +	.cra_name		= "lz4_dyn",
> +	.cra_flags		= CRYPTO_ALG_TYPE_COMPRESS,
> +	.cra_ctxsize		= sizeof(struct lz4_ctx),
> +	.cra_module		= THIS_MODULE,
> +	.cra_list		= LIST_HEAD_INIT(alg_lz4_dyn.cra_list),
> +	.cra_init		= lz4_init,
> +	.cra_exit		= lz4_exit,
> +	.cra_u			= { .compress = {
> +	.coa_compress		= lz4_compress_crypto_dynamic,
> +	.coa_decompress		= lz4_decompress_crypto_dynamic } }
> +};

[..]

> diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
> index 4ed0a78..5bc5aab 100644
> --- a/drivers/block/zram/zcomp.c
> +++ b/drivers/block/zram/zcomp.c
> @@ -17,11 +17,15 @@
>  #include <linux/crypto.h>
>  
>  #include "zcomp.h"
> +#define KB	(1 << 10)
>  
>  static const char * const backends[] = {
>  	"lzo",
>  #if IS_ENABLED(CONFIG_CRYPTO_LZ4)
>  	"lz4",
> +#if (PAGE_SIZE < (32 * KB))
> +	"lz4_dyn",
> +#endif

This is not the list of supported algorithms. It's the list of
recommended algorithms. You can configure zram to use any
algorithm that is available and known to the Crypto API,
including lz4_dyn on PAGE_SIZE > 32K systems.

	-ss

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.
  2018-03-21  4:40 ` [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length Maninder Singh
       [not found]   ` <CGME20180321044149epcas5p12cba1afa5fc0f493a47fe20c4a7394bd@epcas5p1.samsung.com>
  2018-03-21  6:41   ` [PATCH 0/1] cover-letter/lz4: " Sergey Senozhatsky
@ 2018-03-21  8:26   ` Sergey Senozhatsky
  2018-03-21 19:56     ` Nick Terrell
                       ` (2 more replies)
       [not found]   ` <CGME20180321044137epcas5p221e7ee4a0b7464eaa00dad8320f0251d@epcms5p3>
  3 siblings, 3 replies; 23+ messages in thread
From: Sergey Senozhatsky @ 2018-03-21  8:26 UTC (permalink / raw)
  To: Nick Terrell, Maninder Singh
  Cc: herbert, davem, minchan, ngupta, sergey.senozhatsky.work,
	keescook, anton, ccross, tony.luck, akpm, colin.king,
	linux-crypto, linux-kernel, pankaj.m, a.sahrawat, v.narang

CC-ing Nick

Nick, can you take a look?

Message-IDs:
lkml.kernel.org/r/1521607242-3968-1-git-send-email-maninder1.s@samsung.com
lkml.kernel.org/r/1521607242-3968-2-git-send-email-maninder1.s@samsung.com

	-ss

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.
  2018-03-21  8:26   ` Sergey Senozhatsky
@ 2018-03-21 19:56     ` Nick Terrell
  2018-03-22  2:43       ` Sergey Senozhatsky
       [not found]     ` <CGME20180321044137epcas5p221e7ee4a0b7464eaa00dad8320f0251d@epcms5p6>
       [not found]     ` <CGME20180321044137epcas5p221e7ee4a0b7464eaa00dad8320f0251d@epcms5p4>
  2 siblings, 1 reply; 23+ messages in thread
From: Nick Terrell @ 2018-03-21 19:56 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Maninder Singh, herbert, davem, minchan, ngupta, Kees Cook,
	anton, ccross, tony.luck, akpm, colin.king, linux-crypto,
	linux-kernel, pankaj.m, a.sahrawat, v.narang, Yann Collet

On (03/21/18 10:10), Maninder Singh wrote:
> LZ4 specification defines 2 byte offset length for 64 KB data.
> But in case of ZRAM we compress data per page and in most of
> architecture PAGE_SIZE is 4KB. So we can decide offset length based
> on actual offset value. For this we can reserve 1 bit to decide offset
> length (1 byte or 2 byte). 2 bytes required only if offset is greater than 127,
> else 1 byte is enough.
> 
> With this new implementation new offset value can be at MAX 32 KB.
> 
> Thus we can save more memory for compressed data.
> 
> results checked with new implementation:-
> 
> compression size for the same input source
> (LZ4_DYN < LZO < LZ4)
> 
> LZO
> =======
> orig_data_size: 78917632
> compr_data_size: 15894668
> mem_used_total: 17117184
> 
> LZ4
> ========
> orig_data_size: 78917632
> compr_data_size: 16310717
> mem_used_total: 17592320
> 
> LZ4_DYN
> =======
> orig_data_size: 78917632
> compr_data_size: 15520506
> mem_used_total: 16748544

This seems like a reasonable extension to the algorithm, and it looks like
LZ4_DYN is about a 5% improvement to compression ratio on your benchmark.
The biggest question I have is whether it is worthwhile to maintain a
separate, incompatible variant of LZ4 in the kernel, with no upstream, for
a 5% gain. If we do want to go forward with this, we should perform more
benchmarks.

I commented in the patch, but because the `dynOffset` variable isn't a
compile time static in LZ4_decompress_generic(), I suspect that the patch
causes a regression in decompression speed for both LZ4 and LZ4_DYN. You'll
need to re-run the benchmarks to first show that LZ4 before the patch
performs the same as LZ4 after the patch. Then re-run the LZ4 vs LZ4_DYN
benchmarks.

I would also like to see a benchmark in user-space (with the code), so we
can see the performance of LZ4 before and after the patch, as well as LZ4
vs LZ4_DYN without anything else going on. I expect the extra branches in
the decoding loop to have an impact on speed, and I would like to see how
big the impact is without noise. 

CC-ing Yann Collet, the author of LZ4

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/1] lz4: Implement lz4 with dynamic offset length.
  2018-03-21  4:40     ` [PATCH 1/1] lz4: " Maninder Singh
  2018-03-21  7:49       ` Sergey Senozhatsky
@ 2018-03-21 19:59       ` Nick Terrell
       [not found]       ` <CGME20180321195916epcas4p1e89e1cc5fd242ee8c348f92f1d1740b0@epcms5p1>
                         ` (4 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: Nick Terrell @ 2018-03-21 19:59 UTC (permalink / raw)
  To: Maninder Singh
  Cc: Yann Collet, Sergey Senozhatsky, linux-crypto, linux-kernel,
	Nick Terrell

On (03/21/18 10:10), Maninder Singh wrote:
> diff --git a/lib/lz4/lz4_compress.c b/lib/lz4/lz4_compress.c
> index cc7b6d4..185c358 100644
> --- a/lib/lz4/lz4_compress.c
> +++ b/lib/lz4/lz4_compress.c
> @@ -183,7 +183,8 @@ static FORCE_INLINE int LZ4_compress_generic(
>  	const tableType_t tableType,
>  	const dict_directive dict,
>  	const dictIssue_directive dictIssue,
> -	const U32 acceleration)
> +	const U32 acceleration,
> +	const Dynamic_Offset dynOffset)
>  {
>  	const BYTE *ip = (const BYTE *) source;
>  	const BYTE *base;
> @@ -199,6 +200,7 @@ static FORCE_INLINE int LZ4_compress_generic(
>
>  	BYTE *op = (BYTE *) dest;
>  	BYTE * const olimit = op + maxOutputSize;
> +	int max_distance = dynOffset ? MAX_DISTANCE_DYN : MAX_DISTANCE;

Let's mark this variable `const`. If the compiler doesn't realize that this
variable and `dynOffset` are compile-time constants, I expect the speed to
be impacted.

>
>  	U32 forwardH;
>  	size_t refDelta = 0;
> @@ -245,6 +247,7 @@ static FORCE_INLINE int LZ4_compress_generic(
>  	for ( ; ; ) {
>  		const BYTE *match;
>  		BYTE *token;
> +		int curr_offset;
>
>  		/* Find a match */
>  		{
> @@ -285,7 +288,7 @@ static FORCE_INLINE int LZ4_compress_generic(
>  					: 0)
>  				|| ((tableType == byU16)
>  					? 0
> -					: (match + MAX_DISTANCE < ip))
> +					: (match + max_distance < ip))
>  				|| (LZ4_read32(match + refDelta)
>  					!= LZ4_read32(ip)));
>  		}
> @@ -328,8 +331,26 @@ static FORCE_INLINE int LZ4_compress_generic(
>
>  _next_match:
>  		/* Encode Offset */
> -		LZ4_writeLE16(op, (U16)(ip - match));
> -		op += 2;
> +		if (dynOffset) {
> +			curr_offset = (U16)(ip - match);
> +
> +			/*
> +			 * If Offset is greater than 127, we need 2 bytes
> +			 * to store it. Otherwise 1 byte is enough.
> +			 */
> +			if (curr_offset > 127) {
> +				curr_offset = (curr_offset << 1) | DYN_BIT;
> +				LZ4_writeLE16(op, (U16)curr_offset);
> +				op += 2;
> +			} else {
> +				curr_offset = curr_offset << 1;
> +				*op = (BYTE)curr_offset;
> +				op++;
> +			}

The standard way to do variable-sized integers is to use the high bit as
the control bit, not the low bit. Do you have benchmarks showing that using
the low bit is faster than using the high bit? If so, let's note that in a
comment in the code; if not, let's use the high bit.

> +		} else {
> +			LZ4_writeLE16(op, (U16)(ip - match));
> +			op += 2;
> +		}
>
>  		/* Encode MatchLength */
>  		{
> @@ -480,39 +501,70 @@ static int LZ4_compress_fast_extState(
>  			return LZ4_compress_generic(ctx, source,
>  				dest, inputSize, 0,
>  				noLimit, byU16, noDict,
> -				noDictIssue, acceleration);
> +				noDictIssue, acceleration, NoDynOffset);
>  		else
>  			return LZ4_compress_generic(ctx, source,
>  				dest, inputSize, 0,
>  				noLimit, tableType, noDict,
> -				noDictIssue, acceleration);
> +				noDictIssue, acceleration, NoDynOffset);
>  	} else {
>  		if (inputSize < LZ4_64Klimit)
>  			return LZ4_compress_generic(ctx, source,
>  				dest, inputSize,
>  				maxOutputSize, limitedOutput, byU16, noDict,
> -				noDictIssue, acceleration);
> +				noDictIssue, acceleration, NoDynOffset);
>  		else
>  			return LZ4_compress_generic(ctx, source,
>  				dest, inputSize,
>  				maxOutputSize, limitedOutput, tableType, noDict,
> -				noDictIssue, acceleration);
> +				noDictIssue, acceleration, NoDynOffset);
>  	}
>  }
>
> +static int LZ4_compress_fast_extState_dynamic(
> +	void *state,
> +	const char *source,
> +	char *dest,
> +	int inputSize,
> +	int maxOutputSize,
> +	int acceleration)
> +{
> +	LZ4_stream_t_internal *ctx = &((LZ4_stream_t *)state)->internal_donotuse;
> +
> +	LZ4_resetStream((LZ4_stream_t *)state);
> +
> +	if (acceleration < 1)
> +		acceleration = LZ4_ACCELERATION_DEFAULT;
> +
> +	if (maxOutputSize >= LZ4_COMPRESSBOUND(inputSize))
> +		return LZ4_compress_generic(ctx, source,
> +			dest, inputSize, 0,
> +			noLimit, byU16, noDict,
> +			noDictIssue, acceleration, DynOffset);
> +	else
> +		return LZ4_compress_generic(ctx, source,
> +			dest, inputSize,
> +			maxOutputSize, limitedOutput, byU16, noDict,
> +			noDictIssue, acceleration, DynOffset);
> +}
> +
>  int LZ4_compress_fast(const char *source, char *dest, int inputSize,
> -	int maxOutputSize, int acceleration, void *wrkmem)
> +	int maxOutputSize, int acceleration, void *wrkmem, bool dynOffset)
>  {
> -	return LZ4_compress_fast_extState(wrkmem, source, dest, inputSize,
> +	if (!dynOffset)
> +		return LZ4_compress_fast_extState(wrkmem, source, dest, inputSize,
> +			maxOutputSize, acceleration);
> +
> +	return LZ4_compress_fast_extState_dynamic(wrkmem, source, dest, inputSize,
>  		maxOutputSize, acceleration);
>  }
>  EXPORT_SYMBOL(LZ4_compress_fast);
>
>  int LZ4_compress_default(const char *source, char *dest, int inputSize,
> -	int maxOutputSize, void *wrkmem)
> +	int maxOutputSize, void *wrkmem, bool dynOffset)
>  {
>  	return LZ4_compress_fast(source, dest, inputSize,
> -		maxOutputSize, LZ4_ACCELERATION_DEFAULT, wrkmem);
> +		maxOutputSize, LZ4_ACCELERATION_DEFAULT, wrkmem, dynOffset);
>  }
>  EXPORT_SYMBOL(LZ4_compress_default);
>
> @@ -900,12 +952,12 @@ int LZ4_compress_fast_continue(LZ4_stream_t *LZ4_stream, const char *source,
>  			result = LZ4_compress_generic(
>  				streamPtr, source, dest, inputSize,
>  				maxOutputSize, limitedOutput, byU32,
> -				withPrefix64k, dictSmall, acceleration);
> +				withPrefix64k, dictSmall, acceleration, NoDynOffset);
>  		} else {
>  			result = LZ4_compress_generic(
>  				streamPtr, source, dest, inputSize,
>  				maxOutputSize, limitedOutput, byU32,
> -				withPrefix64k, noDictIssue, acceleration);
> +				withPrefix64k, noDictIssue, acceleration, NoDynOffset);
>  		}
>  		streamPtr->dictSize += (U32)inputSize;
>  		streamPtr->currentOffset += (U32)inputSize;
> @@ -921,12 +973,12 @@ int LZ4_compress_fast_continue(LZ4_stream_t *LZ4_stream, const char *source,
>  			result = LZ4_compress_generic(
>  				streamPtr, source, dest, inputSize,
>  				maxOutputSize, limitedOutput, byU32,
> -				usingExtDict, dictSmall, acceleration);
> +				usingExtDict, dictSmall, acceleration, NoDynOffset);
>  		} else {
>  			result = LZ4_compress_generic(
>  				streamPtr, source, dest, inputSize,
>  				maxOutputSize, limitedOutput, byU32,
> -				usingExtDict, noDictIssue, acceleration);
> +				usingExtDict, noDictIssue, acceleration, NoDynOffset);
>  		}
>  		streamPtr->dictionary = (const BYTE *)source;
>  		streamPtr->dictSize = (U32)inputSize;
> diff --git a/lib/lz4/lz4_decompress.c b/lib/lz4/lz4_decompress.c
> index 141734d..337a828 100644
> --- a/lib/lz4/lz4_decompress.c
> +++ b/lib/lz4/lz4_decompress.c
> @@ -71,7 +71,9 @@ static FORCE_INLINE int LZ4_decompress_generic(
>  	 /* only if dict == usingExtDict */
>  	 const BYTE * const dictStart,
>  	 /* note : = 0 if noDict */
> -	 const size_t dictSize
> +	 const size_t dictSize,
> +	 /* offset == 1; dynamic offset */
> +	 const Dynamic_Offset dynOffset
>  	 )
>  {
>  	/* Local Variables */
> @@ -141,8 +143,8 @@ static FORCE_INLINE int LZ4_decompress_generic(
>  		/* copy literals */
>  		cpy = op + length;
>  		if (((endOnInput) && ((cpy > (partialDecoding ? oexit : oend - MFLIMIT))
> -			|| (ip + length > iend - (2 + 1 + LASTLITERALS))))
> -			|| ((!endOnInput) && (cpy > oend - WILDCOPYLENGTH))) {
> +			|| (ip + length > iend - (2 + LASTLITERALS))))
> +			|| ((!endOnInput) && (cpy > oend - WILDCOPYLENGTH - 1))) {
>  			if (partialDecoding) {
>  				if (cpy > oend) {
>  					/*
> @@ -188,13 +190,31 @@ static FORCE_INLINE int LZ4_decompress_generic(
>  			break;
>  		}
>
> -		LZ4_wildCopy(op, ip, cpy);
> +		if (dynOffset && length < 4)
> +			LZ4_copy4(op, ip);
> +		else
> +			LZ4_wildCopy(op, ip, cpy);
> +

The LZ4 format enforces that the last 5 bytes are literals so that
LZ4_wildCopy() can be used here. I suspect that having this extra branch
here for `dynOffset` mode hurts decompression speed.

>  		ip += length;
>  		op = cpy;
>
>  		/* get offset */
> -		offset = LZ4_readLE16(ip);
> -		ip += 2;
> +		if (dynOffset) {
> +			/*
> +			 * Check if DYN_BIT is set, means 2 Byte Offset,
> +			 * else 1 Byte Offset.
> +			 */
> +			if (*ip & DYN_BIT) {
> +				offset = LZ4_readLE16(ip) >> 1;
> +				ip += 2;
> +			} else {
> +				offset = *ip >> 1;
> +				ip += 1;

If we use the high-bit as the control bit, this branch simply becomes
`offset = *ip`, though the long offset branch becomes a bit longer.

> +			}
> +		} else {
> +			offset = LZ4_readLE16(ip);
> +			ip += 2;
> +		}
>  		match = op - offset;
>
>  		if ((checkOffset) && (unlikely(match < lowLimit))) {
> @@ -335,11 +355,11 @@ static FORCE_INLINE int LZ4_decompress_generic(
>  }
>
>  int LZ4_decompress_safe(const char *source, char *dest,
> -	int compressedSize, int maxDecompressedSize)
> +	int compressedSize, int maxDecompressedSize, bool dynOffset)
>  {
>  	return LZ4_decompress_generic(source, dest, compressedSize,
>  		maxDecompressedSize, endOnInputSize, full, 0,
> -		noDict, (BYTE *)dest, NULL, 0);
> +		noDict, (BYTE *)dest, NULL, 0, dynOffset);

You'll need to use the same trick that LZ4_compress_fast() uses, by hard
coding `dynOffset`. We want the compiler to generate two versions of
LZ4_decompress_generic(), one with `dynOffset` and one with `!dynOffset`.
That way the tight loop won't have the branches that check `dynOffset`.

	if (dynOffset)
		return LZ4_decompress_generic(..., true);
	else
		return LZ4_decompress_generic(..., false);

Without this trick, I expect that this patch causes a regression to both
LZ4 and LZ4_DYN decompression speed.

>  }
>
>  int LZ4_decompress_safe_partial(const char *source, char *dest,
> @@ -347,14 +367,14 @@ int LZ4_decompress_safe_partial(const char *source, char *dest,
>  {
>  	return LZ4_decompress_generic(source, dest, compressedSize,
>  		maxDecompressedSize, endOnInputSize, partial,
> -		targetOutputSize, noDict, (BYTE *)dest, NULL, 0);
> +		targetOutputSize, noDict, (BYTE *)dest, NULL, 0, NoDynOffset);
>  }
>
>  int LZ4_decompress_fast(const char *source, char *dest, int originalSize)
>  {
>  	return LZ4_decompress_generic(source, dest, 0, originalSize,
>  		endOnOutputSize, full, 0, withPrefix64k,
> -		(BYTE *)(dest - 64 * KB), NULL, 64 * KB);
> +		(BYTE *)(dest - 64 * KB), NULL, 64 * KB, NoDynOffset);
>  }
>
>  int LZ4_setStreamDecode(LZ4_streamDecode_t *LZ4_streamDecode,
> @@ -392,7 +412,7 @@ int LZ4_decompress_safe_continue(LZ4_streamDecode_t *LZ4_streamDecode,
>  			endOnInputSize, full, 0,
>  			usingExtDict, lz4sd->prefixEnd - lz4sd->prefixSize,
>  			lz4sd->externalDict,
> -			lz4sd->extDictSize);
> +			lz4sd->extDictSize, NoDynOffset);
>
>  		if (result <= 0)
>  			return result;
> @@ -406,7 +426,7 @@ int LZ4_decompress_safe_continue(LZ4_streamDecode_t *LZ4_streamDecode,
>  			compressedSize, maxOutputSize,
>  			endOnInputSize, full, 0,
>  			usingExtDict, (BYTE *)dest,
> -			lz4sd->externalDict, lz4sd->extDictSize);
> +			lz4sd->externalDict, lz4sd->extDictSize, NoDynOffset);
>  		if (result <= 0)
>  			return result;
>  		lz4sd->prefixSize = result;
> @@ -427,7 +447,7 @@ int LZ4_decompress_fast_continue(LZ4_streamDecode_t *LZ4_streamDecode,
>  			endOnOutputSize, full, 0,
>  			usingExtDict,
>  			lz4sd->prefixEnd - lz4sd->prefixSize,
> -			lz4sd->externalDict, lz4sd->extDictSize);
> +			lz4sd->externalDict, lz4sd->extDictSize, NoDynOffset);
>
>  		if (result <= 0)
>  			return result;
> @@ -440,7 +460,7 @@ int LZ4_decompress_fast_continue(LZ4_streamDecode_t *LZ4_streamDecode,
>  		result = LZ4_decompress_generic(source, dest, 0, originalSize,
>  			endOnOutputSize, full, 0,
>  			usingExtDict, (BYTE *)dest,
> -			lz4sd->externalDict, lz4sd->extDictSize);
> +			lz4sd->externalDict, lz4sd->extDictSize, NoDynOffset);
>  		if (result <= 0)
>  			return result;
>  		lz4sd->prefixSize = originalSize;
> @@ -463,19 +483,19 @@ static FORCE_INLINE int LZ4_decompress_usingDict_generic(const char *source,
>  	if (dictSize == 0)
>  		return LZ4_decompress_generic(source, dest,
>  			compressedSize, maxOutputSize, safe, full, 0,
> -			noDict, (BYTE *)dest, NULL, 0);
> +			noDict, (BYTE *)dest, NULL, 0, NoDynOffset);
>  	if (dictStart + dictSize == dest) {
>  		if (dictSize >= (int)(64 * KB - 1))
>  			return LZ4_decompress_generic(source, dest,
>  				compressedSize, maxOutputSize, safe, full, 0,
> -				withPrefix64k, (BYTE *)dest - 64 * KB, NULL, 0);
> +				withPrefix64k, (BYTE *)dest - 64 * KB, NULL, 0, NoDynOffset);
>  		return LZ4_decompress_generic(source, dest, compressedSize,
>  			maxOutputSize, safe, full, 0, noDict,
> -			(BYTE *)dest - dictSize, NULL, 0);
> +			(BYTE *)dest - dictSize, NULL, 0, NoDynOffset);
>  	}
>  	return LZ4_decompress_generic(source, dest, compressedSize,
>  		maxOutputSize, safe, full, 0, usingExtDict,
> -		(BYTE *)dest, (const BYTE *)dictStart, dictSize);
> +		(BYTE *)dest, (const BYTE *)dictStart, dictSize, NoDynOffset);
>  }
>
>  int LZ4_decompress_safe_usingDict(const char *source, char *dest,
> diff --git a/lib/lz4/lz4defs.h b/lib/lz4/lz4defs.h
> index 00a0b58..9451a73 100644
> --- a/lib/lz4/lz4defs.h
> +++ b/lib/lz4/lz4defs.h
> @@ -75,6 +75,7 @@
>  #define WILDCOPYLENGTH 8
>  #define LASTLITERALS 5
>  #define MFLIMIT (WILDCOPYLENGTH + MINMATCH)
> +#define DYN_BIT 0x1
>
>  /* Increase this value ==> compression run slower on incompressible data */
>  #define LZ4_SKIPTRIGGER 6
> @@ -87,6 +88,7 @@
>
>  #define MAXD_LOG 16
>  #define MAX_DISTANCE ((1 << MAXD_LOG) - 1)
> +#define MAX_DISTANCE_DYN ((1 << (MAXD_LOG - 1)) - 1)
>  #define STEPSIZE sizeof(size_t)
>
>  #define ML_BITS	4
> @@ -147,6 +149,13 @@ static FORCE_INLINE void LZ4_copy8(void *dst, const void *src)
>  #endif
>  }
>
> +static FORCE_INLINE void LZ4_copy4(void *dst, const void *src)
> +{
> +	U32 a = get_unaligned((const U32 *)src);
> +
> +	put_unaligned(a, (U32 *)dst);
> +}
> +
>  /*
>   * customized variant of memcpy,
>   * which can overwrite up to 7 bytes beyond dstEnd
> @@ -224,4 +233,6 @@ static FORCE_INLINE unsigned int LZ4_count(
>  typedef enum { endOnOutputSize = 0, endOnInputSize = 1 } endCondition_directive;
>  typedef enum { full = 0, partial = 1 } earlyEnd_directive;
>
> +typedef enum { NoDynOffset = 0, DynOffset = 1 } Dynamic_Offset;
> +
>  #endif
> --
> 1.7.1

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.
  2018-03-21 19:56     ` Nick Terrell
@ 2018-03-22  2:43       ` Sergey Senozhatsky
  0 siblings, 0 replies; 23+ messages in thread
From: Sergey Senozhatsky @ 2018-03-22  2:43 UTC (permalink / raw)
  To: Nick Terrell
  Cc: Sergey Senozhatsky, Maninder Singh, herbert, davem, minchan,
	ngupta, Kees Cook, anton, ccross, tony.luck, akpm, colin.king,
	linux-crypto, linux-kernel, pankaj.m, a.sahrawat, v.narang,
	Yann Collet

On (03/21/18 19:56), Nick Terrell wrote:
[..]
> This seems like a reasonable extension to the algorithm, and it looks like
> LZ4_DYN is about a 5% improvement to compression ratio on your benchmark.
> The biggest question I have is if it is worthwhile to maintain a separate
> incompatible variant of LZ4 in the kernel without any upstream for a 5%
> gain? If we do want to go forward with this, we should perform more
> benchmarks.
> 
> I commented in the patch, but because the `dynOffset` variable isn't a
> compile time static in LZ4_decompress_generic(), I suspect that the patch
> causes a regression in decompression speed for both LZ4 and LZ4_DYN. You'll
> need to re-run the benchmarks to first show that LZ4 before the patch
> performs the same as LZ4 after the patch. Then re-run the LZ4 vs LZ4_DYN
> benchmarks.
> 
> I would also like to see a benchmark in user-space (with the code), so we
> can see the performance of LZ4 before and after the patch, as well as LZ4
> vs LZ4_DYN without anything else going on. I expect the extra branches in
> the decoding loop to have an impact on speed, and I would like to see how
> big the impact is without noise. 

Yes, I've been thinking about this. There are more branches now
("to dyn or not to dyn") in the compression/decompression hot path,
but we see fewer instructions and branches in perf output at the end.
And my guess is that we have a lot of noise from zram and zsmalloc.
The data is XXX bytes shorter with dyn enabled, so we use YYY fewer
moves and ZZZ fewer branches while we copy the data to and from
zsmalloc, and maybe that's the root cause of the "performance
gain" that we see in zram-fio tests. So maybe we need to run
benchmarks against lz4, not zram+lz4.

> CC-ing Yann Collet, the author of LZ4

Great, thanks.

	-ss

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/1] lz4: Implement lz4 with dynamic offset length.
       [not found]       ` <CGME20180321195916epcas4p1e89e1cc5fd242ee8c348f92f1d1740b0@epcms5p1>
@ 2018-03-22  4:28         ` Maninder Singh
  0 siblings, 0 replies; 23+ messages in thread
From: Maninder Singh @ 2018-03-22  4:28 UTC (permalink / raw)
  To: Yann Collet, Sergey Senozhatsky
  Cc: linux-crypto, linux-kernel, Nick Terrell, Vaneet Narang,
	PANKAJ MISHRA, AMIT SAHRAWAT

CC: Vaneet Narang.

 
On (03/21/18 10:10), Maninder Singh wrote:
> diff --git a/lib/lz4/lz4_compress.c b/lib/lz4/lz4_compress.c
> index cc7b6d4..185c358 100644
> --- a/lib/lz4/lz4_compress.c
> +++ b/lib/lz4/lz4_compress.c
> @@ -183,7 +183,8 @@ static FORCE_INLINE int LZ4_compress_generic(
>          const tableType_t tableType,
>          const dict_directive dict,
>          const dictIssue_directive dictIssue,
> -        const U32 acceleration)
> +        const U32 acceleration,
> +        const Dynamic_Offset dynOffset)
>  {
>          const BYTE *ip = (const BYTE *) source;
>          const BYTE *base;
> @@ -199,6 +200,7 @@ static FORCE_INLINE int LZ4_compress_generic(
>
>          BYTE *op = (BYTE *) dest;
>          BYTE * const olimit = op + maxOutputSize;
> +        int max_distance = dynOffset ? MAX_DISTANCE_DYN : MAX_DISTANCE;
 
Let's mark this variable `const`. If the compiler doesn't realize that this
variable and `dynOffset` are compile-time constants, I expect the speed to
be impacted.
 
>
>          U32 forwardH;
>          size_t refDelta = 0;
> @@ -245,6 +247,7 @@ static FORCE_INLINE int LZ4_compress_generic(
>          for ( ; ; ) {
>                  const BYTE *match;
>                  BYTE *token;
> +                int curr_offset;
>
>                  /* Find a match */
>                  {
> @@ -285,7 +288,7 @@ static FORCE_INLINE int LZ4_compress_generic(
>                                          : 0)
>                                  || ((tableType == byU16)
>                                          ? 0
> -                                        : (match + MAX_DISTANCE < ip))
> +                                        : (match + max_distance < ip))
>                                  || (LZ4_read32(match + refDelta)
>                                          != LZ4_read32(ip)));
>                  }
> @@ -328,8 +331,26 @@ static FORCE_INLINE int LZ4_compress_generic(
>
>  _next_match:
>                  /* Encode Offset */
> -                LZ4_writeLE16(op, (U16)(ip - match));
> -                op += 2;
> +                if (dynOffset) {
> +                        curr_offset = (U16)(ip - match);
> +
> +                        /*
> +                         * If Offset is greater than 127, we need 2 bytes
> +                         * to store it. Otherwise 1 byte is enough.
> +                         */
> +                        if (curr_offset > 127) {
> +                                curr_offset = (curr_offset << 1) | DYN_BIT;
> +                                LZ4_writeLE16(op, (U16)curr_offset);
> +                                op += 2;
> +                        } else {
> +                                curr_offset = curr_offset << 1;
> +                                *op = (BYTE)curr_offset;
> +                                op++;
> +                        }
 
The standard way to do variable-sized integers is to use the high bit as
the control bit, not the low bit. Do you have benchmarks showing that using
the low bit is faster than using the high bit? If so, let's note that in a
comment in the code; if not, let's use the high bit.
 
> +                } else {
> +                        LZ4_writeLE16(op, (U16)(ip - match));
> +                        op += 2;
> +                }
>
>                  /* Encode MatchLength */
>                  {
> @@ -480,39 +501,70 @@ static int LZ4_compress_fast_extState(
>                          return LZ4_compress_generic(ctx, source,
>                                  dest, inputSize, 0,
>                                  noLimit, byU16, noDict,
> -                                noDictIssue, acceleration);
> +                                noDictIssue, acceleration, NoDynOffset);
>                  else
>                          return LZ4_compress_generic(ctx, source,
>                                  dest, inputSize, 0,
>                                  noLimit, tableType, noDict,
> -                                noDictIssue, acceleration);
> +                                noDictIssue, acceleration, NoDynOffset);
>          } else {
>                  if (inputSize < LZ4_64Klimit)
>                          return LZ4_compress_generic(ctx, source,
>                                  dest, inputSize,
>                                  maxOutputSize, limitedOutput, byU16, noDict,
> -                                noDictIssue, acceleration);
> +                                noDictIssue, acceleration, NoDynOffset);
>                  else
>                          return LZ4_compress_generic(ctx, source,
>                                  dest, inputSize,
>                                  maxOutputSize, limitedOutput, tableType, noDict,
> -                                noDictIssue, acceleration);
> +                                noDictIssue, acceleration, NoDynOffset);
>          }
>  }
>
> +static int LZ4_compress_fast_extState_dynamic(
> +        void *state,
> +        const char *source,
> +        char *dest,
> +        int inputSize,
> +        int maxOutputSize,
> +        int acceleration)
> +{
> +        LZ4_stream_t_internal *ctx = &((LZ4_stream_t *)state)->internal_donotuse;
> +
> +        LZ4_resetStream((LZ4_stream_t *)state);
> +
> +        if (acceleration < 1)
> +                acceleration = LZ4_ACCELERATION_DEFAULT;
> +
> +        if (maxOutputSize >= LZ4_COMPRESSBOUND(inputSize))
> +                return LZ4_compress_generic(ctx, source,
> +                        dest, inputSize, 0,
> +                        noLimit, byU16, noDict,
> +                        noDictIssue, acceleration, DynOffset);
> +        else
> +                return LZ4_compress_generic(ctx, source,
> +                        dest, inputSize,
> +                        maxOutputSize, limitedOutput, byU16, noDict,
> +                        noDictIssue, acceleration, DynOffset);
> +}
> +
>  int LZ4_compress_fast(const char *source, char *dest, int inputSize,
> -        int maxOutputSize, int acceleration, void *wrkmem)
> +        int maxOutputSize, int acceleration, void *wrkmem, bool dynOffset)
>  {
> -        return LZ4_compress_fast_extState(wrkmem, source, dest, inputSize,
> +        if (!dynOffset)
> +                return LZ4_compress_fast_extState(wrkmem, source, dest, inputSize,
> +                        maxOutputSize, acceleration);
> +
> +        return LZ4_compress_fast_extState_dynamic(wrkmem, source, dest, inputSize,
>                  maxOutputSize, acceleration);
>  }
>  EXPORT_SYMBOL(LZ4_compress_fast);
>
>  int LZ4_compress_default(const char *source, char *dest, int inputSize,
> -        int maxOutputSize, void *wrkmem)
> +        int maxOutputSize, void *wrkmem, bool dynOffset)
>  {
>          return LZ4_compress_fast(source, dest, inputSize,
> -                maxOutputSize, LZ4_ACCELERATION_DEFAULT, wrkmem);
> +                maxOutputSize, LZ4_ACCELERATION_DEFAULT, wrkmem, dynOffset);
>  }
>  EXPORT_SYMBOL(LZ4_compress_default);
>
> @@ -900,12 +952,12 @@ int LZ4_compress_fast_continue(LZ4_stream_t *LZ4_stream, const char *source,
>                          result = LZ4_compress_generic(
>                                  streamPtr, source, dest, inputSize,
>                                  maxOutputSize, limitedOutput, byU32,
> -                                withPrefix64k, dictSmall, acceleration);
> +                                withPrefix64k, dictSmall, acceleration, NoDynOffset);
>                  } else {
>                          result = LZ4_compress_generic(
>                                  streamPtr, source, dest, inputSize,
>                                  maxOutputSize, limitedOutput, byU32,
> -                                withPrefix64k, noDictIssue, acceleration);
> +                                withPrefix64k, noDictIssue, acceleration, NoDynOffset);
>                  }
>                  streamPtr->dictSize += (U32)inputSize;
>                  streamPtr->currentOffset += (U32)inputSize;
> @@ -921,12 +973,12 @@ int LZ4_compress_fast_continue(LZ4_stream_t *LZ4_stream, const char *source,
>                          result = LZ4_compress_generic(
>                                  streamPtr, source, dest, inputSize,
>                                  maxOutputSize, limitedOutput, byU32,
> -                                usingExtDict, dictSmall, acceleration);
> +                                usingExtDict, dictSmall, acceleration, NoDynOffset);
>                  } else {
>                          result = LZ4_compress_generic(
>                                  streamPtr, source, dest, inputSize,
>                                  maxOutputSize, limitedOutput, byU32,
> -                                usingExtDict, noDictIssue, acceleration);
> +                                usingExtDict, noDictIssue, acceleration, NoDynOffset);
>                  }
>                  streamPtr->dictionary = (const BYTE *)source;
>                  streamPtr->dictSize = (U32)inputSize;
> diff --git a/lib/lz4/lz4_decompress.c b/lib/lz4/lz4_decompress.c
> index 141734d..337a828 100644
> --- a/lib/lz4/lz4_decompress.c
> +++ b/lib/lz4/lz4_decompress.c
> @@ -71,7 +71,9 @@ static FORCE_INLINE int LZ4_decompress_generic(
>           /* only if dict == usingExtDict */
>           const BYTE * const dictStart,
>           /* note : = 0 if noDict */
> -         const size_t dictSize
> +         const size_t dictSize,
> +         /* offset == 1; dynamic offset */
> +         const Dynamic_Offset dynOffset
>           )
>  {
>          /* Local Variables */
> @@ -141,8 +143,8 @@ static FORCE_INLINE int LZ4_decompress_generic(
>                  /* copy literals */
>                  cpy = op + length;
>                  if (((endOnInput) && ((cpy > (partialDecoding ? oexit : oend - MFLIMIT))
> -                        || (ip + length > iend - (2 + 1 + LASTLITERALS))))
> -                        || ((!endOnInput) && (cpy > oend - WILDCOPYLENGTH))) {
> +                        || (ip + length > iend - (2 + LASTLITERALS))))
> +                        || ((!endOnInput) && (cpy > oend - WILDCOPYLENGTH - 1))) {
>                          if (partialDecoding) {
>                                  if (cpy > oend) {
>                                          /*
> @@ -188,13 +190,31 @@ static FORCE_INLINE int LZ4_decompress_generic(
>                          break;
>                  }
>
> -                LZ4_wildCopy(op, ip, cpy);
> +                if (dynOffset && length < 4)
> +                        LZ4_copy4(op, ip);
> +                else
> +                        LZ4_wildCopy(op, ip, cpy);
> +
 
The LZ4 format enforces that the last 5 bytes are literals so that
LZ4_wildCopy() can be used here. I suspect that having this extra branch
here for `dynOffset` mode hurts decompression speed.
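
For reference, here are user-space stand-ins for the two copy primitives being chosen between (sketches under assumed semantics, not the kernel's exact code): the wild copy rounds up to 8-byte chunks and may read and write past the requested end, which is why the format reserves trailing literal bytes.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for LZ4_copy4(): a plain 4-byte copy. */
static void copy4(unsigned char *dst, const unsigned char *src)
{
	memcpy(dst, src, 4);
}

/* Stand-in for LZ4_wildCopy(): copies in 8-byte chunks and may
 * read/write up to 7 bytes beyond `end`, so both buffers must have
 * slack past the logical end. */
static void wild_copy(unsigned char *dst, const unsigned char *src,
		      unsigned char *end)
{
	do {
		memcpy(dst, src, 8);
		dst += 8;
		src += 8;
	} while (dst < end);
}
```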
 
>                  ip += length;
>                  op = cpy;
>
>                  /* get offset */
> -                offset = LZ4_readLE16(ip);
> -                ip += 2;
> +                if (dynOffset) {
> +                        /*
> +                         * Check if DYN_BIT is set, means 2 Byte Offset,
> +                         * else 1 Byte Offset.
> +                         */
> +                        if (*ip & DYN_BIT) {
> +                                offset = LZ4_readLE16(ip) >> 1;
> +                                ip += 2;
> +                        } else {
> +                                offset = *ip >> 1;
> +                                ip += 1;
 
If we use the high-bit as the control bit, this branch simply becomes
`offset = *ip`, though the long offset branch becomes a bit longer.
 
> +                        }
> +                } else {
> +                        offset = LZ4_readLE16(ip);
> +                        ip += 2;
> +                }
>                  match = op - offset;
>
>                  if ((checkOffset) && (unlikely(match < lowLimit))) {
> @@ -335,11 +355,11 @@ static FORCE_INLINE int LZ4_decompress_generic(
>  }
>
>  int LZ4_decompress_safe(const char *source, char *dest,
> -        int compressedSize, int maxDecompressedSize)
> +        int compressedSize, int maxDecompressedSize, bool dynOffset)
>  {
>          return LZ4_decompress_generic(source, dest, compressedSize,
>                  maxDecompressedSize, endOnInputSize, full, 0,
> -                noDict, (BYTE *)dest, NULL, 0);
> +                noDict, (BYTE *)dest, NULL, 0, dynOffset);
 
You'll need to use the same trick that LZ4_compress_fast() uses, by
hard-coding `dynOffset`. We want the compiler to generate two versions of
LZ4_decompress_generic(), one with `dynOffset` and one with `!dynOffset`.
That way the tight loop won't contain the branches that check `dynOffset`.
 
        if (dynOffset)
                return LZ4_decompress_generic(..., true);
        else
                return LZ4_decompress_generic(..., false);
 
Without this trick, I expect that this patch causes a regression to both
LZ4 and LZ4_DYN decompression speed.
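
A minimal sketch of that specialization trick (the names and the simplified offset decoder here are illustrative, not the patch's code): because the flag is a compile-time constant at each call site of the always-inlined helper, the compiler can delete the untaken half of the branch in each specialized copy.

```c
#include <assert.h>

/* Simplified stand-in for LZ4_decompress_generic()'s offset read.
 * `dyn_offset` is constant at each call site below, so after inlining
 * the compiler can drop the branch entirely in each wrapper. */
static inline __attribute__((always_inline))
int read_offset(const unsigned char *ip, int dyn_offset, int *consumed)
{
	if (dyn_offset) {
		if (ip[0] & 0x1) {		/* DYN_BIT set: 2-byte offset */
			*consumed = 2;
			return (ip[0] | (ip[1] << 8)) >> 1;
		}
		*consumed = 1;			/* 1-byte offset */
		return ip[0] >> 1;
	}
	*consumed = 2;				/* classic LZ4: LE16 offset */
	return ip[0] | (ip[1] << 8);
}

/* Two thin wrappers with the flag hard-coded, as suggested above. */
static int read_offset_dyn(const unsigned char *ip, int *n)
{
	return read_offset(ip, 1, n);
}

static int read_offset_plain(const unsigned char *ip, int *n)
{
	return read_offset(ip, 0, n);
}
```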
 
>  }
>
>  int LZ4_decompress_safe_partial(const char *source, char *dest,
> @@ -347,14 +367,14 @@ int LZ4_decompress_safe_partial(const char *source, char *dest,
>  {
>          return LZ4_decompress_generic(source, dest, compressedSize,
>                  maxDecompressedSize, endOnInputSize, partial,
> -                targetOutputSize, noDict, (BYTE *)dest, NULL, 0);
> +                targetOutputSize, noDict, (BYTE *)dest, NULL, 0, NoDynOffset);
>  }
>
>  int LZ4_decompress_fast(const char *source, char *dest, int originalSize)
>  {
>          return LZ4_decompress_generic(source, dest, 0, originalSize,
>                  endOnOutputSize, full, 0, withPrefix64k,
> -                (BYTE *)(dest - 64 * KB), NULL, 64 * KB);
> +                (BYTE *)(dest - 64 * KB), NULL, 64 * KB, NoDynOffset);
>  }
>
>  int LZ4_setStreamDecode(LZ4_streamDecode_t *LZ4_streamDecode,
> @@ -392,7 +412,7 @@ int LZ4_decompress_safe_continue(LZ4_streamDecode_t *LZ4_streamDecode,
>                          endOnInputSize, full, 0,
>                          usingExtDict, lz4sd->prefixEnd - lz4sd->prefixSize,
>                          lz4sd->externalDict,
> -                        lz4sd->extDictSize);
> +                        lz4sd->extDictSize, NoDynOffset);
>
>                  if (result <= 0)
>                          return result;
> @@ -406,7 +426,7 @@ int LZ4_decompress_safe_continue(LZ4_streamDecode_t *LZ4_streamDecode,
>                          compressedSize, maxOutputSize,
>                          endOnInputSize, full, 0,
>                          usingExtDict, (BYTE *)dest,
> -                        lz4sd->externalDict, lz4sd->extDictSize);
> +                        lz4sd->externalDict, lz4sd->extDictSize, NoDynOffset);
>                  if (result <= 0)
>                          return result;
>                  lz4sd->prefixSize = result;
> @@ -427,7 +447,7 @@ int LZ4_decompress_fast_continue(LZ4_streamDecode_t *LZ4_streamDecode,
>                          endOnOutputSize, full, 0,
>                          usingExtDict,
>                          lz4sd->prefixEnd - lz4sd->prefixSize,
> -                        lz4sd->externalDict, lz4sd->extDictSize);
> +                        lz4sd->externalDict, lz4sd->extDictSize, NoDynOffset);
>
>                  if (result <= 0)
>                          return result;
> @@ -440,7 +460,7 @@ int LZ4_decompress_fast_continue(LZ4_streamDecode_t *LZ4_streamDecode,
>                  result = LZ4_decompress_generic(source, dest, 0, originalSize,
>                          endOnOutputSize, full, 0,
>                          usingExtDict, (BYTE *)dest,
> -                        lz4sd->externalDict, lz4sd->extDictSize);
> +                        lz4sd->externalDict, lz4sd->extDictSize, NoDynOffset);
>                  if (result <= 0)
>                          return result;
>                  lz4sd->prefixSize = originalSize;
> @@ -463,19 +483,19 @@ static FORCE_INLINE int LZ4_decompress_usingDict_generic(const char *source,
>          if (dictSize == 0)
>                  return LZ4_decompress_generic(source, dest,
>                          compressedSize, maxOutputSize, safe, full, 0,
> -                        noDict, (BYTE *)dest, NULL, 0);
> +                        noDict, (BYTE *)dest, NULL, 0, NoDynOffset);
>          if (dictStart + dictSize == dest) {
>                  if (dictSize >= (int)(64 * KB - 1))
>                          return LZ4_decompress_generic(source, dest,
>                                  compressedSize, maxOutputSize, safe, full, 0,
> -                                withPrefix64k, (BYTE *)dest - 64 * KB, NULL, 0);
> +                                withPrefix64k, (BYTE *)dest - 64 * KB, NULL, 0, NoDynOffset);
>                  return LZ4_decompress_generic(source, dest, compressedSize,
>                          maxOutputSize, safe, full, 0, noDict,
> -                        (BYTE *)dest - dictSize, NULL, 0);
> +                        (BYTE *)dest - dictSize, NULL, 0, NoDynOffset);
>          }
>          return LZ4_decompress_generic(source, dest, compressedSize,
>                  maxOutputSize, safe, full, 0, usingExtDict,
> -                (BYTE *)dest, (const BYTE *)dictStart, dictSize);
> +                (BYTE *)dest, (const BYTE *)dictStart, dictSize, NoDynOffset);
>  }
>
>  int LZ4_decompress_safe_usingDict(const char *source, char *dest,
> diff --git a/lib/lz4/lz4defs.h b/lib/lz4/lz4defs.h
> index 00a0b58..9451a73 100644
> --- a/lib/lz4/lz4defs.h
> +++ b/lib/lz4/lz4defs.h
> @@ -75,6 +75,7 @@
>  #define WILDCOPYLENGTH 8
>  #define LASTLITERALS 5
>  #define MFLIMIT (WILDCOPYLENGTH + MINMATCH)
> +#define DYN_BIT 0x1
>
>  /* Increase this value ==> compression run slower on incompressible data */
>  #define LZ4_SKIPTRIGGER 6
> @@ -87,6 +88,7 @@
>
>  #define MAXD_LOG 16
>  #define MAX_DISTANCE ((1 << MAXD_LOG) - 1)
> +#define MAX_DISTANCE_DYN ((1 << (MAXD_LOG - 1)) - 1)
>  #define STEPSIZE sizeof(size_t)
>
>  #define ML_BITS        4
> @@ -147,6 +149,13 @@ static FORCE_INLINE void LZ4_copy8(void *dst, const void *src)
>  #endif
>  }
>
> +static FORCE_INLINE void LZ4_copy4(void *dst, const void *src)
> +{
> +        U32 a = get_unaligned((const U32 *)src);
> +
> +        put_unaligned(a, (U32 *)dst);
> +}
> +
>  /*
>   * customized variant of memcpy,
>   * which can overwrite up to 7 bytes beyond dstEnd
> @@ -224,4 +233,6 @@ static FORCE_INLINE unsigned int LZ4_count(
>  typedef enum { endOnOutputSize = 0, endOnInputSize = 1 } endCondition_directive;
>  typedef enum { full = 0, partial = 1 } earlyEnd_directive;
>
> +typedef enum { NoDynOffset = 0, DynOffset = 1 } Dynamic_Offset;
> +
>  #endif
> --
> 1.7.1
 
 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/1] lz4: Implement lz4 with dynamic offset length.
  2018-03-21  4:40     ` [PATCH 1/1] lz4: " Maninder Singh
                         ` (2 preceding siblings ...)
       [not found]       ` <CGME20180321195916epcas4p1e89e1cc5fd242ee8c348f92f1d1740b0@epcms5p1>
@ 2018-03-22 23:09       ` kbuild test robot
  2018-03-22 23:32       ` kbuild test robot
                         ` (2 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: kbuild test robot @ 2018-03-22 23:09 UTC (permalink / raw)
  To: Maninder Singh
  Cc: kbuild-all, herbert, davem, minchan, ngupta,
	sergey.senozhatsky.work, keescook, anton, ccross, tony.luck,
	akpm, colin.king, linux-crypto, linux-kernel, pankaj.m,
	a.sahrawat, v.narang, Maninder Singh

[-- Attachment #1: Type: text/plain, Size: 2648 bytes --]

Hi Maninder,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.16-rc6]
[cannot apply to next-20180322]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Maninder-Singh/cover-letter-lz4-Implement-lz4-with-dynamic-offset-length/20180323-064137
config: i386-randconfig-x073-201811 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   fs/pstore/platform.c: In function 'decompress_lz4':
>> fs/pstore/platform.c:357:8: error: too few arguments to function 'LZ4_decompress_safe'
     ret = LZ4_decompress_safe(in, out, inlen, outlen);
           ^~~~~~~~~~~~~~~~~~~
   In file included from fs/pstore/platform.c:38:0:
   include/linux/lz4.h:301:5: note: declared here
    int LZ4_decompress_safe(const char *source, char *dest, int compressedSize,
        ^~~~~~~~~~~~~~~~~~~

vim +/LZ4_decompress_safe +357 fs/pstore/platform.c

8cfc8ddc Geliang Tang 2016-02-18  352  
8cfc8ddc Geliang Tang 2016-02-18  353  static int decompress_lz4(void *in, void *out, size_t inlen, size_t outlen)
8cfc8ddc Geliang Tang 2016-02-18  354  {
8cfc8ddc Geliang Tang 2016-02-18  355  	int ret;
8cfc8ddc Geliang Tang 2016-02-18  356  
d21b5ff1 Sven Schmidt 2017-02-24 @357  	ret = LZ4_decompress_safe(in, out, inlen, outlen);
d21b5ff1 Sven Schmidt 2017-02-24  358  	if (ret < 0) {
d21b5ff1 Sven Schmidt 2017-02-24  359  		/*
d21b5ff1 Sven Schmidt 2017-02-24  360  		 * LZ4_decompress_safe will return an error code
d21b5ff1 Sven Schmidt 2017-02-24  361  		 * (< 0) if decompression failed
d21b5ff1 Sven Schmidt 2017-02-24  362  		 */
d21b5ff1 Sven Schmidt 2017-02-24  363  		pr_err("LZ4_decompress_safe error, ret = %d!\n", ret);
8cfc8ddc Geliang Tang 2016-02-18  364  		return -EIO;
8cfc8ddc Geliang Tang 2016-02-18  365  	}
8cfc8ddc Geliang Tang 2016-02-18  366  
d21b5ff1 Sven Schmidt 2017-02-24  367  	return ret;
8cfc8ddc Geliang Tang 2016-02-18  368  }
8cfc8ddc Geliang Tang 2016-02-18  369  

:::::: The code at line 357 was first introduced by commit
:::::: d21b5ff12df45a65bb220c7e8103a5f0f5609377 fs/pstore: fs/squashfs: change usage of LZ4 to work with new LZ4 version

:::::: TO: Sven Schmidt <4sschmid@informatik.uni-hamburg.de>
:::::: CC: Linus Torvalds <torvalds@linux-foundation.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 33489 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/1] lz4: Implement lz4 with dynamic offset length.
  2018-03-21  4:40     ` [PATCH 1/1] lz4: " Maninder Singh
                         ` (3 preceding siblings ...)
  2018-03-22 23:09       ` kbuild test robot
@ 2018-03-22 23:32       ` kbuild test robot
       [not found]       ` <CGME20180321195916epcas4p1e89e1cc5fd242ee8c348f92f1d1740b0@epcms5p8>
       [not found]       ` <CGME20180321044149epcas5p12cba1afa5fc0f493a47fe20c4a7394bd@epcms5p5>
  6 siblings, 0 replies; 23+ messages in thread
From: kbuild test robot @ 2018-03-22 23:32 UTC (permalink / raw)
  To: Maninder Singh
  Cc: kbuild-all, herbert, davem, minchan, ngupta,
	sergey.senozhatsky.work, keescook, anton, ccross, tony.luck,
	akpm, colin.king, linux-crypto, linux-kernel, pankaj.m,
	a.sahrawat, v.narang, Maninder Singh

[-- Attachment #1: Type: text/plain, Size: 4484 bytes --]

Hi Maninder,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.16-rc6]
[cannot apply to next-20180322]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Maninder-Singh/cover-letter-lz4-Implement-lz4-with-dynamic-offset-length/20180323-064137
config: i386-randconfig-s1-03221113 (attached as .config)
compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   fs/squashfs/lz4_wrapper.c: In function 'lz4_uncompress':
>> fs/squashfs/lz4_wrapper.c:110:8: error: too few arguments to function 'LZ4_decompress_safe'
     res = LZ4_decompress_safe(stream->input, stream->output,
           ^~~~~~~~~~~~~~~~~~~
   In file included from fs/squashfs/lz4_wrapper.c:13:0:
   include/linux/lz4.h:301:5: note: declared here
    int LZ4_decompress_safe(const char *source, char *dest, int compressedSize,
        ^~~~~~~~~~~~~~~~~~~

vim +/LZ4_decompress_safe +110 fs/squashfs/lz4_wrapper.c

9c06a46f Phillip Lougher    2014-11-27   91  
9c06a46f Phillip Lougher    2014-11-27   92  
9c06a46f Phillip Lougher    2014-11-27   93  static int lz4_uncompress(struct squashfs_sb_info *msblk, void *strm,
9c06a46f Phillip Lougher    2014-11-27   94  	struct buffer_head **bh, int b, int offset, int length,
9c06a46f Phillip Lougher    2014-11-27   95  	struct squashfs_page_actor *output)
9c06a46f Phillip Lougher    2014-11-27   96  {
9c06a46f Phillip Lougher    2014-11-27   97  	struct squashfs_lz4 *stream = strm;
9c06a46f Phillip Lougher    2014-11-27   98  	void *buff = stream->input, *data;
9c06a46f Phillip Lougher    2014-11-27   99  	int avail, i, bytes = length, res;
9c06a46f Phillip Lougher    2014-11-27  100  
9c06a46f Phillip Lougher    2014-11-27  101  	for (i = 0; i < b; i++) {
9c06a46f Phillip Lougher    2014-11-27  102  		avail = min(bytes, msblk->devblksize - offset);
9c06a46f Phillip Lougher    2014-11-27  103  		memcpy(buff, bh[i]->b_data + offset, avail);
9c06a46f Phillip Lougher    2014-11-27  104  		buff += avail;
9c06a46f Phillip Lougher    2014-11-27  105  		bytes -= avail;
9c06a46f Phillip Lougher    2014-11-27  106  		offset = 0;
9c06a46f Phillip Lougher    2014-11-27  107  		put_bh(bh[i]);
9c06a46f Phillip Lougher    2014-11-27  108  	}
9c06a46f Phillip Lougher    2014-11-27  109  
d21b5ff1 Sven Schmidt       2017-02-24 @110  	res = LZ4_decompress_safe(stream->input, stream->output,
d21b5ff1 Sven Schmidt       2017-02-24  111  		length, output->length);
d21b5ff1 Sven Schmidt       2017-02-24  112  
d21b5ff1 Sven Schmidt       2017-02-24  113  	if (res < 0)
9c06a46f Phillip Lougher    2014-11-27  114  		return -EIO;
9c06a46f Phillip Lougher    2014-11-27  115  
d21b5ff1 Sven Schmidt       2017-02-24  116  	bytes = res;
9c06a46f Phillip Lougher    2014-11-27  117  	data = squashfs_first_page(output);
9c06a46f Phillip Lougher    2014-11-27  118  	buff = stream->output;
9c06a46f Phillip Lougher    2014-11-27  119  	while (data) {
09cbfeaf Kirill A. Shutemov 2016-04-01  120  		if (bytes <= PAGE_SIZE) {
9c06a46f Phillip Lougher    2014-11-27  121  			memcpy(data, buff, bytes);
9c06a46f Phillip Lougher    2014-11-27  122  			break;
9c06a46f Phillip Lougher    2014-11-27  123  		}
09cbfeaf Kirill A. Shutemov 2016-04-01  124  		memcpy(data, buff, PAGE_SIZE);
09cbfeaf Kirill A. Shutemov 2016-04-01  125  		buff += PAGE_SIZE;
09cbfeaf Kirill A. Shutemov 2016-04-01  126  		bytes -= PAGE_SIZE;
9c06a46f Phillip Lougher    2014-11-27  127  		data = squashfs_next_page(output);
9c06a46f Phillip Lougher    2014-11-27  128  	}
9c06a46f Phillip Lougher    2014-11-27  129  	squashfs_finish_page(output);
9c06a46f Phillip Lougher    2014-11-27  130  
d21b5ff1 Sven Schmidt       2017-02-24  131  	return res;
9c06a46f Phillip Lougher    2014-11-27  132  }
9c06a46f Phillip Lougher    2014-11-27  133  

:::::: The code at line 110 was first introduced by commit
:::::: d21b5ff12df45a65bb220c7e8103a5f0f5609377 fs/pstore: fs/squashfs: change usage of LZ4 to work with new LZ4 version

:::::: TO: Sven Schmidt <4sschmid@informatik.uni-hamburg.de>
:::::: CC: Linus Torvalds <torvalds@linux-foundation.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 28546 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/1] lz4: Implement lz4 with dynamic offset length.
       [not found]       ` <CGME20180321195916epcas4p1e89e1cc5fd242ee8c348f92f1d1740b0@epcms5p8>
@ 2018-03-23 13:21         ` Vaneet Narang
  0 siblings, 0 replies; 23+ messages in thread
From: Vaneet Narang @ 2018-03-23 13:21 UTC (permalink / raw)
  To: Maninder Singh, Yann Collet, Sergey Senozhatsky
  Cc: linux-crypto, linux-kernel, Nick Terrell, PANKAJ MISHRA, AMIT SAHRAWAT


Hi Nick,

Thanks for your comments; please find my replies to a few of them below.
I will share the benchmarking figures separately.

> 
>> +                        if (curr_offset > 127) {
>> +                                curr_offset = (curr_offset << 1) | DYN_BIT;
>> +                                LZ4_writeLE16(op, (U16)curr_offset);
>> +                                op += 2;
>> +                        } else {
>> +                                curr_offset = curr_offset << 1;
>> +                                *op = (BYTE)curr_offset;
>> +                                op++;
>> +                        }
> 
>The standard way to do variable-sized integers is to use the high-bit as
>the control bit, not the low-bit. Do you have benchmarks to show that using
>the low-bit is faster than using the high-bit? If so, let's comment in the
>code; if not, let's use the high-bit.
> 
We are not sure about the performance difference between using the low or the high
bit, but since the LZ4 specification stores the offset in little-endian format, keeping the
low bit as the control bit makes it easier to retrieve the offset when it is spread across two bytes.

offset = LZ4_readLE16(ip);
if (offset & 0x1) {
	offset = offset >> 1; // just one right shift
	ip += 2;
} else {
	offset = (offset & 0xff) >> 1; // only two operations for a one-byte offset
	ip += 1;
}

So we are not sure that keeping the high bit would make much difference, since we
would need the same number of operations, or more, to extract the offset in the high-bit case.
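
To make the comparison concrete, here is a standalone sketch of both decode paths (the high-bit layout is an assumed strawman for comparison, not something from the patch):

```c
#include <assert.h>

/* Low-bit scheme from the patch: bit 0 of the first byte is the flag,
 * the offset itself is shifted left by one and stored little-endian. */
static int decode_low(const unsigned char *ip, int *consumed)
{
	if (ip[0] & 0x1) {
		*consumed = 2;
		return (ip[0] | (ip[1] << 8)) >> 1;
	}
	*consumed = 1;
	return ip[0] >> 1;
}

/* Assumed high-bit scheme: bit 7 of the first byte is the flag; the
 * short path needs no shift, but the long path must mask off the flag
 * and combine the second byte. */
static int decode_high(const unsigned char *ip, int *consumed)
{
	if (ip[0] & 0x80) {
		*consumed = 2;
		return ((ip[0] & 0x7f) << 8) | ip[1];
	}
	*consumed = 1;
	return ip[0];
}
```

Both variants need one test plus one or two arithmetic operations per offset, so, as argued above, the choice of control bit is unlikely to dominate decode cost.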

>>                  /* copy literals */
>>                  cpy = op + length;
>>                  if (((endOnInput) && ((cpy > (partialDecoding ? oexit : oend - MFLIMIT))
>> -                        || (ip + length > iend - (2 + 1 + LASTLITERALS))))
>> -                        || ((!endOnInput) && (cpy > oend - WILDCOPYLENGTH))) {
>> +                        || (ip + length > iend - (2 + LASTLITERALS))))
>> +                        || ((!endOnInput) && (cpy > oend - WILDCOPYLENGTH - 1))) {
>>                          if (partialDecoding) {
>>                                  if (cpy > oend) {
>>                                          /*
>> @@ -188,13 +190,31 @@ static FORCE_INLINE int LZ4_decompress_generic(
>>                          break;
>>                  }
>>
>> -                LZ4_wildCopy(op, ip, cpy);
>> +                if (dynOffset && length < 4)
>> +                        LZ4_copy4(op, ip);
>> +                else
>> +                        LZ4_wildCopy(op, ip, cpy);
>> +
> 
>The LZ4 format enforces that the last 5 bytes are literals so that
>LZ4_wildCopy() can be used here. I suspect that having this extra branch
>here for `dynOffset` mode hurts decompression speed.
> 
This check handles a one-byte read overflow when decompressing the last frame: LZ4_wildCopy() blindly copies 8 bytes at a time.

Issue case:
a zero-length literal run followed by a 1-byte offset, with the stream ending in a one-byte token and 5 bytes of last literals.

With this combination only 7 bytes (1-byte offset + 1-byte token + 5 literal bytes) remain at the end of
the input buffer, so reading 8 bytes from it results in a 1-byte overflow.

Since the original LZ4 has no 1-byte offset, this issue cannot occur there. To avoid the overhead of this check,
we plan to use 6 bytes as the last-literal length:

-#define LASTLITERALS    5
+#define LASTLITERALS    6
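
The byte accounting behind that choice can be sketched as follows (a trivial model, just making the arithmetic explicit):

```c
#include <assert.h>

#define WILDCOPY_CHUNK	8

/* Input bytes left at the final literal copy, for the worst-case tail
 * described above: 1-byte offset + final token + last literals. */
static int worst_case_tail(int last_literals)
{
	return 1 + 1 + last_literals;
}
```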

>>
>>  int LZ4_decompress_safe(const char *source, char *dest,
>> -        int compressedSize, int maxDecompressedSize)
>> +        int compressedSize, int maxDecompressedSize, bool dynOffset)
>>  {
>>          return LZ4_decompress_generic(source, dest, compressedSize,
>>                  maxDecompressedSize, endOnInputSize, full, 0,
>> -                noDict, (BYTE *)dest, NULL, 0);
>> +                noDict, (BYTE *)dest, NULL, 0, dynOffset);
> 
>You'll need to use the same trick that LZ4_compress_fast() uses, by
>hard-coding `dynOffset`. We want the compiler to generate two versions of
>LZ4_decompress_generic(), one with `dynOffset` and one with `!dynOffset`.
>That way the tight loop won't contain the branches that check `dynOffset`.
> 
>        if (dynOffset)
>                return LZ4_decompress_generic(..., true);
>        else
>                return LZ4_decompress_generic(..., false);
> 
>Without this trick, I expect that this patch causes a regression to both
>LZ4 and LZ4_DYN decompression speed.

Since our solution is not backward compatible with the original LZ4, we are a bit unsure
whether we should provide it as a separate API, to avoid the overhead of dynOffset
checks throughout the code and to avoid changing the prototypes of the exported
LZ4_decompress/LZ4_compress functions, or whether we should keep these checks to avoid redundant code.
Kindly suggest.

Thanks & Regards,
Vaneet Narang

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.
       [not found]     ` <CGME20180321044137epcas5p221e7ee4a0b7464eaa00dad8320f0251d@epcms5p6>
@ 2018-03-23 13:43       ` Vaneet Narang
  2018-03-29 10:26       ` Maninder Singh
  1 sibling, 0 replies; 23+ messages in thread
From: Vaneet Narang @ 2018-03-23 13:43 UTC (permalink / raw)
  To: Nick Terrell, Sergey Senozhatsky
  Cc: Maninder Singh, herbert, davem, minchan, ngupta, Kees Cook,
	anton, ccross, tony.luck, akpm, colin.king, linux-crypto,
	linux-kernel, PANKAJ MISHRA, AMIT SAHRAWAT, Yann Collet

Hi Nick / Sergey,


We have compared LZ4_DYN with the original LZ4 using samples of real-time application data (4 KB)
compressed/decompressed by ZRAM. For the comparison we used lzbench (https://github.com/inikep/lzbench).
We implemented a dedicated LZ4_DYN API and kept the last-literal length at 6 to avoid the overhead
of the extra checks. In the average case there is a saving of 3~4% in compression ratio, with almost the
same compression speed and a minor loss in decompression speed (~50 MB/s), compared with LZ4.

We also compared LZ4_DYN with LZO1x, since LZO1x is the default compressor for ZRAM.

Original LZ4:
sh-3.2# ./lzbench  -r  -elz4  data/
lzbench 1.7.3 (32-bit Linux)   Assembled by P.Skibinski
Compressor name         Compress. Decompress. Compr. size  Ratio Filename
memcpy                   2205 MB/s  2217 MB/s        4096 100.00 data//data_1
lz4 1.8.0                 216 MB/s   761 MB/s        2433  59.40 data//data_1
lz4 1.8.0                 269 MB/s   877 MB/s        1873  45.73 data//data_2
lz4 1.8.0                 238 MB/s   575 MB/s        2060  50.29 data//data_3
lz4 1.8.0                 321 MB/s  1015 MB/s        1464  35.74 data//data_4
lz4 1.8.0                 464 MB/s  1090 MB/s         713  17.41 data//data_5
lz4 1.8.0                 296 MB/s   956 MB/s        1597  38.99 data//data_6
lz4 1.8.0                 338 MB/s   994 MB/s        2238  54.64 data//data_7
lz4 1.8.0                 705 MB/s  1172 MB/s         193   4.71 data//data_8
lz4 1.8.0                 404 MB/s  1150 MB/s        1097  26.78 data//data_9
lz4 1.8.0                 216 MB/s   921 MB/s        3183  77.71 data//data_10
lz4 1.8.0                 456 MB/s  1101 MB/s        1011  24.68 data//data_11
lz4 1.8.0                 867 MB/s  1202 MB/s          37   0.90 data//data_12


LZ4 Dynamic Offset:
sh-3.2# ./lzbench  -r  -elz4_dyn  data/
lzbench 1.7.3 (32-bit Linux)   Assembled by P.Skibinski
Compressor name         Compress. Decompress. Compr. size  Ratio Filename
memcpy                   2203 MB/s  2218 MB/s        4096 100.00 data//data_1
lz4 1.8.0                 218 MB/s   693 MB/s        2228  54.39 data//data_1
lz4 1.8.0                 273 MB/s   851 MB/s        1739  42.46 data//data_2
lz4 1.8.0                 230 MB/s   526 MB/s        1800  43.95 data//data_3
lz4 1.8.0                 321 MB/s   952 MB/s        1357  33.13 data//data_4
lz4 1.8.0                 470 MB/s  1075 MB/s         664  16.21 data//data_5
lz4 1.8.0                 303 MB/s   964 MB/s        1455  35.52 data//data_6
lz4 1.8.0                 345 MB/s   951 MB/s        2126  51.90 data//data_7
lz4 1.8.0                 744 MB/s  1163 MB/s         177   4.32 data//data_8
lz4 1.8.0                 409 MB/s  1257 MB/s        1033  25.22 data//data_9
lz4 1.8.0                 220 MB/s   857 MB/s        3049  74.44 data//data_10
lz4 1.8.0                 464 MB/s  1105 MB/s         934  22.80 data//data_11
lz4 1.8.0                 874 MB/s  1194 MB/s          36   0.88 data//data_12


LZ4 Dynamic Offset with 32K data:
sh-3.2# ./lzbench -elz4_dyn data/data32k
lzbench 1.7.3 (32-bit Linux)   Assembled by P.Skibinski
Compressor name         Compress. Decompress. Compr. size  Ratio Filename
memcpy                   5285 MB/s  5283 MB/s       32768 100.00 data/data32k
lz4 1.8.0                 274 MB/s   995 MB/s       13435  41.00 data/data32k
done... (cIters=1 dIters=1 cTime=1.0 dTime=2.0 chunkSize=1706MB cSpeed=0MB)

Original LZ4 with 32K data:
sh-3.2# ./lzbench_orig -elz4 data/data32k
lzbench 1.7.3 (32-bit Linux)   Assembled by P.Skibinski
Compressor name         Compress. Decompress. Compr. size  Ratio Filename
memcpy                   4918 MB/s  5108 MB/s       32768 100.00 data/data32k
lz4 1.8.0                 276 MB/s  1045 MB/s       14492  44.23 data/data32k

LZO1x with 32K data (Default Compressor for ZRAM): 
sh-3.2# ./lzbench -elzo1x,1 data/data32k
lzbench 1.7.3 (32-bit Linux)   Assembled by P.Skibinski
Compressor name         Compress. Decompress. Compr. size  Ratio Filename
memcpy                   5273 MB/s  5320 MB/s       32768 100.00 data/data32k
lzo1x 2.09 -1             283 MB/s   465 MB/s       14292  43.62 data/data32k

Regards,
Vaneet Narang

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.
       [not found]     ` <CGME20180321044137epcas5p221e7ee4a0b7464eaa00dad8320f0251d@epcms5p6>
  2018-03-23 13:43       ` Vaneet Narang
@ 2018-03-29 10:26       ` Maninder Singh
  2018-03-30  5:41         ` Sergey Senozhatsky
  1 sibling, 1 reply; 23+ messages in thread
From: Maninder Singh @ 2018-03-29 10:26 UTC (permalink / raw)
  To: Vaneet Narang, Nick Terrell, Sergey Senozhatsky, herbert
  Cc: davem, minchan, ngupta, Kees Cook, anton, ccross, tony.luck,
	akpm, colin.king, linux-crypto, linux-kernel, PANKAJ MISHRA,
	AMIT SAHRAWAT, Yann Collet


Hello Nick/Sergey,
 
Do you have any suggestions or comments, so that we can update the code and resend the patch?
 
> Hi Nick / Sergey,
> 
> 
> We have compared LZ4_DYN with the original LZ4 using samples of real-time application data (4 KB)
> compressed/decompressed by ZRAM. For the comparison we used lzbench (https://github.com/inikep/lzbench).
> We implemented a dedicated LZ4_DYN API and kept the last-literal length at 6 to avoid the overhead
> of the extra checks. In the average case there is a saving of 3~4% in compression ratio, with almost the
> same compression speed and a minor loss in decompression speed (~50 MB/s), compared with LZ4.
> 
> Comparison of Lz4 Dyn with LZO1x is also done as LZO1x is default compressor of ZRAM.
> 
> Original LZ4:
> sh-3.2# ./lzbench  -r  -elz4  data/
> lzbench 1.7.3 (32-bit Linux)   Assembled by P.Skibinski
> Compressor name         Compress. Decompress. Compr. size  Ratio Filename
> memcpy                   2205 MB/s  2217 MB/s        4096 100.00 data//data_1
> lz4 1.8.0                 216 MB/s   761 MB/s        2433  59.40 data//data_1
> lz4 1.8.0                 269 MB/s   877 MB/s        1873  45.73 data//data_2
> lz4 1.8.0                 238 MB/s   575 MB/s        2060  50.29 data//data_3
> lz4 1.8.0                 321 MB/s  1015 MB/s        1464  35.74 data//data_4
> lz4 1.8.0                 464 MB/s  1090 MB/s         713  17.41 data//data_5
> lz4 1.8.0                 296 MB/s   956 MB/s        1597  38.99 data//data_6
> lz4 1.8.0                 338 MB/s   994 MB/s        2238  54.64 data//data_7
> lz4 1.8.0                 705 MB/s  1172 MB/s         193   4.71 data//data_8
> lz4 1.8.0                 404 MB/s  1150 MB/s        1097  26.78 data//data_9
> lz4 1.8.0                 216 MB/s   921 MB/s        3183  77.71 data//data_10
> lz4 1.8.0                 456 MB/s  1101 MB/s        1011  24.68 data//data_11
> lz4 1.8.0                 867 MB/s  1202 MB/s          37   0.90 data//data_12
> 
> 
> LZ4 Dynamic Offset:
> sh-3.2# ./lzbench  -r  -elz4_dyn  data/
> lzbench 1.7.3 (32-bit Linux)   Assembled by P.Skibinski
> Compressor name         Compress. Decompress. Compr. size  Ratio Filename
> memcpy                   2203 MB/s  2218 MB/s        4096 100.00 data//data_1
> lz4 1.8.0                 218 MB/s   693 MB/s        2228  54.39 data//data_1
> lz4 1.8.0                 273 MB/s   851 MB/s        1739  42.46 data//data_2
> lz4 1.8.0                 230 MB/s   526 MB/s        1800  43.95 data//data_3
> lz4 1.8.0                 321 MB/s   952 MB/s        1357  33.13 data//data_4
> lz4 1.8.0                 470 MB/s  1075 MB/s         664  16.21 data//data_5
> lz4 1.8.0                 303 MB/s   964 MB/s        1455  35.52 data//data_6
> lz4 1.8.0                 345 MB/s   951 MB/s        2126  51.90 data//data_7
> lz4 1.8.0                 744 MB/s  1163 MB/s         177   4.32 data//data_8
> lz4 1.8.0                 409 MB/s  1257 MB/s        1033  25.22 data//data_9
> lz4 1.8.0                 220 MB/s   857 MB/s        3049  74.44 data//data_10
> lz4 1.8.0                 464 MB/s  1105 MB/s         934  22.80 data//data_11
> lz4 1.8.0                 874 MB/s  1194 MB/s          36   0.88 data//data_12
> 
> 
> LZ4 Dynamic Offset with 32K data:
> sh-3.2# ./lzbench -elz4_dyn data/data32k
> lzbench 1.7.3 (32-bit Linux)   Assembled by P.Skibinski
> Compressor name         Compress. Decompress. Compr. size  Ratio Filename
> memcpy                   5285 MB/s  5283 MB/s       32768 100.00 data/data32k
> lz4 1.8.0                 274 MB/s   995 MB/s       13435  41.00 data/data32k
> done... (cIters=1 dIters=1 cTime=1.0 dTime=2.0 chunkSize=1706MB cSpeed=0MB)
> 
> Original LZ4 with 32K data:
> sh-3.2# ./lzbench_orig -elz4 data/data32k
> lzbench 1.7.3 (32-bit Linux)   Assembled by P.Skibinski
> Compressor name         Compress. Decompress. Compr. size  Ratio Filename
> memcpy                   4918 MB/s  5108 MB/s       32768 100.00 data/data32k
> lz4 1.8.0                 276 MB/s  1045 MB/s       14492  44.23 data/data32k
> 
> LZO1x with 32K data (Default Compressor for ZRAM): 
> sh-3.2# ./lzbench -elzo1x,1 data/data32k
> lzbench 1.7.3 (32-bit Linux)   Assembled by P.Skibinski
> Compressor name         Compress. Decompress. Compr. size  Ratio Filename
> memcpy                   5273 MB/s  5320 MB/s       32768 100.00 data/data32k
> lzo1x 2.09 -1             283 MB/s   465 MB/s       14292  43.62 data/data32k
> 
> Regards,
> Vaneet Narang
 
Thanks.
 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.
  2018-03-29 10:26       ` Maninder Singh
@ 2018-03-30  5:41         ` Sergey Senozhatsky
  0 siblings, 0 replies; 23+ messages in thread
From: Sergey Senozhatsky @ 2018-03-30  5:41 UTC (permalink / raw)
  To: Maninder Singh
  Cc: Vaneet Narang, Nick Terrell, Sergey Senozhatsky, herbert, davem,
	minchan, ngupta, Kees Cook, anton, ccross, tony.luck, akpm,
	colin.king, linux-crypto, linux-kernel, PANKAJ MISHRA,
	AMIT SAHRAWAT, Yann Collet

On (03/29/18 15:56), Maninder Singh wrote:
> Hello Nick/Sergey,
> 
> Any suggestion or comments, so that we can change code and resend the patch?

Well... there were no replies to
 https://marc.info/?l=linux-kernel&m=152161450026771&w=2
and
 https://marc.info/?l=linux-kernel&m=152161860627974&w=2

	-ss

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/1] lz4: Implement lz4 with dynamic offset length.
       [not found]       ` <CGME20180321044149epcas5p12cba1afa5fc0f493a47fe20c4a7394bd@epcms5p5>
@ 2018-04-02  5:51         ` Maninder Singh
  2018-04-03 12:26           ` Sergey Senozhatsky
       [not found]           ` <CGME20180321044149epcas5p12cba1afa5fc0f493a47fe20c4a7394bd@epcms5p7>
  0 siblings, 2 replies; 23+ messages in thread
From: Maninder Singh @ 2018-04-02  5:51 UTC (permalink / raw)
  To: sergey.senozhatsky.work
  Cc: herbert, davem, minchan, ngupta, keescook, anton, ccross,
	tony.luck, akpm, colin.king, linux-crypto, linux-kernel,
	PANKAJ MISHRA, AMIT SAHRAWAT, Vaneet Narang

Hi,

>> diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
>> index 4ed0a78..5bc5aab 100644
>> --- a/drivers/block/zram/zcomp.c
>> +++ b/drivers/block/zram/zcomp.c
>> @@ -17,11 +17,15 @@
>>  #include <linux/crypto.h>
>>  
>>  #include "zcomp.h"
>> +#define KB        (1 << 10)
>>  
>>  static const char * const backends[] = {
>>          "lzo",
>>  #if IS_ENABLED(CONFIG_CRYPTO_LZ4)
>>          "lz4",
>> +#if (PAGE_SIZE < (32 * KB))
>> +        "lz4_dyn",
>> +#endif
> 
>This is not the list of supported algorithms. It's the list of
>recommended algorithms. You can configure zram to use any of
>available and known to Crypto API algorithms. Including lz4_dyn
>on PAGE_SIZE > 32K systems.
> 
>        -ss
 
Yes, we want to integrate the new compression (lz4_dyn) for ZRAM
only if PAGE_SIZE is less than 32KB, to get the maximum benefit.
So we added lz4_dyn to the available list of ZRAM compression algorithms.

Thanks,
Maninder Singh

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.
       [not found]   ` <CGME20180321044137epcas5p221e7ee4a0b7464eaa00dad8320f0251d@epcms5p3>
@ 2018-04-02  6:03     ` Maninder Singh
  0 siblings, 0 replies; 23+ messages in thread
From: Maninder Singh @ 2018-04-02  6:03 UTC (permalink / raw)
  To: sergey.senozhatsky.work
  Cc: herbert, davem, minchan, ngupta, keescook, anton, ccross,
	tony.luck, akpm, colin.king, linux-crypto, linux-kernel,
	PANKAJ MISHRA, AMIT SAHRAWAT, Vaneet Narang


 Hello,

>> (Added cover letter to avoid much text in patch description)
>> 
>> LZ4 specification defines 2 byte offset length for 64 KB data.
>> But in case of ZRAM we compress data per page and in most of
>> architecture PAGE_SIZE is 4KB. So we can decide offset length based
>> on actual offset value. For this we can reserve 1 bit to decide offset
>> length (1 byte or 2 byte). 2 byte required only if ofsset is greater than 127,
>> else 1 byte is enough.
> 
>So what happens if I compress the data on a system with no dyn
>offset and then send it over the network to a machine which has
>dyn offset? Or, say, I have a USB stick with a compression enabled
>FS, store files on a dyn offset enabled PC and then mount that USB
>stick on a machine with no dyn offset support. And vice versa.


lz4_dyn is not an extension of LZ4, so there is no backward compatibility.
Consider it a different algorithm, adapted from LZ4 for a better compression ratio.

Thanks
Maninder Singh

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/1] lz4: Implement lz4 with dynamic offset length.
  2018-04-02  5:51         ` Maninder Singh
@ 2018-04-03 12:26           ` Sergey Senozhatsky
       [not found]           ` <CGME20180321044149epcas5p12cba1afa5fc0f493a47fe20c4a7394bd@epcms5p7>
  1 sibling, 0 replies; 23+ messages in thread
From: Sergey Senozhatsky @ 2018-04-03 12:26 UTC (permalink / raw)
  To: Maninder Singh
  Cc: sergey.senozhatsky.work, herbert, davem, minchan, ngupta,
	keescook, anton, ccross, tony.luck, akpm, colin.king,
	linux-crypto, linux-kernel, PANKAJ MISHRA, AMIT SAHRAWAT,
	Vaneet Narang

On (04/02/18 11:21), Maninder Singh wrote:
[..]
> >>  static const char * const backends[] = {
> >>          "lzo",
> >>  #if IS_ENABLED(CONFIG_CRYPTO_LZ4)
> >>          "lz4",
> >> +#if (PAGE_SIZE < (32 * KB))
> >> +        "lz4_dyn",
> >> +#endif
> > 
> >This is not the list of supported algorithms. It's the list of
> >recommended algorithms. You can configure zram to use any of
> >available and known to Crypto API algorithms. Including lz4_dyn
> >on PAGE_SIZE > 32K systems.
> > 
> Yes, we want to integrate new compression(lz4_dyn) for ZRAM 
> only if PAGE_SIZE is less than 32KB to get maximum benefit. 
> so we added lz4_dyn to available list of ZRAM compression algorithms.

Which is not what I was talking about.

You shrink a 2 bytes offset down to a 1 byte offset, thus you enforce that
'page should be less than 32KB', which I'm sure will be confusing. And you
rely on lz4_dyn users to do the right thing - namely, to use that 'nice'
`#if (PAGE_SIZE < (32 * KB))'. Apart from that, lz4_dyn supports only data
in up to page_size chunks. Suppose my system has page_size of less than 32K,
so I legitimately can enable lz4_dyn, but suppose that I will use it
somewhere where I don't work with page_size-d chunks. Will I be able to just
do tfm->compress(src, sz) on random buffers? The whole thing looks to be
quite fragile.

	-ss

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: Re: [PATCH 1/1] lz4: Implement lz4 with dynamic offset length.
       [not found]           ` <CGME20180321044149epcas5p12cba1afa5fc0f493a47fe20c4a7394bd@epcms5p7>
@ 2018-04-03 13:43             ` Vaneet Narang
  2018-04-04  1:40               ` Sergey Senozhatsky
  0 siblings, 1 reply; 23+ messages in thread
From: Vaneet Narang @ 2018-04-03 13:43 UTC (permalink / raw)
  To: Maninder Singh
  Cc: sergey.senozhatsky.work, herbert, davem, minchan, ngupta,
	keescook, anton, ccross, tony.luck, akpm, colin.king,
	linux-crypto, linux-kernel, PANKAJ MISHRA, AMIT SAHRAWAT


Hi Sergey,

>You shrink a 2 bytes offset down to a 1 byte offset, thus you enforce that
The 2-byte offset is not shrunk to 1 byte; only 1 bit is reserved out of the
16 offset bits, so 15 bits can be used to store the offset value.

>'page should be less than 32KB', which I'm sure will be confusing. 
lz4_dyn will also work on bigger data lengths (> 32K), but in that case the
compression ratio may not be better than LZ4's. This is the same as LZ4
compressing data larger than 64K (16 bits): LZ4 can't store an offset larger
than 64K, and similarly LZ4 dyn can't store an offset larger than 32K.

There is handling in the LZ4 code for this, and similar handling is added for LZ4 Dyn.

Handling in LZ4 Dyn: max_distance is 32K for lz4_dyn and will be 64K for LZ4
int max_distance = dynOffset ? MAX_DISTANCE_DYN : MAX_DISTANCE;

>And you
>rely on lz4_dyn users to do the right thing - namely, to use that 'nice'
>`#if (PAGE_SIZE < (32 * KB))'. 
They don't need to add this code; they just need to choose the right compression
algorithm that fits their requirement. If the source length is less than 32K,
then lz4_dyn would give a better compression ratio than LZ4.

Considering ZRAM as a user of LZ4 dyn, we have added this check for PAGE_SIZE, which
is the source length. This code adds lz4_dyn to the preferred list of compression
algorithms when the page size is less than 32K.

>Apart from that, lz4_dyn supports only data
>in up to page_size chunks. Suppose my system has page_size of less than 32K,
>so I legitimately can enable lz4_dyn, but suppose that I will use it
>somewhere where I don't work with page_size-d chunks. Will I able to just
>do tfm->compress(src, sz) on random buffers? The whole thing looks to be
>quite fragile.
No, that's not true; lz4_dyn works on random buffers, and they need not be
page-size chunks. There is no difference in how LZ4 and LZ4 dyn operate.

The only difference is that LZ4 dyn doesn't use a fixed offset size. This concept
is already used in LZO, which chooses a dynamic metadata size based on match length
and match offset, using different markers that define the length of the metadata.

lzodefs.h:

#define M1_MAX_OFFSET   0x0400
#define M2_MAX_OFFSET   0x0800
#define M3_MAX_OFFSET   0x4000
#define M4_MAX_OFFSET   0xbfff

#define M1_MIN_LEN      2
#define M1_MAX_LEN      2
#define M2_MIN_LEN      3
#define M2_MAX_LEN      8
#define M3_MIN_LEN      3
#define M3_MAX_LEN      33
#define M4_MIN_LEN      3
#define M4_MAX_LEN      9

#define M1_MARKER       0
#define M2_MARKER       64
#define M3_MARKER       32
#define M4_MARKER       16

Similarly for LZ4 Dyn, we have used 1 bit as a marker to determine offset length.

Thanks & Regards,
Vaneet Narang


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/1] lz4: Implement lz4 with dynamic offset length.
  2018-04-03 13:43             ` Vaneet Narang
@ 2018-04-04  1:40               ` Sergey Senozhatsky
  0 siblings, 0 replies; 23+ messages in thread
From: Sergey Senozhatsky @ 2018-04-04  1:40 UTC (permalink / raw)
  To: Vaneet Narang
  Cc: Maninder Singh, sergey.senozhatsky.work, herbert, davem, minchan,
	ngupta, keescook, anton, ccross, tony.luck, akpm, colin.king,
	linux-crypto, linux-kernel, PANKAJ MISHRA, AMIT SAHRAWAT

On (04/03/18 19:13), Vaneet Narang wrote:
> Hi Sergey,
> 
> >You shrink a 2 bytes offset down to a 1 byte offset, thus you enforce that
> 2 Byte offset is not shrinked to 1 byte, Its only 1 bit is reserved out of
> 16 bits of offset. So only 15 Bits can be used to store offset value.

Yes, you are right. My bad, was thinking about something else.

> >'page should be less than 32KB', which I'm sure will be confusing. 
> lz4_dyn will work on bigger data length(> 32k) but in that case compression
> ratio may not be better than LZ4. This is same as LZ4 compressing data more 
> than 64K (16Bits). LZ4 can't store offset more than 64K similarly
> LZ4 dyn can't store offset more than 32K.

Then drop that `if PAGE_SIZE' thing. I'd rather do that stuff internally
in lz4... if it needed at all.

> >And you
> >rely on lz4_dyn users to do the right thing - namely, to use that 'nice'
> >`#if (PAGE_SIZE < (32 * KB))'. 
> They don't need to add this code

Then drop it.

> >Apart from that, lz4_dyn supports only data
> >in up to page_size chunks. Suppose my system has page_size of less than 32K,
> >so I legitimately can enable lz4_dyn, but suppose that I will use it
> >somewhere where I don't work with page_size-d chunks. Will I able to just
> >do tfm->compress(src, sz) on random buffers? The whole thing looks to be
> >quite fragile.
> No thats not true, lz4_dyn can work for random buffers and it need not be 
> of page size chunks. There is no difference in Lz4 and Lz4 dyn working. 

You are right.

	-ss

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.
       [not found]     ` <CGME20180321044137epcas5p221e7ee4a0b7464eaa00dad8320f0251d@epcms5p4>
@ 2018-04-16 10:21       ` Maninder Singh
  2018-04-16 19:34         ` Yann Collet
  0 siblings, 1 reply; 23+ messages in thread
From: Maninder Singh @ 2018-04-16 10:21 UTC (permalink / raw)
  To: Maninder Singh, Vaneet Narang, Nick Terrell, Sergey Senozhatsky,
	herbert, Yann Collet
  Cc: davem, minchan, ngupta, Kees Cook, anton, ccross, tony.luck,
	akpm, colin.king, linux-crypto, linux-kernel, PANKAJ MISHRA,
	AMIT SAHRAWAT, gregkh


 Hello Nick/ Yann,

Any inputs regarding LZ4 dyn results & lz4 dyn approach.

>Hello Nick/Sergey,
> 
>Any suggestion or comments, so that we can change code and resend the patch?
> 
>> Hi Nick / Sergey,
>> 
>> 
>> We have compared LZ4 Dyn with Original LZ4 using some samples of realtime application data(4Kb)
>> compressed/decompressed by ZRAM. For comparison we have used lzbench (https://github.com/inikep/lzbench)
>> we have implemented dedicated LZ4 Dyn API & kept last literal length as 6 to avoid overhead 
>> of checks. It seems in average case there is a saving of 3~4% in compression ratio with almost same compression
>> speed and minor loss in decompression speed (~50MB/s) when compared with LZ4.
>> 
>> Comparison of Lz4 Dyn with LZO1x is also done as LZO1x is default compressor of ZRAM.
>> 
>> Original LZ4:
>> sh-3.2# ./lzbench  -r  -elz4  data/
>> lzbench 1.7.3 (32-bit Linux)   Assembled by P.Skibinski
>> Compressor name         Compress. Decompress. Compr. size  Ratio Filename
>> memcpy                   2205 MB/s  2217 MB/s        4096 100.00 data//data_1
>> lz4 1.8.0                 216 MB/s   761 MB/s        2433  59.40 data//data_1
>> lz4 1.8.0                 269 MB/s   877 MB/s        1873  45.73 data//data_2
>> lz4 1.8.0                 238 MB/s   575 MB/s        2060  50.29 data//data_3
>> lz4 1.8.0                 321 MB/s  1015 MB/s        1464  35.74 data//data_4
>> lz4 1.8.0                 464 MB/s  1090 MB/s         713  17.41 data//data_5
>> lz4 1.8.0                 296 MB/s   956 MB/s        1597  38.99 data//data_6
>> lz4 1.8.0                 338 MB/s   994 MB/s        2238  54.64 data//data_7
>> lz4 1.8.0                 705 MB/s  1172 MB/s         193   4.71 data//data_8
>> lz4 1.8.0                 404 MB/s  1150 MB/s        1097  26.78 data//data_9
>> lz4 1.8.0                 216 MB/s   921 MB/s        3183  77.71 data//data_10
>> lz4 1.8.0                 456 MB/s  1101 MB/s        1011  24.68 data//data_11
>> lz4 1.8.0                 867 MB/s  1202 MB/s          37   0.90 data//data_12
>> 
>> 
>> LZ4 Dynamic Offset:
>> sh-3.2# ./lzbench  -r  -elz4_dyn  data/
>> lzbench 1.7.3 (32-bit Linux)   Assembled by P.Skibinski
>> Compressor name         Compress. Decompress. Compr. size  Ratio Filename
>> memcpy                   2203 MB/s  2218 MB/s        4096 100.00 data//data_1
>> lz4 1.8.0                 218 MB/s   693 MB/s        2228  54.39 data//data_1
>> lz4 1.8.0                 273 MB/s   851 MB/s        1739  42.46 data//data_2
>> lz4 1.8.0                 230 MB/s   526 MB/s        1800  43.95 data//data_3
>> lz4 1.8.0                 321 MB/s   952 MB/s        1357  33.13 data//data_4
>> lz4 1.8.0                 470 MB/s  1075 MB/s         664  16.21 data//data_5
>> lz4 1.8.0                 303 MB/s   964 MB/s        1455  35.52 data//data_6
>> lz4 1.8.0                 345 MB/s   951 MB/s        2126  51.90 data//data_7
>> lz4 1.8.0                 744 MB/s  1163 MB/s         177   4.32 data//data_8
>> lz4 1.8.0                 409 MB/s  1257 MB/s        1033  25.22 data//data_9
>> lz4 1.8.0                 220 MB/s   857 MB/s        3049  74.44 data//data_10
>> lz4 1.8.0                 464 MB/s  1105 MB/s         934  22.80 data//data_11
>> lz4 1.8.0                 874 MB/s  1194 MB/s          36   0.88 data//data_12
>> 
>> 
>> LZ4 Dynamic Offset with 32K data:
>> sh-3.2# ./lzbench -elz4_dyn data/data32k
>> lzbench 1.7.3 (32-bit Linux)   Assembled by P.Skibinski
>> Compressor name         Compress. Decompress. Compr. size  Ratio Filename
>> memcpy                   5285 MB/s  5283 MB/s       32768 100.00 data/data32k
>> lz4 1.8.0                 274 MB/s   995 MB/s       13435  41.00 data/data32k
>> done... (cIters=1 dIters=1 cTime=1.0 dTime=2.0 chunkSize=1706MB cSpeed=0MB)
>> 
>> Original LZ4 with 32K data:
>> sh-3.2# ./lzbench_orig -elz4 data/data32k
>> lzbench 1.7.3 (32-bit Linux)   Assembled by P.Skibinski
>> Compressor name         Compress. Decompress. Compr. size  Ratio Filename
>> memcpy                   4918 MB/s  5108 MB/s       32768 100.00 data/data32k
>> lz4 1.8.0                 276 MB/s  1045 MB/s       14492  44.23 data/data32k
>> 
>> LZO1x with 32K data (Default Compressor for ZRAM): 
>> sh-3.2# ./lzbench -elzo1x,1 data/data32k
>> lzbench 1.7.3 (32-bit Linux)   Assembled by P.Skibinski
>> Compressor name         Compress. Decompress. Compr. size  Ratio Filename
>> memcpy                   5273 MB/s  5320 MB/s       32768 100.00 data/data32k
>> lzo1x 2.09 -1             283 MB/s   465 MB/s       14292  43.62 data/data32k


Thanks,
Maninder Singh
 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.
  2018-04-16 10:21       ` Maninder Singh
@ 2018-04-16 19:34         ` Yann Collet
  2018-04-16 20:01           ` Eric Biggers
  0 siblings, 1 reply; 23+ messages in thread
From: Yann Collet @ 2018-04-16 19:34 UTC (permalink / raw)
  To: maninder1.s, Vaneet Narang, Nick Terrell, Sergey Senozhatsky, herbert
  Cc: davem, minchan, ngupta, Kees Cook, anton, ccross, tony.luck,
	akpm, colin.king, linux-crypto, linux-kernel, PANKAJ MISHRA,
	AMIT SAHRAWAT, gregkh

Hi Singh

I don't have any strong opinion on this topic.

You made your case clear: 
your variant trades a little bit of speed for a little bit more compression ratio.
In the context of zram, it makes sense, and I would expect it to work, as advertised in your benchmark results.
(disclaimer: I haven't reproduced these results, just, they look reasonable to me, I have no reason to doubt them).

So, the issue is less about performance, than about code complexity.

As mentioned, this is an incompatible variant.
So, it requires its own entry point, and preferably its own code path
(even if it's heavily duplicated,
mixing it with regular lz4 source code, as proposed in the patch, will be bad for maintenance, 
and can negatively impact regular lz4 usage, outside of zram).

So that's basically the "cost" of adding this option.

Is it worth it?
Well, this is completely outside of my responsibility area, so I really can't tell.
You'll have to convince people in charge that the gains are worth their complexity,
since _they_ will inherit the duty to keep the system working through its future evolutions.
At a minimum, you are targeting maintainers of zram and the crypto interface.
For this topic, they are the right people to talk to.


On 4/16/18, 04:09, "Maninder Singh" <maninder1.s@samsung.com> wrote:

    
     Hello Nick/ Yann,
    
    Any inputs regarding LZ4 dyn results & lz4 dyn approach.
    
    >Hello Nick/Sergey,
    > 
    >Any suggestion or comments, so that we can change code and resend the patch?
    > 
    >> Hi Nick / Sergey,
    >> 
    >> 
    >> We have compared LZ4 Dyn with Original LZ4 using some samples of realtime application data(4Kb)
    >> compressed/decompressed by ZRAM. For comparison we have used lzbench (https://github.com/inikep/lzbench)
    >> we have implemented dedicated LZ4 Dyn API & kept last literal length as 6 to avoid overhead 
    >> of checks. It seems in average case there is a saving of 3~4% in compression ratio with almost same compression
    >> speed and minor loss in decompression speed (~50MB/s) when compared with LZ4.
    >> 
    >> Comparison of Lz4 Dyn with LZO1x is also done as LZO1x is default compressor of ZRAM.
    >> 
    >> Original LZ4:
    >> sh-3.2# ./lzbench  -r  -elz4  data/
    >> lzbench 1.7.3 (32-bit Linux)   Assembled by P.Skibinski
    >> Compressor name         Compress. Decompress. Compr. size  Ratio Filename
    >> memcpy                   2205 MB/s  2217 MB/s        4096 100.00 data//data_1
    >> lz4 1.8.0                 216 MB/s   761 MB/s        2433  59.40 data//data_1
    >> lz4 1.8.0                 269 MB/s   877 MB/s        1873  45.73 data//data_2
    >> lz4 1.8.0                 238 MB/s   575 MB/s        2060  50.29 data//data_3
    >> lz4 1.8.0                 321 MB/s  1015 MB/s        1464  35.74 data//data_4
    >> lz4 1.8.0                 464 MB/s  1090 MB/s         713  17.41 data//data_5
    >> lz4 1.8.0                 296 MB/s   956 MB/s        1597  38.99 data//data_6
    >> lz4 1.8.0                 338 MB/s   994 MB/s        2238  54.64 data//data_7
    >> lz4 1.8.0                 705 MB/s  1172 MB/s         193   4.71 data//data_8
    >> lz4 1.8.0                 404 MB/s  1150 MB/s        1097  26.78 data//data_9
    >> lz4 1.8.0                 216 MB/s   921 MB/s        3183  77.71 data//data_10
    >> lz4 1.8.0                 456 MB/s  1101 MB/s        1011  24.68 data//data_11
    >> lz4 1.8.0                 867 MB/s  1202 MB/s          37   0.90 data//data_12
    >> 
    >> 
    >> LZ4 Dynamic Offset:
    >> sh-3.2# ./lzbench  -r  -elz4_dyn  data/
    >> lzbench 1.7.3 (32-bit Linux)   Assembled by P.Skibinski
    >> Compressor name         Compress. Decompress. Compr. size  Ratio Filename
    >> memcpy                   2203 MB/s  2218 MB/s        4096 100.00 data//data_1
    >> lz4 1.8.0                 218 MB/s   693 MB/s        2228  54.39 data//data_1
    >> lz4 1.8.0                 273 MB/s   851 MB/s        1739  42.46 data//data_2
    >> lz4 1.8.0                 230 MB/s   526 MB/s        1800  43.95 data//data_3
    >> lz4 1.8.0                 321 MB/s   952 MB/s        1357  33.13 data//data_4
    >> lz4 1.8.0                 470 MB/s  1075 MB/s         664  16.21 data//data_5
    >> lz4 1.8.0                 303 MB/s   964 MB/s        1455  35.52 data//data_6
    >> lz4 1.8.0                 345 MB/s   951 MB/s        2126  51.90 data//data_7
    >> lz4 1.8.0                 744 MB/s  1163 MB/s         177   4.32 data//data_8
    >> lz4 1.8.0                 409 MB/s  1257 MB/s        1033  25.22 data//data_9
    >> lz4 1.8.0                 220 MB/s   857 MB/s        3049  74.44 data//data_10
    >> lz4 1.8.0                 464 MB/s  1105 MB/s         934  22.80 data//data_11
    >> lz4 1.8.0                 874 MB/s  1194 MB/s          36   0.88 data//data_12
    >> 
    >> 
    >> LZ4 Dynamic Offset with 32K data:
    >> sh-3.2# ./lzbench -elz4_dyn data/data32k
    >> lzbench 1.7.3 (32-bit Linux)   Assembled by P.Skibinski
    >> Compressor name         Compress. Decompress. Compr. size  Ratio Filename
    >> memcpy                   5285 MB/s  5283 MB/s       32768 100.00 data/data32k
    >> lz4 1.8.0                 274 MB/s   995 MB/s       13435  41.00 data/data32k
    >> done... (cIters=1 dIters=1 cTime=1.0 dTime=2.0 chunkSize=1706MB cSpeed=0MB)
    >> 
    >> Original LZ4 with 32K data:
    >> sh-3.2# ./lzbench_orig -elz4 data/data32k
    >> lzbench 1.7.3 (32-bit Linux)   Assembled by P.Skibinski
    >> Compressor name         Compress. Decompress. Compr. size  Ratio Filename
    >> memcpy                   4918 MB/s  5108 MB/s       32768 100.00 data/data32k
    >> lz4 1.8.0                 276 MB/s  1045 MB/s       14492  44.23 data/data32k
    >> 
    >> LZO1x with 32K data (Default Compressor for ZRAM): 
    >> sh-3.2# ./lzbench -elzo1x,1 data/data32k
    >> lzbench 1.7.3 (32-bit Linux)   Assembled by P.Skibinski
    >> Compressor name         Compress. Decompress. Compr. size  Ratio Filename
    >> memcpy                   5273 MB/s  5320 MB/s       32768 100.00 data/data32k
    >> lzo1x 2.09 -1             283 MB/s   465 MB/s       14292  43.62 data/data32k
    
    
    Thanks,
    Maninder Singh
     
    


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.
  2018-04-16 19:34         ` Yann Collet
@ 2018-04-16 20:01           ` Eric Biggers
  0 siblings, 0 replies; 23+ messages in thread
From: Eric Biggers @ 2018-04-16 20:01 UTC (permalink / raw)
  To: Yann Collet
  Cc: maninder1.s, Vaneet Narang, Nick Terrell, Sergey Senozhatsky,
	herbert, davem, minchan, ngupta, Kees Cook, anton, ccross,
	tony.luck, akpm, colin.king, linux-crypto, linux-kernel,
	PANKAJ MISHRA, AMIT SAHRAWAT, gregkh

On Mon, Apr 16, 2018 at 07:34:29PM +0000, Yann Collet wrote:
> Hi Singh
> 
> I don't have any strong opinion on this topic.
> 
> You made your case clear: 
> your variant trades a little bit of speed for a little bit more compression ratio.
> In the context of zram, it makes sense, and I would expect it to work, as advertised in your benchmark results.
> (disclaimer: I haven't reproduced these results, just, they look reasonable to me, I have no reason to doubt them).
> 
> So, the issue is less about performance, than about code complexity.
> 
> As mentioned, this is an incompatible variant.
> So, it requires its own entry point, and preferably its own code path
> (even if it's heavily duplicated,
> mixing it with regular lz4 source code, as proposed in the patch, will be bad for maintenance, 
> and can negatively impact regular lz4 usage, outside of zram).
> 
> So that's basically the "cost" of adding this option.
> 
> Is it worth it?
> Well, this is completely outside of my responsibility area, so I really can't tell.
> You'll have to convince people in charge that the gains are worth their complexity,
> since _they_ will inherit the duty to keep the system working through its future evolutions.
> At a minimum, you are targeting maintainers of zram and the crypto interface.
> For this topic, they are the right people to talk to.
> 
> 
> On 4/16/18, 04:09, "Maninder Singh" <maninder1.s@samsung.com> wrote:
> 
>     
>      Hello Nick/ Yann,
>     
>     Any inputs regarding the LZ4 dyn results and the LZ4 dyn approach?
>     
>     >Hello Nick/Sergey,
>     > 
>     >Any suggestion or comments, so that we can change code and resend the patch?
>     > 
>     >> Hi Nick / Sergey,
>     >> 
>     >> 
>     >> We have compared LZ4 dyn with the original LZ4 using some samples of real-time application data (4 KB)
>     >> compressed/decompressed by ZRAM. For the comparison we used lzbench (https://github.com/inikep/lzbench).
>     >> We implemented a dedicated LZ4 dyn API and kept the last literal length at 6 to avoid the overhead
>     >> of extra checks. On average there is a saving of 3~4% in compression ratio, with almost the same compression
>     >> speed and a minor loss in decompression speed (~50MB/s), compared with LZ4.
>     >> 
>     >> A comparison of LZ4 dyn with LZO1x was also done, as LZO1x is the default compressor of ZRAM.
>     >> 
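[Editor's note: for reference, the variant under discussion (per the cover letter) reserves 1 bit of the offset field as a length flag, so offsets up to 127 fit in 1 byte and larger offsets (up to 32 KB - 1) take 2 bytes. Below is a minimal sketch of such an encoding; the helper names `put_dyn_offset`/`get_dyn_offset` are hypothetical and are not the patch's actual API.]

```c
#include <stdint.h>
#include <stddef.h>

/*
 * Illustrative sketch of a "dynamic offset" encoding: the high bit of
 * the first byte is a length flag.  Flag clear: the offset is the low
 * 7 bits (1-byte form, offsets 0..127).  Flag set: 2-byte form, the
 * low 7 bits of byte 0 plus 8 more bits in byte 1 (offsets up to
 * 32767, i.e. 15 usable bits).  Not the actual patch code.
 */
static size_t put_dyn_offset(uint8_t *out, uint16_t offset)
{
	if (offset <= 0x7F) {
		out[0] = (uint8_t)offset;        /* flag clear: 1-byte form */
		return 1;
	}
	out[0] = 0x80 | (offset & 0x7F);         /* flag set: 2-byte form */
	out[1] = (uint8_t)(offset >> 7);         /* remaining 8 bits */
	return 2;
}

static size_t get_dyn_offset(const uint8_t *in, uint16_t *offset)
{
	if (!(in[0] & 0x80)) {
		*offset = in[0];
		return 1;
	}
	*offset = (uint16_t)(in[0] & 0x7F) | ((uint16_t)in[1] << 7);
	return 2;
}
```

This is the source of the 32 KB maximum offset mentioned in the cover letter: one flag bit leaves 15 bits of offset in the 2-byte form, versus the 16-bit (64 KB) offsets of the standard LZ4 block format.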

Unfortunately the track record of maintaining compression code in the Linux
kernel is not great.  zlib, for example, was forked from v1.2.3, which was
released in 2005, and hasn't been updated since, apart from some random drive-by
patches which have made it diverge even further from upstream.  There have even
been bugs assigned CVE numbers in upstream zlib, and I don't think anyone has
checked whether the Linux kernel version has those bugs or not.

The story with LZ4 is a bit better, as someone updated it to v1.7.3 last year.
But it took many rounds of review in which I had to point out some subtle
regressions, like the hash table size being accidentally changed, and things not
being inlined that should be.  And of course that version is already outdated.

We also have LZO and Zstandard in the kernel to maintain, as well as XZ
decompression.

And very problematically, *none* of these compression algorithms have a
maintainer listed in the MAINTAINERS file.

So in my opinion, as a prerequisite for this change, someone would need to
volunteer to actually maintain LZ4 in the kernel.

Thanks,

Eric

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2018-04-16 20:01 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CGME20180321044137epcas5p221e7ee4a0b7464eaa00dad8320f0251d@epcas5p2.samsung.com>
2018-03-21  4:40 ` [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length Maninder Singh
     [not found]   ` <CGME20180321044149epcas5p12cba1afa5fc0f493a47fe20c4a7394bd@epcas5p1.samsung.com>
2018-03-21  4:40     ` [PATCH 1/1] lz4: " Maninder Singh
2018-03-21  7:49       ` Sergey Senozhatsky
2018-03-21 19:59       ` Nick Terrell
     [not found]       ` <CGME20180321195916epcas4p1e89e1cc5fd242ee8c348f92f1d1740b0@epcms5p1>
2018-03-22  4:28         ` Maninder Singh
2018-03-22 23:09       ` kbuild test robot
2018-03-22 23:32       ` kbuild test robot
     [not found]       ` <CGME20180321195916epcas4p1e89e1cc5fd242ee8c348f92f1d1740b0@epcms5p8>
2018-03-23 13:21         ` Vaneet Narang
     [not found]       ` <CGME20180321044149epcas5p12cba1afa5fc0f493a47fe20c4a7394bd@epcms5p5>
2018-04-02  5:51         ` Maninder Singh
2018-04-03 12:26           ` Sergey Senozhatsky
     [not found]           ` <CGME20180321044149epcas5p12cba1afa5fc0f493a47fe20c4a7394bd@epcms5p7>
2018-04-03 13:43             ` Vaneet Narang
2018-04-04  1:40               ` Sergey Senozhatsky
2018-03-21  6:41   ` [PATCH 0/1] cover-letter/lz4: " Sergey Senozhatsky
2018-03-21  8:26   ` Sergey Senozhatsky
2018-03-21 19:56     ` Nick Terrell
2018-03-22  2:43       ` Sergey Senozhatsky
     [not found]     ` <CGME20180321044137epcas5p221e7ee4a0b7464eaa00dad8320f0251d@epcms5p6>
2018-03-23 13:43       ` Vaneet Narang
2018-03-29 10:26       ` Maninder Singh
2018-03-30  5:41         ` Sergey Senozhatsky
     [not found]     ` <CGME20180321044137epcas5p221e7ee4a0b7464eaa00dad8320f0251d@epcms5p4>
2018-04-16 10:21       ` Maninder Singh
2018-04-16 19:34         ` Yann Collet
2018-04-16 20:01           ` Eric Biggers
     [not found]   ` <CGME20180321044137epcas5p221e7ee4a0b7464eaa00dad8320f0251d@epcms5p3>
2018-04-02  6:03     ` Maninder Singh

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).