LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [RFC PATCH 0/7] rslib: RS decoder is severely broken
@ 2019-03-30 18:29 Ferdinand Blomqvist
  2019-03-30 18:29 ` [RFC PATCH 1/7] rslib: Add tests for the encoder and decoder Ferdinand Blomqvist
                   ` (7 more replies)
  0 siblings, 8 replies; 19+ messages in thread
From: Ferdinand Blomqvist @ 2019-03-30 18:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: Thomas Gleixner

The Reed_Solomon library used in the kernel is based on Phil Karn's
fec library. When playing with this library I found a couple of bugs. It
turn out that all of these bugs, and some additional flaws, are present
in rslib.

The Reed-Solomon decoder has several bugs/flaws:

- Decoding of shortened codes is broken.

- The decoder sometimes fails silently, i.e. it announces success but
  returns a word that is not a codeword.

- The return value of the decoder is incoherent with respect to how
  fixed erasures are counted. If the word to be decoded is a codeword,
  then the decoder always returns zero even if some erasures are given.
  On the other hand, if the word to be decoded contains errors, then the
  number of erasures is always included in the count of corrected
  symbols. So the decoder handles erasures without symbol corruption
  inconsistently. This inconsistency probably doesn't affect anyone
  using the decoder, but it is inconsistent with the documentation.

- The error positions returned in eras_pos include all erasures, but the
  corrections are only set in the correction buffer if there actually is
  a symbol error. So if there are erasures without symbol corruption,
  then the correction buffer will contain errors (unless initialized to
  zero before calling the decoder) or some values will be unset (if the
  correction buffer is uninitialized).

- Assumes that the syndromes provided by callers are in index form.
  This is simply undocumented.

- When correcting data in-place the decoder does not correct errors in
  the parity. On the other hand, when returning the errors in correction
  buffers, errors in the parity are included.

This series provides a module with tests for rslib and fixes all the
bugs/flaws. I am not sure that the provided self tests are written in
the 'right way'. I just looked at other self tests in lib and
implemented something similar.

The fixes are tested with the self tests. They should probably also be
tested with drivers etc. that use rslib, but it is unclear to me how to
do this.

I looked a bit on two of the drivers that use rslib:

drivers/mtd/nand/raw/cafe_nand.c
drivers/mtd/nand/raw/diskonchip.c

Both of them seem to do some additional error checking after calling
decode_rs16. Maybe this is needed because of the bugs in the decoder?

Ferdinand Blomqvist (7):
  rslib: Add tests for the encoder and decoder
  rslib: Fix decoding of shortened codes
  rslib: decode_rs: Fix length parameter check
  rslib: decode_rs: code cleanup
  rslib: Fix handling of of caller provided syndrome
  rslib: Update documentation
  rslib: Fix remaining decoder flaws

 lib/Kconfig.debug               |  12 +
 lib/reed_solomon/Makefile       |   2 +-
 lib/reed_solomon/decode_rs.c    | 112 +++++--
 lib/reed_solomon/reed_solomon.c |  12 +-
 lib/reed_solomon/test_rslib.c   | 519 ++++++++++++++++++++++++++++++++
 5 files changed, 622 insertions(+), 35 deletions(-)
 create mode 100644 lib/reed_solomon/test_rslib.c

-- 
2.17.2


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [RFC PATCH 1/7] rslib: Add tests for the encoder and decoder
  2019-03-30 18:29 [RFC PATCH 0/7] rslib: RS decoder is severely broken Ferdinand Blomqvist
@ 2019-03-30 18:29 ` Ferdinand Blomqvist
  2019-06-17 20:44   ` Thomas Gleixner
  2019-03-30 18:29 ` [RFC PATCH 2/7] rslib: Fix decoding of shortened codes Ferdinand Blomqvist
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Ferdinand Blomqvist @ 2019-03-30 18:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: Thomas Gleixner

A Reed-Solomon code with minimum distance d can correct any error and
erasure pattern that satisfies 2 * #error + #erasures < d. If the
error correction capacity is exceeded, then correct decoding cannot be
guaranteed. The decoder must, however, return a valid codeword or report
failure.

There are two main tests:

- Check for correct behaviour up to the error correction capacity
- Check for correct behaviour beyond error corrupted capacity

Both tests are simple:

1. Generate random data
2. Encode data with the chosen code
3. Add errors and erasures to data
4. Decode the corrupted word
5. Check for correct behaviour

When testing up to capacity we test for:

- Correct decoding
- Correct return value (i.e. the number of corrected symbols)
- That the returned error positions are correct

There are two kinds of erasures; the erased symbol can be corrupted or
not. When counting the number of corrected symbols, erasures without
symbol corruption should not be counted. Similarly, the returned error
positions should only include positions where a correction is necessary.

We run the up to capacity tests for three different interfaces of
decode_rs:

- Use the correction buffers
- Use the correction buffers with syndromes provided by the caller
- Error correction in place (does not check the error positions)

When testing beyond capacity we test for silent failures. A silent
failure is when the decoder returns success but the returned word is not
a valid codeword.

There are a couple of options for the tests:

- Verbosity.

- Whether to test for correct behaviour beyond capacity. Default is to
  test beyond capacity.

- Whether to allow erasures without symbol corruption. Defaults to yes.

Note that the tests take a couple of minutes to complete.

Signed-off-by: Ferdinand Blomqvist <ferdinand.blomqvist@gmail.com>
---
 lib/Kconfig.debug             |  12 +
 lib/reed_solomon/Makefile     |   2 +-
 lib/reed_solomon/test_rslib.c | 519 ++++++++++++++++++++++++++++++++++
 3 files changed, 532 insertions(+), 1 deletion(-)
 create mode 100644 lib/reed_solomon/test_rslib.c

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 0d9e81779e37..48f43583b3b6 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1726,6 +1726,18 @@ config RBTREE_TEST
 	  A benchmark measuring the performance of the rbtree library.
 	  Also includes rbtree invariant checks.
 
+config REED_SOLOMON_TEST
+	tristate "Reed-Solomon library test"
+	depends on DEBUG_KERNEL || m
+	select REED_SOLOMON
+	select REED_SOLOMON_ENC16
+	select REED_SOLOMON_DEC16
+	help
+	  This option enables the self-test function of rslib at boot,
+	  or at module load time.
+
+	  If unsure, say N.
+
 config INTERVAL_TREE_TEST
 	tristate "Interval tree test"
 	depends on DEBUG_KERNEL
diff --git a/lib/reed_solomon/Makefile b/lib/reed_solomon/Makefile
index c3d7136827ed..e84ce076c6e5 100644
--- a/lib/reed_solomon/Makefile
+++ b/lib/reed_solomon/Makefile
@@ -3,4 +3,4 @@
 #
 
 obj-$(CONFIG_REED_SOLOMON) += reed_solomon.o
-
+obj-$(CONFIG_REED_SOLOMON_TEST) += test_rslib.o
diff --git a/lib/reed_solomon/test_rslib.c b/lib/reed_solomon/test_rslib.c
new file mode 100644
index 000000000000..c97b1dbadd4e
--- /dev/null
+++ b/lib/reed_solomon/test_rslib.c
@@ -0,0 +1,519 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Tests for Generic Reed Solomon encoder / decoder library
+ *
+ * Written by Ferdinand Blomqvist
+ * Based on previous work by Phil Karn, KA9Q
+ */
+#include <linux/rslib.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/moduleparam.h>
+#include <linux/random.h>
+#include <linux/slab.h>
+
+enum verbosity {
+	V_SILENT,
+	V_PROGRESS,
+	V_CSUMMARY
+};
+
+enum method {
+	CORR_BUFFER,
+	CALLER_SYNDROME,
+	IN_PLACE
+};
+
+#define __param(type, name, init, msg)		\
+	static type name = init;		\
+	module_param(name, type, 0444);		\
+	MODULE_PARM_DESC(name, msg)
+
+__param(int, v, V_PROGRESS, "Verbosity level");
+__param(int, ewsc, 1, "Erasures without symbol corruption");
+__param(int, bc, 1, "Test for correct behaviour beyond error correction capacity");
+
+struct etab {
+	int symsize;
+	int genpoly;
+	int fcs;
+	int prim;
+	int nroots;
+	int ntrials;
+};
+
+/* List of codes to test */
+static struct etab Tab[] = {
+	{2,	0x7,	1,	1,	1,	100000	},
+	{3,	0xb,	1,	1,	2,	100000	},
+	{3,	0xb,	1,	1,	3,	100000	},
+	{3,	0xb,	2,	1,	4,	100000	},
+	{4,	0x13,	1,	1,	4,	10000	},
+	{5,	0x25,	1,	1,	6,	1000	},
+	{6,	0x43,	3,	1,	8,	100	},
+	{7,	0x89,	1,	1,	10,	100	},
+	{8,	0x11d,	1,	1,	32,	100	},
+	{8,	0x187,	112,	11,	32,	100	},
+	/*
+	 * {9,	0x211,	1,	1,	32,	100	},
+	 * {10,	0x409,	1,	1,	32,	100	},
+	 * {11,	0x805,	1,	1,	32,	100	},
+	 * {12,	0x1053	1,	1,	32,	50	},
+	 * {12,	0x1053	1,	1,	64,	50	},
+	 * {13,	0x201b	1,	1,	32,	20	},
+	 * {13,	0x201b	1,	1,	64,	20	},
+	 * {14,	0x4443	1,	1,	32,	10	},
+	 * {14,	0x4443	1,	1,	64,	10	},
+	 * {15,	0x8003	1,	1,	32,	5	},
+	 * {15,	0x8003	1,	1,	64,	5	},
+	 * {16,	0x1100	1,	1,	32,	5	},
+	 */
+	{0, 0, 0, 0, 0, 0},
+};
+
+
+struct estat {
+	int dwrong;
+	int irv;
+	int wepos;
+	int nwords;
+};
+
+struct bcstat {
+	int rfail;
+	int rsuccess;
+	int noncw;
+	int nwords;
+};
+
+struct wspace {
+	uint16_t *c;		/* sent codeword */
+	uint16_t *r;		/* received word */
+	uint16_t *s;		/* syndrome */
+	uint16_t *corr;		/* correction buffer */
+	int *errlocs;
+	int *derrlocs;
+};
+
+static double Pad[] = {0, 0.25, 0.5, 0.75, 1.0};
+
+static struct wspace *alloc_ws(struct rs_codec *rs)
+{
+	int nn = rs->nn;
+	int nroots = rs->nroots;
+	struct wspace *ws;
+
+	ws = kzalloc(sizeof(*ws), GFP_KERNEL);
+	if (!ws)
+		return NULL;
+
+	ws->c = kmalloc_array(2 * (nn + nroots),
+				sizeof(uint16_t), GFP_KERNEL);
+	if (!ws->c)
+		goto err;
+
+	ws->r = ws->c + nn;
+	ws->s = ws->r + nn;
+	ws->corr = ws->s + nroots;
+
+	ws->errlocs = kmalloc_array(nn + nroots, sizeof(int), GFP_KERNEL);
+	if (!ws->c)
+		goto err;
+
+	ws->derrlocs = ws->errlocs + nn;
+	return ws;
+
+err:
+	kfree(ws->errlocs);
+	kfree(ws->c);
+	kfree(ws);
+	return NULL;
+}
+
+static void free_ws(struct wspace *ws)
+{
+	if (!ws)
+		return;
+
+	kfree(ws->errlocs);
+	kfree(ws->c);
+	kfree(ws);
+}
+
+/*
+ * Generates a random codeword and stores it in c. Generates random errors and
+ * erasures, and stores the random word with errors in r. Erasure positions are
+ * stored in derrlocs, while errlocs has one of three values in every position:
+ *
+ * 0 if there is no error in this position;
+ * 1 if there is a symbol error in this position;
+ * 2 if there is an erasure without symbol corruption.
+ *
+ * Returns the number of corrupted symbols.
+ */
+static int get_rcw_we(struct rs_control *rs, struct wspace *ws,
+			int len, int errs, int eras)
+{
+	int nn = rs->codec->nn;
+	int nroots = rs->codec->nroots;
+	uint16_t *c = ws->c;
+	uint16_t *r = ws->r;
+	int *errlocs = ws->errlocs;
+	int *derrlocs = ws->derrlocs;
+	int dlen = len - nroots;
+	int i;
+	int errval;
+	int errloc;
+
+	/* Load c with random data and encode */
+	for (i = 0; i < dlen; i++)
+		c[i] = prandom_u32() & nn;
+
+	memset(c + dlen, 0, nroots * sizeof(*c));
+	encode_rs16(rs, c, dlen, c + dlen, 0);
+
+	/* Make copyand add errors and erasures */
+	memcpy(r, c, len * sizeof(*r));
+	memset(errlocs, 0, len * sizeof(*errlocs));
+	memset(derrlocs, 0, nroots * sizeof(*derrlocs));
+
+	/* Generating random errors */
+	for (i = 0; i < errs; i++) {
+		do {
+			/* Error value must be nonzero */
+			errval = prandom_u32() & nn;
+		} while (errval == 0);
+
+		do {
+			/* Must not choose the same location twice */
+			errloc = prandom_u32() % len;
+		} while (errlocs[errloc] != 0);
+
+		errlocs[errloc] = 1;
+		r[errloc] ^= errval;
+	}
+
+	/* Generating random erasures */
+	for (i = 0; i < eras; i++) {
+		do {
+			/* Must not choose the same location twice */
+			errloc = prandom_u32() % len;
+		} while (errlocs[errloc] != 0);
+
+		derrlocs[i] = errloc;
+
+		if (ewsc && (prandom_u32() & 1)) {
+			/* Erasure with the symbol intact */
+			errlocs[errloc] = 2;
+		} else {
+			/* Erasure with corrupted symbol */
+			do {
+				/* Error value must be nonzero */
+				errval = prandom_u32() & nn;
+			} while (errval == 0);
+
+			errlocs[errloc] = 1;
+			r[errloc] ^= errval;
+			errs++;
+		}
+	}
+
+	return errs;
+}
+
+static void fix_err(uint16_t *data, int nerrs, uint16_t *corr, int *errlocs)
+{
+	int i;
+
+	for (i = 0; i < nerrs; i++)
+		data[errlocs[i]] ^= corr[i];
+}
+
+static void compute_syndrome(struct rs_control *rsc, uint16_t *data,
+				int len, uint16_t *syn)
+{
+	struct rs_codec *rs = rsc->codec;
+	int nroots = rs->nroots;
+	int fcr = rs->fcr;
+	int prim = rs->prim;
+	uint16_t *alpha_to = rs->alpha_to;
+	uint16_t *index_of = rs->index_of;
+	int i, j;
+
+	/* Calculating syndrome */
+	for (i = 0; i < nroots; i++) {
+		syn[i] = data[0];
+		for (j = 1; j < len; j++) {
+			if (syn[i] == 0) {
+				syn[i] = data[j];
+			} else {
+				syn[i] = data[j] ^
+					alpha_to[rs_modnn(rs, index_of[syn[i]]
+						+ (fcr + i) * prim)];
+			}
+		}
+	}
+
+	/* Convert to index form */
+	for (i = 0; i < nroots; i++)
+		syn[i] = rs->index_of[syn[i]];
+}
+
+/* Test up to error correction capacity */
+static void test_uc(struct rs_control *rs, int len, int errs,
+		int eras, int trials, struct estat *stat,
+		struct wspace *ws, int method)
+{
+	uint16_t *c = ws->c;
+	uint16_t *r = ws->r;
+	uint16_t *s = ws->s;
+	uint16_t *corr = ws->corr;
+	int *errlocs = ws->errlocs;
+	int *derrlocs = ws->derrlocs;
+	int nroots = rs->codec->nroots;
+	int dlen = len - nroots;
+	int derrs, nerrs, i, j;
+
+	for (j = 0; j < trials; j++) {
+		nerrs = get_rcw_we(rs, ws, len, errs, eras);
+
+		switch (method) {
+		case CORR_BUFFER:
+			derrs = decode_rs16(rs, r, r + dlen, dlen,
+					NULL, eras, derrlocs, 0, corr);
+			fix_err(r, derrs, corr, derrlocs);
+			break;
+		case CALLER_SYNDROME:
+			compute_syndrome(rs, r, len, s);
+			derrs = decode_rs16(rs, NULL, NULL, dlen,
+					s, eras, derrlocs, 0, corr);
+			fix_err(r, derrs, corr, derrlocs);
+			break;
+		case IN_PLACE:
+			derrs = decode_rs16(rs, r, r + dlen, dlen,
+					NULL, eras, derrlocs, 0, NULL);
+			break;
+		}
+
+		if (derrs != nerrs)
+			stat->irv++;
+
+		if (method != IN_PLACE) {
+			for (i = 0; i < derrs; i++) {
+				if (errlocs[derrlocs[i]] != 1)
+					stat->wepos++;
+			}
+		}
+
+		if (memcmp(r, c, len * sizeof(*r)))
+			stat->dwrong++;
+	}
+	stat->nwords += trials;
+}
+
+int ex_rs_helper(struct rs_control *rs, struct wspace *ws,
+		int len, int trials, int method)
+{
+	static const char * const desc[] = {
+		"Testing correction buffer interface...",
+		"Testing with caller provided syndrome...",
+		"Testing in-place interface..."
+	};
+
+	struct estat stat = {0, 0, 0, 0};
+	int nroots = rs->codec->nroots;
+	int errs, eras, retval;
+
+	if (v >= V_PROGRESS)
+		pr_info("  %s\n", desc[method]);
+
+	for (errs = 0; errs <= nroots / 2; errs++)
+		for (eras = 0; eras <= nroots - 2 * errs; eras++)
+			test_uc(rs, len, errs, eras, trials, &stat, ws, method);
+
+	if (v >= V_CSUMMARY) {
+		pr_info("    Decodes wrong:        %d / %d\n",
+				stat.dwrong, stat.nwords);
+		pr_info("    Wrong return value:   %d / %d\n",
+				stat.irv, stat.nwords);
+		if (method != IN_PLACE)
+			pr_info("    Wrong error position: %d\n", stat.wepos);
+	}
+
+	retval = stat.dwrong + stat.wepos + stat.irv;
+	if (retval && v >= V_PROGRESS)
+		pr_warn("    FAIL: %d decoding failures!\n", retval);
+
+	return retval;
+}
+
+int exercise_rs(struct rs_control *rs, struct wspace *ws,
+		int len, int trials)
+{
+
+	int retval = 0;
+	int i;
+
+	if (v >= V_PROGRESS)
+		pr_info("Testing up to error correction capacity...\n");
+
+	for (i = 0; i <= IN_PLACE; i++)
+		retval |= ex_rs_helper(rs, ws, len, trials, i);
+
+	return retval;
+}
+
+/* Tests for correct behaviour beyond error correction capacity */
+static void test_bc(struct rs_control *rs, int len, int errs,
+		int eras, int trials, struct bcstat *stat,
+		struct wspace *ws)
+{
+	uint16_t *r = ws->r;
+	uint16_t *corr = ws->corr;
+	int *derrlocs = ws->derrlocs;
+	int nroots = rs->codec->nroots;
+	int dlen = len - nroots;
+	int derrs, j;
+
+	for (j = 0; j < trials; j++) {
+		get_rcw_we(rs, ws, len, errs, eras);
+		derrs = decode_rs16(rs, r, r + dlen, dlen,
+				NULL, eras, derrlocs, 0, corr);
+		fix_err(r, derrs, corr, derrlocs);
+
+		if (derrs >= 0) {
+			stat->rsuccess++;
+
+			/*
+			 * We check that the returned word is actually a
+			 * codeword. The obious way to do this would be to
+			 * compute the syndrome, but we don't want to replicate
+			 * that code here. However, all the codes are in
+			 * systematic form, and therefore we can encode the
+			 * returned word, and see whether the parity changes or
+			 * not.
+			 */
+			memset(corr, 0, nroots * sizeof(*corr));
+			encode_rs16(rs, r, dlen, corr, 0);
+
+			if (memcmp(r + dlen, corr, nroots * sizeof(*corr)))
+				stat->noncw++;
+		} else {
+			stat->rfail++;
+		}
+	}
+	stat->nwords += trials;
+}
+
+int exercise_rs_bc(struct rs_control *rs, struct wspace *ws,
+		int len, int trials)
+{
+	struct bcstat stat = {0, 0, 0, 0};
+	int nroots = rs->codec->nroots;
+	int errs, eras, cutoff;
+
+	if (v >= V_PROGRESS)
+		pr_info("Testing beyond error correction capacity...\n");
+
+	for (errs = 1; errs <= nroots; errs++) {
+		eras = nroots - 2 * errs + 1;
+		if (eras < 0)
+			eras = 0;
+
+		cutoff = nroots <= len - errs ? nroots : len - errs;
+		for (; eras <= cutoff; eras++)
+			test_bc(rs, len, errs, eras, trials, &stat, ws);
+	}
+
+	if (v >= V_CSUMMARY) {
+		pr_info("  decoder gives up:        %d / %d\n",
+				stat.rfail, stat.nwords);
+		pr_info("  decoder returns success: %d / %d\n",
+				stat.rsuccess, stat.nwords);
+		pr_info("    not a codeword:        %d / %d\n",
+				stat.noncw, stat.rsuccess);
+	}
+
+	if (stat.noncw && v >= V_PROGRESS)
+		pr_warn("    FAIL: %d silent failures!\n", stat.noncw);
+
+	return stat.noncw;
+}
+
+static int run_exercise(struct etab *e)
+{
+	struct rs_control *rsc;
+	struct wspace *ws;
+	int nn = (1 << e->symsize) - 1;
+	int kk = nn - e->nroots;
+	int max_pad = kk - 1;
+	int prev_pad = -1;
+	int i;
+	int retval = -ENOMEM;
+
+	rsc = init_rs(e->symsize, e->genpoly, e->fcs, e->prim, e->nroots);
+	if (!rsc)
+		return retval;
+
+	ws = alloc_ws(rsc->codec);
+	if (!ws)
+		goto err;
+
+	retval = 0;
+	for (i = 0; i < ARRAY_SIZE(Pad); i++) {
+		int pad = Pad[i] * max_pad;
+		int len = nn - pad;
+
+		if (pad == prev_pad)
+			continue;
+
+		prev_pad = pad;
+		if (v >= V_PROGRESS) {
+			pr_info("Testing (%d,%d)_%d code...\n",
+					len, kk - pad, nn + 1);
+		}
+
+		retval |= exercise_rs(rsc, ws, len, e->ntrials);
+		if (bc)
+			retval |= exercise_rs_bc(rsc, ws, len, e->ntrials);
+	}
+
+	free_ws(ws);
+
+err:
+	free_rs(rsc);
+	return retval;
+}
+
+static int __init test_rslib_init(void)
+{
+	int fail, i;
+
+	for (i = 0; Tab[i].symsize != 0 ; i++) {
+		int retval;
+
+		retval = run_exercise(Tab + i);
+		if (retval < 0)
+			return -ENOMEM;
+
+		fail |= retval;
+	}
+
+	if (fail)
+		pr_warn("rslib: test failed\n");
+	else
+		pr_info("rslib: test ok\n");
+
+	return -EAGAIN; /* Fail will directly unload the module */
+}
+
+static void __exit test_rslib_exit(void)
+{
+}
+
+module_init(test_rslib_init)
+module_exit(test_rslib_exit)
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Ferdinand Blomqvist");
+MODULE_DESCRIPTION("Reed-Solomon library test");
-- 
2.17.2


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [RFC PATCH 2/7] rslib: Fix decoding of shortened codes
  2019-03-30 18:29 [RFC PATCH 0/7] rslib: RS decoder is severely broken Ferdinand Blomqvist
  2019-03-30 18:29 ` [RFC PATCH 1/7] rslib: Add tests for the encoder and decoder Ferdinand Blomqvist
@ 2019-03-30 18:29 ` Ferdinand Blomqvist
  2019-06-17 20:46   ` Thomas Gleixner
  2019-03-30 18:29 ` [RFC PATCH 3/7] rslib: decode_rs: Fix length parameter check Ferdinand Blomqvist
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Ferdinand Blomqvist @ 2019-03-30 18:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: Thomas Gleixner

The decoder is broken. It only works with full length codes.

Signed-off-by: Ferdinand Blomqvist <ferdinand.blomqvist@gmail.com>
---
 lib/reed_solomon/decode_rs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/reed_solomon/decode_rs.c b/lib/reed_solomon/decode_rs.c
index 1db74eb098d0..3313bf944ff1 100644
--- a/lib/reed_solomon/decode_rs.c
+++ b/lib/reed_solomon/decode_rs.c
@@ -99,9 +99,9 @@
 	if (no_eras > 0) {
 		/* Init lambda to be the erasure locator polynomial */
 		lambda[1] = alpha_to[rs_modnn(rs,
-					      prim * (nn - 1 - eras_pos[0]))];
+					prim * (nn - 1 - (eras_pos[0] + pad)))];
 		for (i = 1; i < no_eras; i++) {
-			u = rs_modnn(rs, prim * (nn - 1 - eras_pos[i]));
+			u = rs_modnn(rs, prim * (nn - 1 - (eras_pos[i] + pad)));
 			for (j = i + 1; j > 0; j--) {
 				tmp = index_of[lambda[j - 1]];
 				if (tmp != nn) {
-- 
2.17.2


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [RFC PATCH 3/7] rslib: decode_rs: Fix length parameter check
  2019-03-30 18:29 [RFC PATCH 0/7] rslib: RS decoder is severely broken Ferdinand Blomqvist
  2019-03-30 18:29 ` [RFC PATCH 1/7] rslib: Add tests for the encoder and decoder Ferdinand Blomqvist
  2019-03-30 18:29 ` [RFC PATCH 2/7] rslib: Fix decoding of shortened codes Ferdinand Blomqvist
@ 2019-03-30 18:29 ` Ferdinand Blomqvist
  2019-06-17 20:46   ` Thomas Gleixner
  2019-03-30 18:29 ` [RFC PATCH 4/7] rslib: decode_rs: code cleanup Ferdinand Blomqvist
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Ferdinand Blomqvist @ 2019-03-30 18:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: Thomas Gleixner

The length of the data load must be at least one. Or in other words,
there must be room for at least 1 data and nroots parity symbols after
shortening the RS code.

Signed-off-by: Ferdinand Blomqvist <ferdinand.blomqvist@gmail.com>
---
 lib/reed_solomon/decode_rs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/reed_solomon/decode_rs.c b/lib/reed_solomon/decode_rs.c
index 3313bf944ff1..22006eaa41e6 100644
--- a/lib/reed_solomon/decode_rs.c
+++ b/lib/reed_solomon/decode_rs.c
@@ -39,7 +39,7 @@
 
 	/* Check length parameter for validity */
 	pad = nn - nroots - len;
-	BUG_ON(pad < 0 || pad >= nn);
+	BUG_ON(pad < 0 || pad >= nn - nroots);
 
 	/* Does the caller provide the syndrome ? */
 	if (s != NULL)
-- 
2.17.2


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [RFC PATCH 4/7] rslib: decode_rs: code cleanup
  2019-03-30 18:29 [RFC PATCH 0/7] rslib: RS decoder is severely broken Ferdinand Blomqvist
                   ` (2 preceding siblings ...)
  2019-03-30 18:29 ` [RFC PATCH 3/7] rslib: decode_rs: Fix length parameter check Ferdinand Blomqvist
@ 2019-03-30 18:29 ` Ferdinand Blomqvist
  2019-06-17 20:47   ` Thomas Gleixner
  2019-03-30 18:29 ` [RFC PATCH 5/7] rslib: Fix handling of of caller provided syndrome Ferdinand Blomqvist
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Ferdinand Blomqvist @ 2019-03-30 18:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: Thomas Gleixner

Nothing useful was done after the finish label when count is negative so
return directly instead of jumping to finish.

Signed-off-by: Ferdinand Blomqvist <ferdinand.blomqvist@gmail.com>
---
 lib/reed_solomon/decode_rs.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/lib/reed_solomon/decode_rs.c b/lib/reed_solomon/decode_rs.c
index 22006eaa41e6..78629bbe6590 100644
--- a/lib/reed_solomon/decode_rs.c
+++ b/lib/reed_solomon/decode_rs.c
@@ -88,8 +88,7 @@
 		/* if syndrome is zero, data[] is a codeword and there are no
 		 * errors to correct. So return data[] unmodified
 		 */
-		count = 0;
-		goto finish;
+		return 0;
 	}
 
  decode:
@@ -202,8 +201,7 @@
 		 * deg(lambda) unequal to number of roots => uncorrectable
 		 * error detected
 		 */
-		count = -EBADMSG;
-		goto finish;
+		return -EBADMSG;
 	}
 	/*
 	 * Compute err+eras evaluator poly omega(x) = s(x)*lambda(x) (modulo
@@ -261,7 +259,6 @@
 		}
 	}
 
-finish:
 	if (eras_pos != NULL) {
 		for (i = 0; i < count; i++)
 			eras_pos[i] = loc[i] - pad;
-- 
2.17.2


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [RFC PATCH 5/7] rslib: Fix handling of of caller provided syndrome
  2019-03-30 18:29 [RFC PATCH 0/7] rslib: RS decoder is severely broken Ferdinand Blomqvist
                   ` (3 preceding siblings ...)
  2019-03-30 18:29 ` [RFC PATCH 4/7] rslib: decode_rs: code cleanup Ferdinand Blomqvist
@ 2019-03-30 18:29 ` Ferdinand Blomqvist
  2019-06-17 20:49   ` Thomas Gleixner
  2019-03-30 18:29 ` [RFC PATCH 6/7] rslib: Update documentation Ferdinand Blomqvist
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Ferdinand Blomqvist @ 2019-03-30 18:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: Thomas Gleixner

Check if the syndrome provided by the caller is zero, and act
accordingly.

Signed-off-by: Ferdinand Blomqvist <ferdinand.blomqvist@gmail.com>
---
 lib/reed_solomon/decode_rs.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/lib/reed_solomon/decode_rs.c b/lib/reed_solomon/decode_rs.c
index 78629bbe6590..7ecc449e57e9 100644
--- a/lib/reed_solomon/decode_rs.c
+++ b/lib/reed_solomon/decode_rs.c
@@ -42,8 +42,16 @@
 	BUG_ON(pad < 0 || pad >= nn - nroots);
 
 	/* Does the caller provide the syndrome ? */
-	if (s != NULL)
-		goto decode;
+	if (s != NULL) {
+	    for (i = 0; i < nroots; i++) {
+		    /* The syndrome is in index form, so nn represents zero */
+		    if (s[i] != nn)
+			    goto decode;
+	    }
+
+	    /* syndrome is zero, no errors to correct  */
+	    return 0;
+	}
 
 	/* form the syndromes; i.e., evaluate data(x) at roots of
 	 * g(x) */
-- 
2.17.2


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [RFC PATCH 6/7] rslib: Update documentation
  2019-03-30 18:29 [RFC PATCH 0/7] rslib: RS decoder is severely broken Ferdinand Blomqvist
                   ` (4 preceding siblings ...)
  2019-03-30 18:29 ` [RFC PATCH 5/7] rslib: Fix handling of of caller provided syndrome Ferdinand Blomqvist
@ 2019-03-30 18:29 ` Ferdinand Blomqvist
  2019-06-17 20:49   ` Thomas Gleixner
  2019-03-30 18:29 ` [RFC PATCH 7/7] rslib: Fix remaining decoder flaws Ferdinand Blomqvist
  2019-04-04 11:24 ` [RFC PATCH 0/7] rslib: RS decoder is severely broken Thomas Gleixner
  7 siblings, 1 reply; 19+ messages in thread
From: Ferdinand Blomqvist @ 2019-03-30 18:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: Thomas Gleixner

The decoder returns the number of corrected symbols, not bits.
The caller provided syndrome must be in index form.

Signed-off-by: Ferdinand Blomqvist <ferdinand.blomqvist@gmail.com>
---
 lib/reed_solomon/reed_solomon.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/lib/reed_solomon/reed_solomon.c b/lib/reed_solomon/reed_solomon.c
index e5fdc8b9e856..bbc01bad3053 100644
--- a/lib/reed_solomon/reed_solomon.c
+++ b/lib/reed_solomon/reed_solomon.c
@@ -340,7 +340,8 @@ EXPORT_SYMBOL_GPL(encode_rs8);
  *  @data:	data field of a given type
  *  @par:	received parity data field
  *  @len:	data length
- *  @s:		syndrome data field (if NULL, syndrome is calculated)
+ *  @s: 	syndrome data field, must be in index form
+ *		(if NULL, syndrome is calculated)
  *  @no_eras:	number of erasures
  *  @eras_pos:	position of erasures, can be NULL
  *  @invmsk:	invert data mask (will be xored on data, not on parity!)
@@ -354,7 +355,8 @@ EXPORT_SYMBOL_GPL(encode_rs8);
  *  decoding, so the caller has to ensure that decoder invocations are
  *  serialized.
  *
- *  Returns the number of corrected bits or -EBADMSG for uncorrectable errors.
+ *  Returns the number of corrected symbols or -EBADMSG for uncorrectable
+ *  errors. The count includes errors in the parity.
  */
 int decode_rs8(struct rs_control *rsc, uint8_t *data, uint16_t *par, int len,
 	       uint16_t *s, int no_eras, int *eras_pos, uint16_t invmsk,
@@ -391,7 +393,8 @@ EXPORT_SYMBOL_GPL(encode_rs16);
  *  @data:	data field of a given type
  *  @par:	received parity data field
  *  @len:	data length
- *  @s:		syndrome data field (if NULL, syndrome is calculated)
+ *  @s: 	syndrome data field, must be in index form
+ *		(if NULL, syndrome is calculated)
  *  @no_eras:	number of erasures
  *  @eras_pos:	position of erasures, can be NULL
  *  @invmsk:	invert data mask (will be xored on data, not on parity!)
@@ -403,7 +406,8 @@ EXPORT_SYMBOL_GPL(encode_rs16);
  *  decoding, so the caller has to ensure that decoder invocations are
  *  serialized.
  *
- *  Returns the number of corrected bits or -EBADMSG for uncorrectable errors.
+ *  Returns the number of corrected symbols or -EBADMSG for uncorrectable
+ *  errors. The count includes errors in the parity.
  */
 int decode_rs16(struct rs_control *rsc, uint16_t *data, uint16_t *par, int len,
 		uint16_t *s, int no_eras, int *eras_pos, uint16_t invmsk,
-- 
2.17.2


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [RFC PATCH 7/7] rslib: Fix remaining decoder flaws
  2019-03-30 18:29 [RFC PATCH 0/7] rslib: RS decoder is severely broken Ferdinand Blomqvist
                   ` (5 preceding siblings ...)
  2019-03-30 18:29 ` [RFC PATCH 6/7] rslib: Update documentation Ferdinand Blomqvist
@ 2019-03-30 18:29 ` Ferdinand Blomqvist
  2019-06-17 21:00   ` Thomas Gleixner
  2019-04-04 11:24 ` [RFC PATCH 0/7] rslib: RS decoder is severely broken Thomas Gleixner
  7 siblings, 1 reply; 19+ messages in thread
From: Ferdinand Blomqvist @ 2019-03-30 18:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: Thomas Gleixner

The decoder is flawed in the following ways:

- The decoder sometimes fails silently, i.e. it announces success but
  returns a word that is not a codeword.

- The return value of the decoder is incoherent with respect to how
  fixed erasures are counted. If the word to be decoded is a codeword,
  then the decoder always returns zero even if some erasures are given.
  On the other hand, if the word to be decoded contains errors, then the
  number of erasures is always included in the count of corrected
  symbols. So the decoder handles erasures without symbol corruption
  inconsistently. This inconsistency probably doesn't affect anyone
  using the decoder, but it is inconsistent with the documentation.

- The error positions returned in eras_pos include all erasures, but the
  corrections are only set in the correction buffer if there actually is
  a symbol error. So if there are erasures without symbol corruption,
  then the correction buffer will contain errors (unless initialized to
  zero before calling the decoder) or some values will be unset (if the
  correction buffer is uninitialized).

- When correcting data in-place the decoder does not correct errors in
  the parity. On the other hand, when returning the errors in correction
  buffers, errors in the parity are included.

The respective fixed are:

- The syndrome of a codeword is always zero, and the syndrome is linear,
  .i.e, S(x+e) = S(x) + S(e). So compute the syndrome for the error and
  check whether it equals the syndrome of the received word. If it does,
  then we have decoded to a valid codeword, otherwise we know that we
  have an uncorrectable error. Fortunately, some unrecoverable error
  conditions can be detected earlier in the decoding, which saves some
  processing power.

- Simply count and return the number of symbols actually corrected.

- Make sure to only return positions where symbols were corrected.

- Also fix errors in parity when correcting in-place. Another option
  would be to completely disregard errors in the parity, but then the
  interface makes it impossible to write tests that test for silent
  failures.

Other changes:

- Only fill the correction buffer and error position buffer if both of
  them are provided. Otherwise correct in place. Previously the error
  position buffer was always populated with the positions of the
  corrected errors, irrespective of whether a correction buffer was
  supplied or not. The rationale for this change is that there seems to
  be two use cases for the decoder; correct in-place or use the
  correction buffers. The caller does not need the positions of the
  corrected errors when in-place correction is used. If in-place
  correction is not used, then both the correction buffer and error
  position buffer need to be populated.

Signed-off-by: Ferdinand Blomqvist <ferdinand.blomqvist@gmail.com>
---
 lib/reed_solomon/decode_rs.c | 87 +++++++++++++++++++++++++++---------
 1 file changed, 67 insertions(+), 20 deletions(-)

diff --git a/lib/reed_solomon/decode_rs.c b/lib/reed_solomon/decode_rs.c
index 7ecc449e57e9..33621ea67f67 100644
--- a/lib/reed_solomon/decode_rs.c
+++ b/lib/reed_solomon/decode_rs.c
@@ -22,6 +22,7 @@
 	uint16_t *index_of = rs->index_of;
 	uint16_t u, q, tmp, num1, num2, den, discr_r, syn_error;
 	int count = 0;
+	int num_corrected;
 	uint16_t msk = (uint16_t) rs->nn;
 
 	/*
@@ -182,6 +183,15 @@
 		if (lambda[i] != nn)
 			deg_lambda = i;
 	}
+
+	if (deg_lambda == 0) {
+		/*
+		 * deg(lambda) is zero even though the syndrome is non-zero
+		 * => uncorrectable error detected
+		 */
+		return -EBADMSG;
+	}
+
 	/* Find roots of error+erasure locator polynomial by Chien search */
 	memcpy(&reg[1], &lambda[1], nroots * sizeof(reg[0]));
 	count = 0;		/* Number of roots of lambda(x) */
@@ -195,6 +205,12 @@
 		}
 		if (q != 0)
 			continue;	/* Not a root */
+
+		if (k < pad) {
+			/* Impossible error location. Uncorrectable error. */
+			return -EBADMSG;
+		}
+
 		/* store root (index-form) and error location number */
 		root[count] = i;
 		loc[count] = k;
@@ -229,7 +245,9 @@
 	/*
 	 * Compute error values in poly-form. num1 = omega(inv(X(l))), num2 =
 	 * inv(X(l))**(fcr-1) and den = lambda_pr(inv(X(l))) all in poly-form
+	 * Note: we reuse the buffer for b to store the correction pattern
 	 */
+	num_corrected = 0;
 	for (j = count - 1; j >= 0; j--) {
 		num1 = 0;
 		for (i = deg_omega; i >= 0; i--) {
@@ -237,6 +255,12 @@
 				num1 ^= alpha_to[rs_modnn(rs, omega[i] +
 							i * root[j])];
 		}
+
+		if (num1 == 0) {
+			b[j] = 0;
+			continue;
+		}
+
 		num2 = alpha_to[rs_modnn(rs, root[j] * (fcr - 1) + nn)];
 		den = 0;
 
@@ -248,29 +272,52 @@
 						       i * root[j])];
 			}
 		}
-		/* Apply error to data */
-		if (num1 != 0 && loc[j] >= pad) {
-			uint16_t cor = alpha_to[rs_modnn(rs,index_of[num1] +
-						       index_of[num2] +
-						       nn - index_of[den])];
-			/* Store the error correction pattern, if a
-			 * correction buffer is available */
-			if (corr) {
-				corr[j] = cor;
-			} else {
-				/* If a data buffer is given and the
-				 * error is inside the message,
-				 * correct it */
-				if (data && (loc[j] < (nn - nroots)))
-					data[loc[j] - pad] ^= cor;
-			}
+
+		b[j] = alpha_to[rs_modnn(rs, index_of[num1] +
+					       index_of[num2] +
+					       nn - index_of[den])];
+		num_corrected++;
+	}
+
+	/*
+	 * We compute the syndrome of the 'error' to and check that it matches
+	 * the syndrome of the received word
+	 */
+	for (i = 0; i < nroots; i++) {
+		tmp = 0;
+		for (j = 0; j < count; j++) {
+			if (b[j] == 0)
+				continue;
+
+			k = (fcr + i) * prim * (nn-loc[j]-1);
+			tmp ^= alpha_to[rs_modnn(rs, index_of[b[j]] + k)];
 		}
+
+		if (tmp != alpha_to[s[i]])
+			return -EBADMSG;
 	}
 
-	if (eras_pos != NULL) {
-		for (i = 0; i < count; i++)
-			eras_pos[i] = loc[i] - pad;
+	/*
+	 * Store the error correction pattern, if a
+	 * correction buffer is available
+	 */
+	if (corr && eras_pos) {
+		j = 0;
+		for (i = 0; i < count; i++) {
+			if (b[i]) {
+				corr[j] = b[i];
+				eras_pos[j++] = loc[i] - pad;
+			}
+		}
+	} else if (data && par) {
+		/* Apply error to data and parity */
+		for (i = 0; i < count; i++) {
+			if (loc[i] < (nn - nroots))
+				data[loc[i] - pad] ^= b[i];
+			else
+				par[loc[i] - pad - len] ^= b[i];
+		}
 	}
-	return count;
 
+	return  num_corrected;
 }
-- 
2.17.2


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/7] rslib: RS decoder is severely broken
  2019-03-30 18:29 [RFC PATCH 0/7] rslib: RS decoder is severely broken Ferdinand Blomqvist
                   ` (6 preceding siblings ...)
  2019-03-30 18:29 ` [RFC PATCH 7/7] rslib: Fix remaining decoder flaws Ferdinand Blomqvist
@ 2019-04-04 11:24 ` Thomas Gleixner
  2019-05-09 15:09   ` Ferdinand Blomqvist
  7 siblings, 1 reply; 19+ messages in thread
From: Thomas Gleixner @ 2019-04-04 11:24 UTC (permalink / raw)
  To: Ferdinand Blomqvist; +Cc: linux-kernel

Ferdinand,

On Sat, 30 Mar 2019, Ferdinand Blomqvist wrote:

Thanks for providing that! I'm short of cycles to go through that right
now, but will do in the foreseeable future. Feel free to remind me if I
don't do so within two weeks.

Thanks,

	tglx


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/7] rslib: RS decoder is severely broken
  2019-04-04 11:24 ` [RFC PATCH 0/7] rslib: RS decoder is severely broken Thomas Gleixner
@ 2019-05-09 15:09   ` Ferdinand Blomqvist
  2019-05-24  7:25     ` Thomas Gleixner
  0 siblings, 1 reply; 19+ messages in thread
From: Ferdinand Blomqvist @ 2019-05-09 15:09 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel

On 2019-04-04 13:24:05, Thomas Gleixner wrote:
>Ferdinand,
>
>On Sat, 30 Mar 2019, Ferdinand Blomqvist wrote:
>
>Thanks for providing that! I'm short of cycles to go through that right
>now, but will do in the foreseeable future. Feel free to remind me if I
>don't do so within two weeks.
>
>Thanks,
>
>	tglx
>

A gentle reminder.

-- 
Ferdinand Blomqvist
ferdinand.blomqvist[at]gmail.com
GPG key: 9EFB 7A2C 0432 4EC5 32BA FA61 CFE9 4164 93E8 B9E4

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/7] rslib: RS decoder is severely broken
  2019-05-09 15:09   ` Ferdinand Blomqvist
@ 2019-05-24  7:25     ` Thomas Gleixner
  0 siblings, 0 replies; 19+ messages in thread
From: Thomas Gleixner @ 2019-05-24  7:25 UTC (permalink / raw)
  To: Ferdinand Blomqvist; +Cc: linux-kernel

On Thu, 9 May 2019, Ferdinand Blomqvist wrote:
> On 2019-04-04 13:24:05, Thomas Gleixner wrote:
> > Ferdinand,
> > 
> > On Sat, 30 Mar 2019, Ferdinand Blomqvist wrote:
> > 
> > Thanks for providing that! I'm short of cycles to go through that right
> > now, but will do in the foreseeable future. Feel free to remind me if I
> > don't do so within two weeks.
> 
> A gentle reminder.

Working through it. Need to page in all the details :(

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 1/7] rslib: Add tests for the encoder and decoder
  2019-03-30 18:29 ` [RFC PATCH 1/7] rslib: Add tests for the encoder and decoder Ferdinand Blomqvist
@ 2019-06-17 20:44   ` Thomas Gleixner
  0 siblings, 0 replies; 19+ messages in thread
From: Thomas Gleixner @ 2019-06-17 20:44 UTC (permalink / raw)
  To: Ferdinand Blomqvist; +Cc: linux-kernel

Ferdinand,

On Sat, 30 Mar 2019, Ferdinand Blomqvist wrote:

Sorry for the horrible long delay. I'm just drowned in backlog.

....

> There are a couple of options for the tests:
> 
> - Verbosity.
> 
> - Whether to test for correct behaviour beyond capacity. Default is to
>   test beyond capacity.
> 
> - Whether to allow erasures without symbol corruption. Defaults to yes.
> 
> Note that the tests take a couple of minutes to complete.

Very well written changelog!


> +/* List of codes to test */
> +static struct etab Tab[] = {
> +	{2,	0x7,	1,	1,	1,	100000	},
> +	{3,	0xb,	1,	1,	2,	100000	},
> +	{3,	0xb,	1,	1,	3,	100000	},
> +	{3,	0xb,	2,	1,	4,	100000	},
> +	{4,	0x13,	1,	1,	4,	10000	},
> +	{5,	0x25,	1,	1,	6,	1000	},
> +	{6,	0x43,	3,	1,	8,	100	},
> +	{7,	0x89,	1,	1,	10,	100	},
> +	{8,	0x11d,	1,	1,	32,	100	},
> +	{8,	0x187,	112,	11,	32,	100	},
> +	/*
> +	 * {9,	0x211,	1,	1,	32,	100	},
> +	 * {10,	0x409,	1,	1,	32,	100	},
> +	 * {11,	0x805,	1,	1,	32,	100	},
> +	 * {12,	0x1053	1,	1,	32,	50	},
> +	 * {12,	0x1053	1,	1,	64,	50	},
> +	 * {13,	0x201b	1,	1,	32,	20	},
> +	 * {13,	0x201b	1,	1,	64,	20	},
> +	 * {14,	0x4443	1,	1,	32,	10	},
> +	 * {14,	0x4443	1,	1,	64,	10	},
> +	 * {15,	0x8003	1,	1,	32,	5	},
> +	 * {15,	0x8003	1,	1,	64,	5	},
> +	 * {16,	0x1100	1,	1,	32,	5	},
> +	 */

I assume these are enabled later. We don't do that commented out stuff in
general. If it's used later, then add it with a separate patch. If not just
leave it alone.

> +	{0, 0, 0, 0, 0, 0},
> +};
> +
> +
> +struct estat {
> +	int dwrong;
> +	int irv;
> +	int wepos;
> +	int nwords;
> +};
> +
> +struct bcstat {
> +	int rfail;
> +	int rsuccess;
> +	int noncw;
> +	int nwords;
> +};
> +
> +struct wspace {
> +	uint16_t *c;		/* sent codeword */
> +	uint16_t *r;		/* received word */
> +	uint16_t *s;		/* syndrome */
> +	uint16_t *corr;		/* correction buffer */
> +	int *errlocs;
> +	int *derrlocs;

Pet pieve comment. I generally prefer tabular layout of structs as it's
simpler to follow

struct wspace {
	uint16_t	*c;		/* sent codeword */
	uint16_t	*r;		/* received word */
	uint16_t	*s;		/* syndrome */
	uint16_t	*corr;		/* correction buffer */
	int		*errlocs;
	int		*derrlocs;

Hmm?

> +
> +static double Pad[] = {0, 0.25, 0.5, 0.75, 1.0};

This is kernel code. You cannot use the FPU without special care. But for
that use case doing so would be actually overkill.

> +	for (i = 0; i < ARRAY_SIZE(Pad); i++) {
> +		int pad = Pad[i] * max_pad;

That can be simply expressed:

struct pad {
	int	mult;
	int	shift;
};

static struct pad pad[] = {
	{ 0, 0 },
	{ 1, 2 },
	{ 1, 1 },
	{ 3, 2 },
	{ 1, 0 },
};

	for (i = 0; i < ARRAY_SIZE(pad); i++) {
		int pad = (pad[i].mult * max_pad) >> pad[i].shift;

Also note, that I got rid of the CamelCase name.

> +static struct wspace *alloc_ws(struct rs_codec *rs)
> +{
> +	int nn = rs->nn;
> +	int nroots = rs->nroots;
> +	struct wspace *ws;

Yet another pet pieve comment for readability. Order variables in reverse
fir tree order.

	int nroots = rs->nroots;
	struct wspace *ws;
	int nn = rs->nn;

> +	ws = kzalloc(sizeof(*ws), GFP_KERNEL);
> +	if (!ws)
> +		return NULL;
> +
> +	ws->c = kmalloc_array(2 * (nn + nroots),
> +				sizeof(uint16_t), GFP_KERNEL);
> +	if (!ws->c)
> +		goto err;
> +
> +	ws->r = ws->c + nn;
> +	ws->s = ws->r + nn;
> +	ws->corr = ws->s + nroots;
> +
> +	ws->errlocs = kmalloc_array(nn + nroots, sizeof(int), GFP_KERNEL);
> +	if (!ws->c)
> +		goto err;
> +
> +	ws->derrlocs = ws->errlocs + nn;
> +	return ws;
> +
> +err:
> +	kfree(ws->errlocs);
> +	kfree(ws->c);
> +	kfree(ws);

If you move free_ws() above this function you can replace this kfree()
sequence with

	free_ws();

> +	return NULL;

Just nitpicks, except for the FPU issue. Other than that this looks great.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 2/7] rslib: Fix decoding of shortened codes
  2019-03-30 18:29 ` [RFC PATCH 2/7] rslib: Fix decoding of shortened codes Ferdinand Blomqvist
@ 2019-06-17 20:46   ` Thomas Gleixner
  0 siblings, 0 replies; 19+ messages in thread
From: Thomas Gleixner @ 2019-06-17 20:46 UTC (permalink / raw)
  To: Ferdinand Blomqvist; +Cc: linux-kernel

Ferdinand,

On Sat, 30 Mar 2019, Ferdinand Blomqvist wrote:

> The decoder is broken. It only works with full length codes.

A short explanation what is missing and what is done to fix it would be
appreciated.

> Signed-off-by: Ferdinand Blomqvist <ferdinand.blomqvist@gmail.com>
> ---
>  lib/reed_solomon/decode_rs.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/reed_solomon/decode_rs.c b/lib/reed_solomon/decode_rs.c
> index 1db74eb098d0..3313bf944ff1 100644
> --- a/lib/reed_solomon/decode_rs.c
> +++ b/lib/reed_solomon/decode_rs.c
> @@ -99,9 +99,9 @@
>  	if (no_eras > 0) {
>  		/* Init lambda to be the erasure locator polynomial */
>  		lambda[1] = alpha_to[rs_modnn(rs,
> -					      prim * (nn - 1 - eras_pos[0]))];
> +					prim * (nn - 1 - (eras_pos[0] + pad)))];
>  		for (i = 1; i < no_eras; i++) {
> -			u = rs_modnn(rs, prim * (nn - 1 - eras_pos[i]));
> +			u = rs_modnn(rs, prim * (nn - 1 - (eras_pos[i] + pad)));
>  			for (j = i + 1; j > 0; j--) {
>  				tmp = index_of[lambda[j - 1]];
>  				if (tmp != nn) {

Code looks fine as is.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 3/7] rslib: decode_rs: Fix length parameter check
  2019-03-30 18:29 ` [RFC PATCH 3/7] rslib: decode_rs: Fix length parameter check Ferdinand Blomqvist
@ 2019-06-17 20:46   ` Thomas Gleixner
  0 siblings, 0 replies; 19+ messages in thread
From: Thomas Gleixner @ 2019-06-17 20:46 UTC (permalink / raw)
  To: Ferdinand Blomqvist; +Cc: linux-kernel

On Sat, 30 Mar 2019, Ferdinand Blomqvist wrote:

> The length of the data load must be at least one. Or in other words,
> there must be room for at least 1 data and nroots parity symbols after
> shortening the RS code.

Indeed!


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 4/7] rslib: decode_rs: code cleanup
  2019-03-30 18:29 ` [RFC PATCH 4/7] rslib: decode_rs: code cleanup Ferdinand Blomqvist
@ 2019-06-17 20:47   ` Thomas Gleixner
  0 siblings, 0 replies; 19+ messages in thread
From: Thomas Gleixner @ 2019-06-17 20:47 UTC (permalink / raw)
  To: Ferdinand Blomqvist; +Cc: linux-kernel

On Sat, 30 Mar 2019, Ferdinand Blomqvist wrote:

> Nothing useful was done after the finish label when count is negative so
> return directly instead of jumping to finish.

Make sense.
 
Thanks,

	tglx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 5/7] rslib: Fix handling of of caller provided syndrome
  2019-03-30 18:29 ` [RFC PATCH 5/7] rslib: Fix handling of of caller provided syndrome Ferdinand Blomqvist
@ 2019-06-17 20:49   ` Thomas Gleixner
  0 siblings, 0 replies; 19+ messages in thread
From: Thomas Gleixner @ 2019-06-17 20:49 UTC (permalink / raw)
  To: Ferdinand Blomqvist; +Cc: linux-kernel

On Sat, 30 Mar 2019, Ferdinand Blomqvist wrote:

> Check if the syndrome provided by the caller is zero, and act
> accordingly.

Makes sense. IIRC I did not add that as the only caller doing this back
then checked that at the call site. But yes for correctness sake this is a
good change.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 6/7] rslib: Update documentation
  2019-03-30 18:29 ` [RFC PATCH 6/7] rslib: Update documentation Ferdinand Blomqvist
@ 2019-06-17 20:49   ` Thomas Gleixner
  0 siblings, 0 replies; 19+ messages in thread
From: Thomas Gleixner @ 2019-06-17 20:49 UTC (permalink / raw)
  To: Ferdinand Blomqvist; +Cc: linux-kernel

On Sat, 30 Mar 2019, Ferdinand Blomqvist wrote:

> The decoder returns the number of corrected symbols, not bits.
> The caller provided syndrome must be in index form.

Right.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 7/7] rslib: Fix remaining decoder flaws
  2019-03-30 18:29 ` [RFC PATCH 7/7] rslib: Fix remaining decoder flaws Ferdinand Blomqvist
@ 2019-06-17 21:00   ` Thomas Gleixner
  2019-06-20 12:09     ` Ferdinand Blomqvist
  0 siblings, 1 reply; 19+ messages in thread
From: Thomas Gleixner @ 2019-06-17 21:00 UTC (permalink / raw)
  To: Ferdinand Blomqvist; +Cc: linux-kernel

On Sat, 30 Mar 2019, Ferdinand Blomqvist wrote:
> The decoder is flawed in the following ways:

...

> - Also fix errors in parity when correcting in-place. Another option
>   would be to completely disregard errors in the parity, but then the
>   interface makes it impossible to write tests that test for silent
>   failures.
> 
> Other changes:
> 
> - Only fill the correction buffer and error position buffer if both of
>   them are provided. Otherwise correct in place. Previously the error
>   position buffer was always populated with the positions of the
>   corrected errors, irrespective of whether a correction buffer was
>   supplied or not. The rationale for this change is that there seems to
>   be two use cases for the decoder; correct in-place or use the
>   correction buffers. The caller does not need the positions of the
>   corrected errors when in-place correction is used. If in-place
>   correction is not used, then both the correction buffer and error
>   position buffer need to be populated.

Again. A perfect changelog! Very nice to read, informative and technically
on the point.

> diff --git a/lib/reed_solomon/decode_rs.c b/lib/reed_solomon/decode_rs.c
> index 7ecc449e57e9..33621ea67f67 100644
> --- a/lib/reed_solomon/decode_rs.c
> +++ b/lib/reed_solomon/decode_rs.c
> @@ -22,6 +22,7 @@
>  	uint16_t *index_of = rs->index_of;
>  	uint16_t u, q, tmp, num1, num2, den, discr_r, syn_error;
>  	int count = 0;
> +	int num_corrected;
>  	uint16_t msk = (uint16_t) rs->nn;
>  
>  	/*
> @@ -182,6 +183,15 @@
>  		if (lambda[i] != nn)
>  			deg_lambda = i;
>  	}
> +
> +	if (deg_lambda == 0) {
> +		/*
> +		 * deg(lambda) is zero even though the syndrome is non-zero
> +		 * => uncorrectable error detected
> +		 */
> +		return -EBADMSG;

Good catch. Now that I paged all that stuff back in it's obvious.

> +	}
> +
>  	/* Find roots of error+erasure locator polynomial by Chien search */
>  	memcpy(&reg[1], &lambda[1], nroots * sizeof(reg[0]));
>  	count = 0;		/* Number of roots of lambda(x) */
> @@ -195,6 +205,12 @@
>  		}
>  		if (q != 0)
>  			continue;	/* Not a root */
> +
> +		if (k < pad) {
> +			/* Impossible error location. Uncorrectable error. */
> +			return -EBADMSG;

True.

> +		}
> +
>  		/* store root (index-form) and error location number */
>  		root[count] = i;
>  		loc[count] = k;
> @@ -229,7 +245,9 @@
>  	/*
>  	 * Compute error values in poly-form. num1 = omega(inv(X(l))), num2 =
>  	 * inv(X(l))**(fcr-1) and den = lambda_pr(inv(X(l))) all in poly-form
> +	 * Note: we reuse the buffer for b to store the correction pattern
>  	 */
> +	num_corrected = 0;
>  	for (j = count - 1; j >= 0; j--) {
>  		num1 = 0;
>  		for (i = deg_omega; i >= 0; i--) {
> @@ -237,6 +255,12 @@
>  				num1 ^= alpha_to[rs_modnn(rs, omega[i] +
>  							i * root[j])];
>  		}
> +
> +		if (num1 == 0) {
> +			b[j] = 0;
> +			continue;

A comment for this would be appreciated.

> +		}
> +
>  		num2 = alpha_to[rs_modnn(rs, root[j] * (fcr - 1) + nn)];
>  		den = 0;
>  
> @@ -248,29 +272,52 @@
>  						       i * root[j])];
>  			}
>  		}
> -		/* Apply error to data */
> -		if (num1 != 0 && loc[j] >= pad) {
> -			uint16_t cor = alpha_to[rs_modnn(rs,index_of[num1] +
> -						       index_of[num2] +
> -						       nn - index_of[den])];
> -			/* Store the error correction pattern, if a
> -			 * correction buffer is available */
> -			if (corr) {
> -				corr[j] = cor;
> -			} else {
> -				/* If a data buffer is given and the
> -				 * error is inside the message,
> -				 * correct it */
> -				if (data && (loc[j] < (nn - nroots)))
> -					data[loc[j] - pad] ^= cor;
> -			}
> +
> +		b[j] = alpha_to[rs_modnn(rs, index_of[num1] +
> +					       index_of[num2] +
> +					       nn - index_of[den])];

Way simpler indeed.

> +		num_corrected++;
> +	}
> +
> +	/*
> +	 * We compute the syndrome of the 'error' to and check that it matches

s/to// or s/to/too ?

> +	 * the syndrome of the received word
> +	 */
> +	for (i = 0; i < nroots; i++) {
> +		tmp = 0;
> +		for (j = 0; j < count; j++) {
> +			if (b[j] == 0)
> +				continue;
> +
> +			k = (fcr + i) * prim * (nn-loc[j]-1);
> +			tmp ^= alpha_to[rs_modnn(rs, index_of[b[j]] + k)];
>  		}
> +
> +		if (tmp != alpha_to[s[i]])
> +			return -EBADMSG;

Interesting it never occured to me that this can actually happen.

>  	}
>  
> -	if (eras_pos != NULL) {
> -		for (i = 0; i < count; i++)
> -			eras_pos[i] = loc[i] - pad;
> +	/*
> +	 * Store the error correction pattern, if a
> +	 * correction buffer is available
> +	 */
> +	if (corr && eras_pos) {
> +		j = 0;
> +		for (i = 0; i < count; i++) {
> +			if (b[i]) {
> +				corr[j] = b[i];
> +				eras_pos[j++] = loc[i] - pad;
> +			}
> +		}
> +	} else if (data && par) {
> +		/* Apply error to data and parity */
> +		for (i = 0; i < count; i++) {
> +			if (loc[i] < (nn - nroots))
> +				data[loc[i] - pad] ^= b[i];
> +			else
> +				par[loc[i] - pad - len] ^= b[i];
> +		}

Aside of the fact that I had to wrap my brain around this crime I committed
more than a decade ago, all of this was really a pleasure to review.

Thanks a lot for putting that effort in! I'm looking forward to V2 of that.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 7/7] rslib: Fix remaining decoder flaws
  2019-06-17 21:00   ` Thomas Gleixner
@ 2019-06-20 12:09     ` Ferdinand Blomqvist
  0 siblings, 0 replies; 19+ messages in thread
From: Ferdinand Blomqvist @ 2019-06-20 12:09 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel

Hi,

On 2019-06-17 23:00:27, Thomas Gleixner wrote:
>On Sat, 30 Mar 2019, Ferdinand Blomqvist wrote:
>> The decoder is flawed in the following ways:
>
>...
>
>Aside of the fact that I had to wrap my brain around this crime I committed
>more than a decade ago, all of this was really a pleasure to review.
>
>Thanks a lot for putting that effort in! I'm looking forward to V2 of that.
>
>Thanks,
>
>	tglx

Thanks for the review and comments! Will submit V2 shortly. Hopefully it 
addresses all concerns and suggestions in a satisfactory way.


Best,
Ferdinand

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2019-06-20 12:09 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-30 18:29 [RFC PATCH 0/7] rslib: RS decoder is severely broken Ferdinand Blomqvist
2019-03-30 18:29 ` [RFC PATCH 1/7] rslib: Add tests for the encoder and decoder Ferdinand Blomqvist
2019-06-17 20:44   ` Thomas Gleixner
2019-03-30 18:29 ` [RFC PATCH 2/7] rslib: Fix decoding of shortened codes Ferdinand Blomqvist
2019-06-17 20:46   ` Thomas Gleixner
2019-03-30 18:29 ` [RFC PATCH 3/7] rslib: decode_rs: Fix length parameter check Ferdinand Blomqvist
2019-06-17 20:46   ` Thomas Gleixner
2019-03-30 18:29 ` [RFC PATCH 4/7] rslib: decode_rs: code cleanup Ferdinand Blomqvist
2019-06-17 20:47   ` Thomas Gleixner
2019-03-30 18:29 ` [RFC PATCH 5/7] rslib: Fix handling of of caller provided syndrome Ferdinand Blomqvist
2019-06-17 20:49   ` Thomas Gleixner
2019-03-30 18:29 ` [RFC PATCH 6/7] rslib: Update documentation Ferdinand Blomqvist
2019-06-17 20:49   ` Thomas Gleixner
2019-03-30 18:29 ` [RFC PATCH 7/7] rslib: Fix remaining decoder flaws Ferdinand Blomqvist
2019-06-17 21:00   ` Thomas Gleixner
2019-06-20 12:09     ` Ferdinand Blomqvist
2019-04-04 11:24 ` [RFC PATCH 0/7] rslib: RS decoder is severely broken Thomas Gleixner
2019-05-09 15:09   ` Ferdinand Blomqvist
2019-05-24  7:25     ` Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).