LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [GIT PULL 00/20] lightnvm updates for 4.18
@ 2018-05-28  8:58 Matias Bjørling
  2018-05-28  8:58 ` [GIT PULL 01/20] lightnvm: pblk: fail gracefully on line alloc. failure Matias Bjørling
                   ` (20 more replies)
  0 siblings, 21 replies; 37+ messages in thread
From: Matias Bjørling @ 2018-05-28  8:58 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, Matias Bjørling

Hi Jens,

Please pick up the following patches.

 - Hans reworked the write error recovery path in pblk.
 - Igor added extra error handling for lines, and fixed a bug in the
   pblk ringbuffer during GC.
 - Javier refactored the pblk code a bit, added extra error
   handling, and added checks to verify that data returned from drive
   is appropriate.
 - Marcin added some extra logic to manage the write buffer. Now
   MW_CUNITS can be zero and the size of write buffer can be changed
   at module load time.

Thanks,
Matias

Hans Holmberg (3):
  lightnvm: pblk: rework write error recovery path
  lightnvm: pblk: garbage collect lines with failed writes
  lightnvm: pblk: fix smeta write error path

Igor Konopko (4):
  lightnvm: proper error handling for pblk_bio_add_pages
  lightnvm: error handling when whole line is bad
  lightnvm: fix partial read error path
  lightnvm: pblk: sync RB and RL states during GC

Javier González (11):
  lightnvm: pblk: fail gracefully on line alloc. failure
  lightnvm: pblk: recheck for bad lines at runtime
  lightnvm: pblk: check read lba on gc path
  lightnvn: pblk: improve error msg on corrupted LBAs
  lightnvm: pblk: warn in case of corrupted write buffer
  lightnvm: pblk: return NVM_ error on failed submission
  lightnvm: pblk: remove unnecessary indirection
  lightnvm: pblk: remove unnecessary argument
  lightnvm: pblk: check for chunk size before allocating it
  lightnvn: pass flag on graceful teardown to targets
  lightnvm: pblk: remove dead function

Marcin Dziegielewski (2):
  lightnvm: pblk: handle case when mw_cunits equals to 0
  lightnvm: pblk: add possibility to set write buffer size manually

 drivers/lightnvm/core.c          |  10 +-
 drivers/lightnvm/pblk-core.c     | 149 +++++++++++++++------
 drivers/lightnvm/pblk-gc.c       | 112 ++++++++++------
 drivers/lightnvm/pblk-init.c     | 112 +++++++++++-----
 drivers/lightnvm/pblk-map.c      |  33 +++--
 drivers/lightnvm/pblk-rb.c       |  51 +------
 drivers/lightnvm/pblk-read.c     |  83 +++++++++---
 drivers/lightnvm/pblk-recovery.c |  91 -------------
 drivers/lightnvm/pblk-rl.c       |  29 +++-
 drivers/lightnvm/pblk-sysfs.c    |  15 ++-
 drivers/lightnvm/pblk-write.c    | 281 +++++++++++++++++++++++++--------------
 drivers/lightnvm/pblk.h          |  43 +++---
 drivers/nvme/host/lightnvm.c     |   1 -
 include/linux/lightnvm.h         |   2 +-
 14 files changed, 604 insertions(+), 408 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [GIT PULL 01/20] lightnvm: pblk: fail gracefully on line alloc. failure
  2018-05-28  8:58 [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
@ 2018-05-28  8:58 ` Matias Bjørling
  2018-05-28  8:58 ` [GIT PULL 02/20] lightnvm: pblk: recheck for bad lines at runtime Matias Bjørling
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 37+ messages in thread
From: Matias Bjørling @ 2018-05-28  8:58 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, Javier González,
	Javier González, Matias Bjørling

From: Javier González <javier@javigon.com>

In the event of a line failing to allocate, fail gracefully and stop the
pipeline to avoid more write failing in the same place.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-init.c |  5 +++++
 drivers/lightnvm/pblk-map.c  | 33 ++++++++++++++++++++++++---------
 2 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 91a5bc2556a3..dee64f91227d 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -1047,6 +1047,11 @@ static int pblk_lines_init(struct pblk *pblk)
 		nr_free_chks += pblk_setup_line_meta(pblk, line, chunk_meta, i);
 	}
 
+	if (!nr_free_chks) {
+		pr_err("pblk: too many bad blocks prevent for sane instance\n");
+		return -EINTR;
+	}
+
 	pblk_set_provision(pblk, nr_free_chks);
 
 	kfree(chunk_meta);
diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
index 20dbaa89c9df..953ca31dda68 100644
--- a/drivers/lightnvm/pblk-map.c
+++ b/drivers/lightnvm/pblk-map.c
@@ -18,11 +18,11 @@
 
 #include "pblk.h"
 
-static void pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
-			       struct ppa_addr *ppa_list,
-			       unsigned long *lun_bitmap,
-			       struct pblk_sec_meta *meta_list,
-			       unsigned int valid_secs)
+static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
+			      struct ppa_addr *ppa_list,
+			      unsigned long *lun_bitmap,
+			      struct pblk_sec_meta *meta_list,
+			      unsigned int valid_secs)
 {
 	struct pblk_line *line = pblk_line_get_data(pblk);
 	struct pblk_emeta *emeta;
@@ -35,8 +35,14 @@ static void pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
 	if (pblk_line_is_full(line)) {
 		struct pblk_line *prev_line = line;
 
+		/* If we cannot allocate a new line, make sure to store metadata
+		 * on current line and then fail
+		 */
 		line = pblk_line_replace_data(pblk);
 		pblk_line_close_meta(pblk, prev_line);
+
+		if (!line)
+			return -EINTR;
 	}
 
 	emeta = line->emeta;
@@ -74,6 +80,7 @@ static void pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
 	}
 
 	pblk_down_rq(pblk, ppa_list, nr_secs, lun_bitmap);
+	return 0;
 }
 
 void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
@@ -87,8 +94,12 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
 
 	for (i = off; i < rqd->nr_ppas; i += min) {
 		map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
-		pblk_map_page_data(pblk, sentry + i, &rqd->ppa_list[i],
-					lun_bitmap, &meta_list[i], map_secs);
+		if (pblk_map_page_data(pblk, sentry + i, &rqd->ppa_list[i],
+					lun_bitmap, &meta_list[i], map_secs)) {
+			bio_put(rqd->bio);
+			pblk_free_rqd(pblk, rqd, PBLK_WRITE);
+			pblk_pipeline_stop(pblk);
+		}
 	}
 }
 
@@ -108,8 +119,12 @@ void pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
 
 	for (i = 0; i < rqd->nr_ppas; i += min) {
 		map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
-		pblk_map_page_data(pblk, sentry + i, &rqd->ppa_list[i],
-					lun_bitmap, &meta_list[i], map_secs);
+		if (pblk_map_page_data(pblk, sentry + i, &rqd->ppa_list[i],
+					lun_bitmap, &meta_list[i], map_secs)) {
+			bio_put(rqd->bio);
+			pblk_free_rqd(pblk, rqd, PBLK_WRITE);
+			pblk_pipeline_stop(pblk);
+		}
 
 		erase_lun = pblk_ppa_to_pos(geo, rqd->ppa_list[i]);
 
-- 
2.11.0

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [GIT PULL 02/20] lightnvm: pblk: recheck for bad lines at runtime
  2018-05-28  8:58 [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
  2018-05-28  8:58 ` [GIT PULL 01/20] lightnvm: pblk: fail gracefully on line alloc. failure Matias Bjørling
@ 2018-05-28  8:58 ` Matias Bjørling
  2018-05-28  8:58 ` [GIT PULL 03/20] lightnvm: pblk: check read lba on gc path Matias Bjørling
                   ` (18 subsequent siblings)
  20 siblings, 0 replies; 37+ messages in thread
From: Matias Bjørling @ 2018-05-28  8:58 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, Javier González,
	Javier González, Matias Bjørling

From: Javier González <javier@javigon.com>

Bad blocks can grow at runtime. Check that the number of valid blocks in
a line are within the sanity threshold before allocating the line for
new writes.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-core.c | 38 ++++++++++++++++++++++++++++----------
 drivers/lightnvm/pblk-init.c | 11 +++++++----
 2 files changed, 35 insertions(+), 14 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 94d5d97c9d8a..2cad918434a7 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -1174,7 +1174,8 @@ static int pblk_prepare_new_line(struct pblk *pblk, struct pblk_line *line)
 static int pblk_line_prepare(struct pblk *pblk, struct pblk_line *line)
 {
 	struct pblk_line_meta *lm = &pblk->lm;
-	int blk_to_erase;
+	int blk_in_line = atomic_read(&line->blk_in_line);
+	int blk_to_erase, ret;
 
 	line->map_bitmap = kzalloc(lm->sec_bitmap_len, GFP_ATOMIC);
 	if (!line->map_bitmap)
@@ -1183,8 +1184,8 @@ static int pblk_line_prepare(struct pblk *pblk, struct pblk_line *line)
 	/* will be initialized using bb info from map_bitmap */
 	line->invalid_bitmap = kmalloc(lm->sec_bitmap_len, GFP_ATOMIC);
 	if (!line->invalid_bitmap) {
-		kfree(line->map_bitmap);
-		return -ENOMEM;
+		ret = -ENOMEM;
+		goto fail_free_map_bitmap;
 	}
 
 	/* Bad blocks do not need to be erased */
@@ -1199,16 +1200,19 @@ static int pblk_line_prepare(struct pblk *pblk, struct pblk_line *line)
 		blk_to_erase = pblk_prepare_new_line(pblk, line);
 		line->state = PBLK_LINESTATE_FREE;
 	} else {
-		blk_to_erase = atomic_read(&line->blk_in_line);
+		blk_to_erase = blk_in_line;
+	}
+
+	if (blk_in_line < lm->min_blk_line) {
+		ret = -EAGAIN;
+		goto fail_free_invalid_bitmap;
 	}
 
 	if (line->state != PBLK_LINESTATE_FREE) {
-		kfree(line->map_bitmap);
-		kfree(line->invalid_bitmap);
-		spin_unlock(&line->lock);
 		WARN(1, "pblk: corrupted line %d, state %d\n",
 							line->id, line->state);
-		return -EAGAIN;
+		ret = -EINTR;
+		goto fail_free_invalid_bitmap;
 	}
 
 	line->state = PBLK_LINESTATE_OPEN;
@@ -1222,6 +1226,16 @@ static int pblk_line_prepare(struct pblk *pblk, struct pblk_line *line)
 	kref_init(&line->ref);
 
 	return 0;
+
+fail_free_invalid_bitmap:
+	spin_unlock(&line->lock);
+	kfree(line->invalid_bitmap);
+	line->invalid_bitmap = NULL;
+fail_free_map_bitmap:
+	kfree(line->map_bitmap);
+	line->map_bitmap = NULL;
+
+	return ret;
 }
 
 int pblk_line_recov_alloc(struct pblk *pblk, struct pblk_line *line)
@@ -1292,10 +1306,14 @@ struct pblk_line *pblk_line_get(struct pblk *pblk)
 
 	ret = pblk_line_prepare(pblk, line);
 	if (ret) {
-		if (ret == -EAGAIN) {
+		switch (ret) {
+		case -EAGAIN:
+			list_add(&line->list, &l_mg->bad_list);
+			goto retry;
+		case -EINTR:
 			list_add(&line->list, &l_mg->corrupt_list);
 			goto retry;
-		} else {
+		default:
 			pr_err("pblk: failed to prepare line %d\n", line->id);
 			list_add(&line->list, &l_mg->free_list);
 			l_mg->nr_free_lines++;
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index dee64f91227d..8f8c9abd14fc 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -127,10 +127,8 @@ static int pblk_l2p_recover(struct pblk *pblk, bool factory_init)
 	if (!line) {
 		/* Configure next line for user data */
 		line = pblk_line_get_first_data(pblk);
-		if (!line) {
-			pr_err("pblk: line list corrupted\n");
+		if (!line)
 			return -EFAULT;
-		}
 	}
 
 	return 0;
@@ -141,6 +139,7 @@ static int pblk_l2p_init(struct pblk *pblk, bool factory_init)
 	sector_t i;
 	struct ppa_addr ppa;
 	size_t map_size;
+	int ret = 0;
 
 	map_size = pblk_trans_map_size(pblk);
 	pblk->trans_map = vmalloc(map_size);
@@ -152,7 +151,11 @@ static int pblk_l2p_init(struct pblk *pblk, bool factory_init)
 	for (i = 0; i < pblk->rl.nr_secs; i++)
 		pblk_trans_map_set(pblk, i, ppa);
 
-	return pblk_l2p_recover(pblk, factory_init);
+	ret = pblk_l2p_recover(pblk, factory_init);
+	if (ret)
+		vfree(pblk->trans_map);
+
+	return ret;
 }
 
 static void pblk_rwb_free(struct pblk *pblk)
-- 
2.11.0

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [GIT PULL 03/20] lightnvm: pblk: check read lba on gc path
  2018-05-28  8:58 [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
  2018-05-28  8:58 ` [GIT PULL 01/20] lightnvm: pblk: fail gracefully on line alloc. failure Matias Bjørling
  2018-05-28  8:58 ` [GIT PULL 02/20] lightnvm: pblk: recheck for bad lines at runtime Matias Bjørling
@ 2018-05-28  8:58 ` Matias Bjørling
  2018-05-28  8:58 ` [GIT PULL 04/20] lightnvm: pblk: improve error msg on corrupted LBAs Matias Bjørling
                   ` (17 subsequent siblings)
  20 siblings, 0 replies; 37+ messages in thread
From: Matias Bjørling @ 2018-05-28  8:58 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, Javier González,
	Javier González, Matias Bjørling

From: Javier González <javier@javigon.com>

Check that the lba stored in the LBA metadata is correct in the GC path
too. This requires a new helper function to check random reads in the
vector read.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-read.c | 39 +++++++++++++++++++++++++++++++++------
 1 file changed, 33 insertions(+), 6 deletions(-)

diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index 9eee10f69df0..1f699c09e0ea 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -113,15 +113,14 @@ static int pblk_submit_read_io(struct pblk *pblk, struct nvm_rq *rqd)
 	return NVM_IO_OK;
 }
 
-static void pblk_read_check(struct pblk *pblk, struct nvm_rq *rqd,
-			   sector_t blba)
+static void pblk_read_check_seq(struct pblk *pblk, void *meta_list,
+				sector_t blba, int nr_lbas)
 {
-	struct pblk_sec_meta *meta_list = rqd->meta_list;
-	int nr_lbas = rqd->nr_ppas;
+	struct pblk_sec_meta *meta_lba_list = meta_list;
 	int i;
 
 	for (i = 0; i < nr_lbas; i++) {
-		u64 lba = le64_to_cpu(meta_list[i].lba);
+		u64 lba = le64_to_cpu(meta_lba_list[i].lba);
 
 		if (lba == ADDR_EMPTY)
 			continue;
@@ -130,6 +129,32 @@ static void pblk_read_check(struct pblk *pblk, struct nvm_rq *rqd,
 	}
 }
 
+/*
+ * There can be holes in the lba list.
+ */
+static void pblk_read_check_rand(struct pblk *pblk, void *meta_list,
+				u64 *lba_list, int nr_lbas)
+{
+	struct pblk_sec_meta *meta_lba_list = meta_list;
+	int i, j;
+
+	for (i = 0, j = 0; i < nr_lbas; i++) {
+		u64 lba = lba_list[i];
+		u64 meta_lba;
+
+		if (lba == ADDR_EMPTY)
+			continue;
+
+		meta_lba = le64_to_cpu(meta_lba_list[j++].lba);
+
+		if (lba != meta_lba) {
+			pr_err("pblk: corrupted read LBA (%llu/%llu)\n",
+								lba, meta_lba);
+			WARN_ON(1);
+		}
+	}
+}
+
 static void pblk_read_put_rqd_kref(struct pblk *pblk, struct nvm_rq *rqd)
 {
 	struct ppa_addr *ppa_list;
@@ -172,7 +197,7 @@ static void __pblk_end_io_read(struct pblk *pblk, struct nvm_rq *rqd,
 		WARN_ONCE(bio->bi_status, "pblk: corrupted read error\n");
 #endif
 
-	pblk_read_check(pblk, rqd, r_ctx->lba);
+	pblk_read_check_seq(pblk, rqd->meta_list, r_ctx->lba, rqd->nr_ppas);
 
 	bio_put(bio);
 	if (r_ctx->private)
@@ -585,6 +610,8 @@ int pblk_submit_read_gc(struct pblk *pblk, struct pblk_gc_rq *gc_rq)
 		goto err_free_bio;
 	}
 
+	pblk_read_check_rand(pblk, rqd.meta_list, gc_rq->lba_list, rqd.nr_ppas);
+
 	atomic_dec(&pblk->inflight_io);
 
 	if (rqd.error) {
-- 
2.11.0

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [GIT PULL 04/20] lightnvm: pblk: improve error msg on corrupted LBAs
  2018-05-28  8:58 [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
                   ` (2 preceding siblings ...)
  2018-05-28  8:58 ` [GIT PULL 03/20] lightnvm: pblk: check read lba on gc path Matias Bjørling
@ 2018-05-28  8:58 ` Matias Bjørling
  2018-05-28  8:58 ` [GIT PULL 05/20] lightnvm: pblk: warn in case of corrupted write buffer Matias Bjørling
                   ` (16 subsequent siblings)
  20 siblings, 0 replies; 37+ messages in thread
From: Matias Bjørling @ 2018-05-28  8:58 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, Javier González,
	Javier González, Matias Bjørling

From: Javier González <javier@javigon.com>

In the event of a mismatch between the read LBA and the metadata pointer
reported by the device, improve the error message to be able to detect
the offending physical address (PPA) mapped to the corrupted LBA.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-read.c | 42 ++++++++++++++++++++++++++++++++----------
 1 file changed, 32 insertions(+), 10 deletions(-)

diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index 1f699c09e0ea..b201fc486adb 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -113,10 +113,11 @@ static int pblk_submit_read_io(struct pblk *pblk, struct nvm_rq *rqd)
 	return NVM_IO_OK;
 }
 
-static void pblk_read_check_seq(struct pblk *pblk, void *meta_list,
-				sector_t blba, int nr_lbas)
+static void pblk_read_check_seq(struct pblk *pblk, struct nvm_rq *rqd,
+				sector_t blba)
 {
-	struct pblk_sec_meta *meta_lba_list = meta_list;
+	struct pblk_sec_meta *meta_lba_list = rqd->meta_list;
+	int nr_lbas = rqd->nr_ppas;
 	int i;
 
 	for (i = 0; i < nr_lbas; i++) {
@@ -125,17 +126,27 @@ static void pblk_read_check_seq(struct pblk *pblk, void *meta_list,
 		if (lba == ADDR_EMPTY)
 			continue;
 
-		WARN(lba != blba + i, "pblk: corrupted read LBA\n");
+		if (lba != blba + i) {
+#ifdef CONFIG_NVM_DEBUG
+			struct ppa_addr *p;
+
+			p = (nr_lbas == 1) ? &rqd->ppa_list[i] : &rqd->ppa_addr;
+			print_ppa(&pblk->dev->geo, p, "seq", i);
+#endif
+			pr_err("pblk: corrupted read LBA (%llu/%llu)\n",
+							lba, (u64)blba + i);
+			WARN_ON(1);
+		}
 	}
 }
 
 /*
  * There can be holes in the lba list.
  */
-static void pblk_read_check_rand(struct pblk *pblk, void *meta_list,
-				u64 *lba_list, int nr_lbas)
+static void pblk_read_check_rand(struct pblk *pblk, struct nvm_rq *rqd,
+				 u64 *lba_list, int nr_lbas)
 {
-	struct pblk_sec_meta *meta_lba_list = meta_list;
+	struct pblk_sec_meta *meta_lba_list = rqd->meta_list;
 	int i, j;
 
 	for (i = 0, j = 0; i < nr_lbas; i++) {
@@ -145,14 +156,25 @@ static void pblk_read_check_rand(struct pblk *pblk, void *meta_list,
 		if (lba == ADDR_EMPTY)
 			continue;
 
-		meta_lba = le64_to_cpu(meta_lba_list[j++].lba);
+		meta_lba = le64_to_cpu(meta_lba_list[j].lba);
 
 		if (lba != meta_lba) {
+#ifdef CONFIG_NVM_DEBUG
+			struct ppa_addr *p;
+			int nr_ppas = rqd->nr_ppas;
+
+			p = (nr_ppas == 1) ? &rqd->ppa_list[j] : &rqd->ppa_addr;
+			print_ppa(&pblk->dev->geo, p, "seq", j);
+#endif
 			pr_err("pblk: corrupted read LBA (%llu/%llu)\n",
 								lba, meta_lba);
 			WARN_ON(1);
 		}
+
+		j++;
 	}
+
+	WARN_ONCE(j != rqd->nr_ppas, "pblk: corrupted random request\n");
 }
 
 static void pblk_read_put_rqd_kref(struct pblk *pblk, struct nvm_rq *rqd)
@@ -197,7 +219,7 @@ static void __pblk_end_io_read(struct pblk *pblk, struct nvm_rq *rqd,
 		WARN_ONCE(bio->bi_status, "pblk: corrupted read error\n");
 #endif
 
-	pblk_read_check_seq(pblk, rqd->meta_list, r_ctx->lba, rqd->nr_ppas);
+	pblk_read_check_seq(pblk, rqd, r_ctx->lba);
 
 	bio_put(bio);
 	if (r_ctx->private)
@@ -610,7 +632,7 @@ int pblk_submit_read_gc(struct pblk *pblk, struct pblk_gc_rq *gc_rq)
 		goto err_free_bio;
 	}
 
-	pblk_read_check_rand(pblk, rqd.meta_list, gc_rq->lba_list, rqd.nr_ppas);
+	pblk_read_check_rand(pblk, &rqd, gc_rq->lba_list, gc_rq->nr_secs);
 
 	atomic_dec(&pblk->inflight_io);
 
-- 
2.11.0

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [GIT PULL 05/20] lightnvm: pblk: warn in case of corrupted write buffer
  2018-05-28  8:58 [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
                   ` (3 preceding siblings ...)
  2018-05-28  8:58 ` [GIT PULL 04/20] lightnvm: pblk: improve error msg on corrupted LBAs Matias Bjørling
@ 2018-05-28  8:58 ` Matias Bjørling
  2018-05-28  8:58 ` [GIT PULL 06/20] lightnvm: pblk: return NVM_ error on failed submission Matias Bjørling
                   ` (15 subsequent siblings)
  20 siblings, 0 replies; 37+ messages in thread
From: Matias Bjørling @ 2018-05-28  8:58 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, Javier González,
	Javier González, Matias Bjørling

From: Javier González <javier@javigon.com>

When cleaning up buffer entries as we wrap up, their state should be
"completed". If any of the entries is in "submitted" state, it means
that something bad has happened. Trigger a warning immediately instead of
waiting for the state flag to eventually be updated, thus hiding the
issue.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-rb.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/lightnvm/pblk-rb.c b/drivers/lightnvm/pblk-rb.c
index 52fdd85dbc97..58946ffebe81 100644
--- a/drivers/lightnvm/pblk-rb.c
+++ b/drivers/lightnvm/pblk-rb.c
@@ -142,10 +142,9 @@ static void clean_wctx(struct pblk_w_ctx *w_ctx)
 {
 	int flags;
 
-try:
 	flags = READ_ONCE(w_ctx->flags);
-	if (!(flags & PBLK_SUBMITTED_ENTRY))
-		goto try;
+	WARN_ONCE(!(flags & PBLK_SUBMITTED_ENTRY),
+			"pblk: overwriting unsubmitted data\n");
 
 	/* Release flags on context. Protect from writes and reads */
 	smp_store_release(&w_ctx->flags, PBLK_WRITABLE_ENTRY);
-- 
2.11.0

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [GIT PULL 06/20] lightnvm: pblk: return NVM_ error on failed submission
  2018-05-28  8:58 [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
                   ` (4 preceding siblings ...)
  2018-05-28  8:58 ` [GIT PULL 05/20] lightnvm: pblk: warn in case of corrupted write buffer Matias Bjørling
@ 2018-05-28  8:58 ` Matias Bjørling
  2018-05-28  8:58 ` [GIT PULL 07/20] lightnvm: pblk: remove unnecessary indirection Matias Bjørling
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 37+ messages in thread
From: Matias Bjørling @ 2018-05-28  8:58 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, Javier González,
	Javier González, Matias Bjørling

From: Javier González <javier@javigon.com>

Return a meaningful error when the sanity vector I/O check fails.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-core.c | 22 ++++++++--------------
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 2cad918434a7..0d4078805ecc 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -467,16 +467,13 @@ int pblk_submit_io(struct pblk *pblk, struct nvm_rq *rqd)
 {
 	struct nvm_tgt_dev *dev = pblk->dev;
 
+	atomic_inc(&pblk->inflight_io);
+
 #ifdef CONFIG_NVM_DEBUG
-	int ret;
-
-	ret = pblk_check_io(pblk, rqd);
-	if (ret)
-		return ret;
+	if (pblk_check_io(pblk, rqd))
+		return NVM_IO_ERR;
 #endif
 
-	atomic_inc(&pblk->inflight_io);
-
 	return nvm_submit_io(dev, rqd);
 }
 
@@ -484,16 +481,13 @@ int pblk_submit_io_sync(struct pblk *pblk, struct nvm_rq *rqd)
 {
 	struct nvm_tgt_dev *dev = pblk->dev;
 
+	atomic_inc(&pblk->inflight_io);
+
 #ifdef CONFIG_NVM_DEBUG
-	int ret;
-
-	ret = pblk_check_io(pblk, rqd);
-	if (ret)
-		return ret;
+	if (pblk_check_io(pblk, rqd))
+		return NVM_IO_ERR;
 #endif
 
-	atomic_inc(&pblk->inflight_io);
-
 	return nvm_submit_io_sync(dev, rqd);
 }
 
-- 
2.11.0

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [GIT PULL 07/20] lightnvm: pblk: remove unnecessary indirection
  2018-05-28  8:58 [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
                   ` (5 preceding siblings ...)
  2018-05-28  8:58 ` [GIT PULL 06/20] lightnvm: pblk: return NVM_ error on failed submission Matias Bjørling
@ 2018-05-28  8:58 ` Matias Bjørling
  2018-05-28  8:58 ` [GIT PULL 08/20] lightnvm: pblk: remove unnecessary argument Matias Bjørling
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 37+ messages in thread
From: Matias Bjørling @ 2018-05-28  8:58 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, Javier González,
	Javier González, Matias Bjørling

From: Javier González <javier@javigon.com>

Call nvm_submit_io directly and remove an unnecessary indirection on the
read path.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-read.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index b201fc486adb..a2e678de428f 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -102,16 +102,6 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
 #endif
 }
 
-static int pblk_submit_read_io(struct pblk *pblk, struct nvm_rq *rqd)
-{
-	int err;
-
-	err = pblk_submit_io(pblk, rqd);
-	if (err)
-		return NVM_IO_ERR;
-
-	return NVM_IO_OK;
-}
 
 static void pblk_read_check_seq(struct pblk *pblk, struct nvm_rq *rqd,
 				sector_t blba)
@@ -485,9 +475,9 @@ int pblk_submit_read(struct pblk *pblk, struct bio *bio)
 		rqd->bio = int_bio;
 		r_ctx->private = bio;
 
-		ret = pblk_submit_read_io(pblk, rqd);
-		if (ret) {
+		if (pblk_submit_io(pblk, rqd)) {
 			pr_err("pblk: read IO submission failed\n");
+			ret = NVM_IO_ERR;
 			if (int_bio)
 				bio_put(int_bio);
 			goto fail_end_io;
-- 
2.11.0

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [GIT PULL 08/20] lightnvm: pblk: remove unnecessary argument
  2018-05-28  8:58 [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
                   ` (6 preceding siblings ...)
  2018-05-28  8:58 ` [GIT PULL 07/20] lightnvm: pblk: remove unnecessary indirection Matias Bjørling
@ 2018-05-28  8:58 ` Matias Bjørling
  2018-05-28  8:58 ` [GIT PULL 09/20] lightnvm: pblk: check for chunk size before allocating it Matias Bjørling
                   ` (12 subsequent siblings)
  20 siblings, 0 replies; 37+ messages in thread
From: Matias Bjørling @ 2018-05-28  8:58 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, Javier González,
	Javier González, Matias Bjørling

From: Javier González <javier@javigon.com>

Remove unnecessary argument on pblk_line_free()

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-core.c | 6 +++---
 drivers/lightnvm/pblk-init.c | 2 +-
 drivers/lightnvm/pblk.h      | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 0d4078805ecc..4b10122aec89 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -1337,7 +1337,7 @@ static struct pblk_line *pblk_line_retry(struct pblk *pblk,
 	retry_line->emeta = line->emeta;
 	retry_line->meta_line = line->meta_line;
 
-	pblk_line_free(pblk, line);
+	pblk_line_free(line);
 	l_mg->data_line = retry_line;
 	spin_unlock(&l_mg->free_lock);
 
@@ -1562,7 +1562,7 @@ struct pblk_line *pblk_line_replace_data(struct pblk *pblk)
 	return new;
 }
 
-void pblk_line_free(struct pblk *pblk, struct pblk_line *line)
+void pblk_line_free(struct pblk_line *line)
 {
 	kfree(line->map_bitmap);
 	kfree(line->invalid_bitmap);
@@ -1584,7 +1584,7 @@ static void __pblk_line_put(struct pblk *pblk, struct pblk_line *line)
 	WARN_ON(line->state != PBLK_LINESTATE_GC);
 	line->state = PBLK_LINESTATE_FREE;
 	line->gc_group = PBLK_LINEGC_NONE;
-	pblk_line_free(pblk, line);
+	pblk_line_free(line);
 	spin_unlock(&line->lock);
 
 	atomic_dec(&gc->pipeline_gc);
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 8f8c9abd14fc..b52855f9336b 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -509,7 +509,7 @@ static void pblk_lines_free(struct pblk *pblk)
 	for (i = 0; i < l_mg->nr_lines; i++) {
 		line = &pblk->lines[i];
 
-		pblk_line_free(pblk, line);
+		pblk_line_free(line);
 		pblk_line_meta_free(line);
 	}
 	spin_unlock(&l_mg->free_lock);
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index 9c682acfc5d1..dfbfe9e9a385 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -766,7 +766,7 @@ struct pblk_line *pblk_line_get_data(struct pblk *pblk);
 struct pblk_line *pblk_line_get_erase(struct pblk *pblk);
 int pblk_line_erase(struct pblk *pblk, struct pblk_line *line);
 int pblk_line_is_full(struct pblk_line *line);
-void pblk_line_free(struct pblk *pblk, struct pblk_line *line);
+void pblk_line_free(struct pblk_line *line);
 void pblk_line_close_meta(struct pblk *pblk, struct pblk_line *line);
 void pblk_line_close(struct pblk *pblk, struct pblk_line *line);
 void pblk_line_close_ws(struct work_struct *work);
-- 
2.11.0

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [GIT PULL 09/20] lightnvm: pblk: check for chunk size before allocating it
  2018-05-28  8:58 [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
                   ` (7 preceding siblings ...)
  2018-05-28  8:58 ` [GIT PULL 08/20] lightnvm: pblk: remove unnecessary argument Matias Bjørling
@ 2018-05-28  8:58 ` Matias Bjørling
  2018-05-28  8:58 ` [GIT PULL 10/20] lightnvm: pass flag on graceful teardown to targets Matias Bjørling
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 37+ messages in thread
From: Matias Bjørling @ 2018-05-28  8:58 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, Javier González,
	Javier González, Matias Bjørling

From: Javier González <javier@javigon.com>

Do the check for the chunk state after making sure that the chunk type
is supported.

Fixes: 32ef9412c114 ("lightnvm: pblk: implement get log report chunk")
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-init.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index b52855f9336b..9e3a43346d4c 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -751,14 +751,14 @@ static int pblk_setup_line_meta_20(struct pblk *pblk, struct pblk_line *line,
 		chunk->cnlb = chunk_meta->cnlb;
 		chunk->wp = chunk_meta->wp;
 
-		if (!(chunk->state & NVM_CHK_ST_OFFLINE))
-			continue;
-
 		if (chunk->type & NVM_CHK_TP_SZ_SPEC) {
 			WARN_ONCE(1, "pblk: custom-sized chunks unsupported\n");
 			continue;
 		}
 
+		if (!(chunk->state & NVM_CHK_ST_OFFLINE))
+			continue;
+
 		set_bit(pos, line->blk_bitmap);
 		nr_bad_chks++;
 	}
-- 
2.11.0

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [GIT PULL 10/20] lightnvm: pass flag on graceful teardown to targets
  2018-05-28  8:58 [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
                   ` (8 preceding siblings ...)
  2018-05-28  8:58 ` [GIT PULL 09/20] lightnvm: pblk: check for chunk size before allocating it Matias Bjørling
@ 2018-05-28  8:58 ` Matias Bjørling
  2018-05-28  8:58 ` [GIT PULL 11/20] lightnvm: pblk: remove dead function Matias Bjørling
                   ` (10 subsequent siblings)
  20 siblings, 0 replies; 37+ messages in thread
From: Matias Bjørling @ 2018-05-28  8:58 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, Javier González,
	Javier González, Matias Bjørling

From: Javier González <javier@javigon.com>

If the namespace is unregistered before the LightNVM target is removed
(e.g., on hot unplug) it is too late for the target to store any metadata
on the device - any attempt to write to the device will fail. In this
case, pass on a "gracefull teardown" flag to the target to let it know
when this happens.

In the case of pblk, we pad the open line (close all open chunks) to
improve data retention. In the event of an ungraceful shutdown, avoid
this part and just clean up.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/core.c      | 10 +++++-----
 drivers/lightnvm/pblk-core.c | 13 ++++++++++++-
 drivers/lightnvm/pblk-gc.c   | 10 ++++++----
 drivers/lightnvm/pblk-init.c | 14 ++++++++------
 drivers/lightnvm/pblk.h      |  4 +++-
 include/linux/lightnvm.h     |  2 +-
 6 files changed, 35 insertions(+), 18 deletions(-)

diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
index 63171cdce270..60aa7bc5a630 100644
--- a/drivers/lightnvm/core.c
+++ b/drivers/lightnvm/core.c
@@ -431,7 +431,7 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
 	return 0;
 err_sysfs:
 	if (tt->exit)
-		tt->exit(targetdata);
+		tt->exit(targetdata, true);
 err_init:
 	blk_cleanup_queue(tqueue);
 	tdisk->queue = NULL;
@@ -446,7 +446,7 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
 	return ret;
 }
 
-static void __nvm_remove_target(struct nvm_target *t)
+static void __nvm_remove_target(struct nvm_target *t, bool graceful)
 {
 	struct nvm_tgt_type *tt = t->type;
 	struct gendisk *tdisk = t->disk;
@@ -459,7 +459,7 @@ static void __nvm_remove_target(struct nvm_target *t)
 		tt->sysfs_exit(tdisk);
 
 	if (tt->exit)
-		tt->exit(tdisk->private_data);
+		tt->exit(tdisk->private_data, graceful);
 
 	nvm_remove_tgt_dev(t->dev, 1);
 	put_disk(tdisk);
@@ -489,7 +489,7 @@ static int nvm_remove_tgt(struct nvm_dev *dev, struct nvm_ioctl_remove *remove)
 		mutex_unlock(&dev->mlock);
 		return 1;
 	}
-	__nvm_remove_target(t);
+	__nvm_remove_target(t, true);
 	mutex_unlock(&dev->mlock);
 
 	return 0;
@@ -963,7 +963,7 @@ void nvm_unregister(struct nvm_dev *dev)
 	list_for_each_entry_safe(t, tmp, &dev->targets, list) {
 		if (t->dev->parent != dev)
 			continue;
-		__nvm_remove_target(t);
+		__nvm_remove_target(t, false);
 	}
 	mutex_unlock(&dev->mlock);
 
diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 4b10122aec89..5f1e5b1b3094 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -1461,7 +1461,7 @@ static void pblk_line_close_meta_sync(struct pblk *pblk)
 	flush_workqueue(pblk->close_wq);
 }
 
-void pblk_pipeline_stop(struct pblk *pblk)
+void __pblk_pipeline_flush(struct pblk *pblk)
 {
 	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
 	int ret;
@@ -1486,6 +1486,11 @@ void pblk_pipeline_stop(struct pblk *pblk)
 
 	flush_workqueue(pblk->bb_wq);
 	pblk_line_close_meta_sync(pblk);
+}
+
+void __pblk_pipeline_stop(struct pblk *pblk)
+{
+	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
 
 	spin_lock(&l_mg->free_lock);
 	pblk->state = PBLK_STATE_STOPPED;
@@ -1494,6 +1499,12 @@ void pblk_pipeline_stop(struct pblk *pblk)
 	spin_unlock(&l_mg->free_lock);
 }
 
+void pblk_pipeline_stop(struct pblk *pblk)
+{
+	__pblk_pipeline_flush(pblk);
+	__pblk_pipeline_stop(pblk);
+}
+
 struct pblk_line *pblk_line_replace_data(struct pblk *pblk)
 {
 	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
diff --git a/drivers/lightnvm/pblk-gc.c b/drivers/lightnvm/pblk-gc.c
index 6851a5c67189..b0cc277bf972 100644
--- a/drivers/lightnvm/pblk-gc.c
+++ b/drivers/lightnvm/pblk-gc.c
@@ -649,7 +649,7 @@ int pblk_gc_init(struct pblk *pblk)
 	return ret;
 }
 
-void pblk_gc_exit(struct pblk *pblk)
+void pblk_gc_exit(struct pblk *pblk, bool graceful)
 {
 	struct pblk_gc *gc = &pblk->gc;
 
@@ -663,10 +663,12 @@ void pblk_gc_exit(struct pblk *pblk)
 	if (gc->gc_reader_ts)
 		kthread_stop(gc->gc_reader_ts);
 
-	flush_workqueue(gc->gc_reader_wq);
+	if (graceful) {
+		flush_workqueue(gc->gc_reader_wq);
+		flush_workqueue(gc->gc_line_reader_wq);
+	}
+
 	destroy_workqueue(gc->gc_reader_wq);
-
-	flush_workqueue(gc->gc_line_reader_wq);
 	destroy_workqueue(gc->gc_line_reader_wq);
 
 	if (gc->gc_writer_ts)
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 9e3a43346d4c..bfc488d0dda9 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -1118,23 +1118,25 @@ static void pblk_free(struct pblk *pblk)
 	kfree(pblk);
 }
 
-static void pblk_tear_down(struct pblk *pblk)
+static void pblk_tear_down(struct pblk *pblk, bool graceful)
 {
-	pblk_pipeline_stop(pblk);
+	if (graceful)
+		__pblk_pipeline_flush(pblk);
+	__pblk_pipeline_stop(pblk);
 	pblk_writer_stop(pblk);
 	pblk_rb_sync_l2p(&pblk->rwb);
 	pblk_rl_free(&pblk->rl);
 
-	pr_debug("pblk: consistent tear down\n");
+	pr_debug("pblk: consistent tear down (graceful:%d)\n", graceful);
 }
 
-static void pblk_exit(void *private)
+static void pblk_exit(void *private, bool graceful)
 {
 	struct pblk *pblk = private;
 
 	down_write(&pblk_lock);
-	pblk_gc_exit(pblk);
-	pblk_tear_down(pblk);
+	pblk_gc_exit(pblk, graceful);
+	pblk_tear_down(pblk, graceful);
 
 #ifdef CONFIG_NVM_DEBUG
 	pr_info("pblk exit: L2P CRC: %x\n", pblk_l2p_crc(pblk));
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index dfbfe9e9a385..0c69eb880f56 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -771,6 +771,8 @@ void pblk_line_close_meta(struct pblk *pblk, struct pblk_line *line);
 void pblk_line_close(struct pblk *pblk, struct pblk_line *line);
 void pblk_line_close_ws(struct work_struct *work);
 void pblk_pipeline_stop(struct pblk *pblk);
+void __pblk_pipeline_stop(struct pblk *pblk);
+void __pblk_pipeline_flush(struct pblk *pblk);
 void pblk_gen_run_ws(struct pblk *pblk, struct pblk_line *line, void *priv,
 		     void (*work)(struct work_struct *), gfp_t gfp_mask,
 		     struct workqueue_struct *wq);
@@ -864,7 +866,7 @@ int pblk_recov_setup_rq(struct pblk *pblk, struct pblk_c_ctx *c_ctx,
 #define PBLK_GC_RSV_LINE 1	/* Reserved lines for GC */
 
 int pblk_gc_init(struct pblk *pblk);
-void pblk_gc_exit(struct pblk *pblk);
+void pblk_gc_exit(struct pblk *pblk, bool graceful);
 void pblk_gc_should_start(struct pblk *pblk);
 void pblk_gc_should_stop(struct pblk *pblk);
 void pblk_gc_should_kick(struct pblk *pblk);
diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h
index 6e0859b9d4d2..e9e0d1c7eaf5 100644
--- a/include/linux/lightnvm.h
+++ b/include/linux/lightnvm.h
@@ -489,7 +489,7 @@ typedef blk_qc_t (nvm_tgt_make_rq_fn)(struct request_queue *, struct bio *);
 typedef sector_t (nvm_tgt_capacity_fn)(void *);
 typedef void *(nvm_tgt_init_fn)(struct nvm_tgt_dev *, struct gendisk *,
 				int flags);
-typedef void (nvm_tgt_exit_fn)(void *);
+typedef void (nvm_tgt_exit_fn)(void *, bool);
 typedef int (nvm_tgt_sysfs_init_fn)(struct gendisk *);
 typedef void (nvm_tgt_sysfs_exit_fn)(struct gendisk *);
 
-- 
2.11.0

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [GIT PULL 11/20] lightnvm: pblk: remove dead function
  2018-05-28  8:58 [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
                   ` (9 preceding siblings ...)
  2018-05-28  8:58 ` [GIT PULL 10/20] lightnvm: pass flag on graceful teardown to targets Matias Bjørling
@ 2018-05-28  8:58 ` Matias Bjørling
  2018-05-28  8:58 ` [GIT PULL 12/20] lightnvm: pblk: rework write error recovery path Matias Bjørling
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 37+ messages in thread
From: Matias Bjørling @ 2018-05-28  8:58 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, Javier González,
	Javier González, Matias Bjørling

From: Javier González <javier@javigon.com>

Remove dead function for manual sync. I/O

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-core.c | 7 -------
 drivers/lightnvm/pblk.h      | 1 -
 2 files changed, 8 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 5f1e5b1b3094..5dae72e8b46b 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -342,13 +342,6 @@ void pblk_write_should_kick(struct pblk *pblk)
 		pblk_write_kick(pblk);
 }
 
-void pblk_end_io_sync(struct nvm_rq *rqd)
-{
-	struct completion *waiting = rqd->private;
-
-	complete(waiting);
-}
-
 static void pblk_wait_for_meta(struct pblk *pblk)
 {
 	do {
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index 0c69eb880f56..5a4daf0b949d 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -796,7 +796,6 @@ void pblk_down_rq(struct pblk *pblk, struct ppa_addr *ppa_list, int nr_ppas,
 void pblk_down_page(struct pblk *pblk, struct ppa_addr *ppa_list, int nr_ppas);
 void pblk_up_rq(struct pblk *pblk, struct ppa_addr *ppa_list, int nr_ppas,
 		unsigned long *lun_bitmap);
-void pblk_end_io_sync(struct nvm_rq *rqd);
 int pblk_bio_add_pages(struct pblk *pblk, struct bio *bio, gfp_t flags,
 		       int nr_pages);
 void pblk_bio_free_pages(struct pblk *pblk, struct bio *bio, int off,
-- 
2.11.0

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [GIT PULL 12/20] lightnvm: pblk: rework write error recovery path
  2018-05-28  8:58 [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
                   ` (10 preceding siblings ...)
  2018-05-28  8:58 ` [GIT PULL 11/20] lightnvm: pblk: remove dead function Matias Bjørling
@ 2018-05-28  8:58 ` Matias Bjørling
  2018-05-28  8:58 ` [GIT PULL 13/20] lightnvm: pblk: garbage collect lines with failed writes Matias Bjørling
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 37+ messages in thread
From: Matias Bjørling @ 2018-05-28  8:58 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, Hans Holmberg, Matias Bjørling

From: Hans Holmberg <hans.holmberg@cnexlabs.com>

The write error recovery path is incomplete, so rework
the write error recovery handling to do resubmits directly
from the write buffer.

When a write error occurs, the remaining sectors in the chunk are
mapped out and invalidated and the request inserted in a resubmit list.

The writer thread checks if there are any requests to resubmit,
scans and invalidates any lbas that have been overwritten by later
writes and resubmits the failed entries.

Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-init.c     |   2 +
 drivers/lightnvm/pblk-rb.c       |  39 ------
 drivers/lightnvm/pblk-recovery.c |  91 -------------
 drivers/lightnvm/pblk-write.c    | 279 +++++++++++++++++++++++++--------------
 drivers/lightnvm/pblk.h          |  11 +-
 5 files changed, 187 insertions(+), 235 deletions(-)

diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index bfc488d0dda9..6f06727afcf6 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -426,6 +426,7 @@ static int pblk_core_init(struct pblk *pblk)
 		goto free_r_end_wq;
 
 	INIT_LIST_HEAD(&pblk->compl_list);
+	INIT_LIST_HEAD(&pblk->resubmit_list);
 
 	return 0;
 
@@ -1185,6 +1186,7 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
 	pblk->state = PBLK_STATE_RUNNING;
 	pblk->gc.gc_enabled = 0;
 
+	spin_lock_init(&pblk->resubmit_lock);
 	spin_lock_init(&pblk->trans_lock);
 	spin_lock_init(&pblk->lock);
 
diff --git a/drivers/lightnvm/pblk-rb.c b/drivers/lightnvm/pblk-rb.c
index 58946ffebe81..1b74ec51a4ad 100644
--- a/drivers/lightnvm/pblk-rb.c
+++ b/drivers/lightnvm/pblk-rb.c
@@ -503,45 +503,6 @@ int pblk_rb_may_write_gc(struct pblk_rb *rb, unsigned int nr_entries,
 }
 
 /*
- * The caller of this function must ensure that the backpointer will not
- * overwrite the entries passed on the list.
- */
-unsigned int pblk_rb_read_to_bio_list(struct pblk_rb *rb, struct bio *bio,
-				      struct list_head *list,
-				      unsigned int max)
-{
-	struct pblk_rb_entry *entry, *tentry;
-	struct page *page;
-	unsigned int read = 0;
-	int ret;
-
-	list_for_each_entry_safe(entry, tentry, list, index) {
-		if (read > max) {
-			pr_err("pblk: too many entries on list\n");
-			goto out;
-		}
-
-		page = virt_to_page(entry->data);
-		if (!page) {
-			pr_err("pblk: could not allocate write bio page\n");
-			goto out;
-		}
-
-		ret = bio_add_page(bio, page, rb->seg_size, 0);
-		if (ret != rb->seg_size) {
-			pr_err("pblk: could not add page to write bio\n");
-			goto out;
-		}
-
-		list_del(&entry->index);
-		read++;
-	}
-
-out:
-	return read;
-}
-
-/*
  * Read available entries on rb and add them to the given bio. To avoid a memory
  * copy, a page reference to the write buffer is used to be added to the bio.
  *
diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
index 3e079c2afa6e..788dce87043e 100644
--- a/drivers/lightnvm/pblk-recovery.c
+++ b/drivers/lightnvm/pblk-recovery.c
@@ -16,97 +16,6 @@
 
 #include "pblk.h"
 
-void pblk_submit_rec(struct work_struct *work)
-{
-	struct pblk_rec_ctx *recovery =
-			container_of(work, struct pblk_rec_ctx, ws_rec);
-	struct pblk *pblk = recovery->pblk;
-	struct nvm_rq *rqd = recovery->rqd;
-	struct pblk_c_ctx *c_ctx = nvm_rq_to_pdu(rqd);
-	struct bio *bio;
-	unsigned int nr_rec_secs;
-	unsigned int pgs_read;
-	int ret;
-
-	nr_rec_secs = bitmap_weight((unsigned long int *)&rqd->ppa_status,
-								NVM_MAX_VLBA);
-
-	bio = bio_alloc(GFP_KERNEL, nr_rec_secs);
-
-	bio->bi_iter.bi_sector = 0;
-	bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
-	rqd->bio = bio;
-	rqd->nr_ppas = nr_rec_secs;
-
-	pgs_read = pblk_rb_read_to_bio_list(&pblk->rwb, bio, &recovery->failed,
-								nr_rec_secs);
-	if (pgs_read != nr_rec_secs) {
-		pr_err("pblk: could not read recovery entries\n");
-		goto err;
-	}
-
-	if (pblk_setup_w_rec_rq(pblk, rqd, c_ctx)) {
-		pr_err("pblk: could not setup recovery request\n");
-		goto err;
-	}
-
-#ifdef CONFIG_NVM_DEBUG
-	atomic_long_add(nr_rec_secs, &pblk->recov_writes);
-#endif
-
-	ret = pblk_submit_io(pblk, rqd);
-	if (ret) {
-		pr_err("pblk: I/O submission failed: %d\n", ret);
-		goto err;
-	}
-
-	mempool_free(recovery, pblk->rec_pool);
-	return;
-
-err:
-	bio_put(bio);
-	pblk_free_rqd(pblk, rqd, PBLK_WRITE);
-}
-
-int pblk_recov_setup_rq(struct pblk *pblk, struct pblk_c_ctx *c_ctx,
-			struct pblk_rec_ctx *recovery, u64 *comp_bits,
-			unsigned int comp)
-{
-	struct nvm_rq *rec_rqd;
-	struct pblk_c_ctx *rec_ctx;
-	int nr_entries = c_ctx->nr_valid + c_ctx->nr_padded;
-
-	rec_rqd = pblk_alloc_rqd(pblk, PBLK_WRITE);
-	rec_ctx = nvm_rq_to_pdu(rec_rqd);
-
-	/* Copy completion bitmap, but exclude the first X completed entries */
-	bitmap_shift_right((unsigned long int *)&rec_rqd->ppa_status,
-				(unsigned long int *)comp_bits,
-				comp, NVM_MAX_VLBA);
-
-	/* Save the context for the entries that need to be re-written and
-	 * update current context with the completed entries.
-	 */
-	rec_ctx->sentry = pblk_rb_wrap_pos(&pblk->rwb, c_ctx->sentry + comp);
-	if (comp >= c_ctx->nr_valid) {
-		rec_ctx->nr_valid = 0;
-		rec_ctx->nr_padded = nr_entries - comp;
-
-		c_ctx->nr_padded = comp - c_ctx->nr_valid;
-	} else {
-		rec_ctx->nr_valid = c_ctx->nr_valid - comp;
-		rec_ctx->nr_padded = c_ctx->nr_padded;
-
-		c_ctx->nr_valid = comp;
-		c_ctx->nr_padded = 0;
-	}
-
-	recovery->rqd = rec_rqd;
-	recovery->pblk = pblk;
-
-	return 0;
-}
-
 int pblk_recov_check_emeta(struct pblk *pblk, struct line_emeta *emeta_buf)
 {
 	u32 crc;
diff --git a/drivers/lightnvm/pblk-write.c b/drivers/lightnvm/pblk-write.c
index 3e6f1ebd743a..f62e432f7c91 100644
--- a/drivers/lightnvm/pblk-write.c
+++ b/drivers/lightnvm/pblk-write.c
@@ -103,68 +103,149 @@ static void pblk_complete_write(struct pblk *pblk, struct nvm_rq *rqd,
 	pblk_rb_sync_end(&pblk->rwb, &flags);
 }
 
-/* When a write fails, we are not sure whether the block has grown bad or a page
- * range is more susceptible to write errors. If a high number of pages fail, we
- * assume that the block is bad and we mark it accordingly. In all cases, we
- * remap and resubmit the failed entries as fast as possible; if a flush is
- * waiting on a completion, the whole stack would stall otherwise.
- */
+/* Map remaining sectors in chunk, starting from ppa */
+static void pblk_map_remaining(struct pblk *pblk, struct ppa_addr *ppa)
+{
+	struct nvm_tgt_dev *dev = pblk->dev;
+	struct nvm_geo *geo = &dev->geo;
+	struct pblk_line *line;
+	struct ppa_addr map_ppa = *ppa;
+	u64 paddr;
+	int done = 0;
+
+	line = &pblk->lines[pblk_ppa_to_line(*ppa)];
+	spin_lock(&line->lock);
+
+	while (!done)  {
+		paddr = pblk_dev_ppa_to_line_addr(pblk, map_ppa);
+
+		if (!test_and_set_bit(paddr, line->map_bitmap))
+			line->left_msecs--;
+
+		if (!test_and_set_bit(paddr, line->invalid_bitmap))
+			le32_add_cpu(line->vsc, -1);
+
+		if (geo->version == NVM_OCSSD_SPEC_12) {
+			map_ppa.ppa++;
+			if (map_ppa.g.pg == geo->num_pg)
+				done = 1;
+		} else {
+			map_ppa.m.sec++;
+			if (map_ppa.m.sec == geo->clba)
+				done = 1;
+		}
+	}
+
+	spin_unlock(&line->lock);
+}
+
+static void pblk_prepare_resubmit(struct pblk *pblk, unsigned int sentry,
+				  unsigned int nr_entries)
+{
+	struct pblk_rb *rb = &pblk->rwb;
+	struct pblk_rb_entry *entry;
+	struct pblk_line *line;
+	struct pblk_w_ctx *w_ctx;
+	struct ppa_addr ppa_l2p;
+	int flags;
+	unsigned int pos, i;
+
+	spin_lock(&pblk->trans_lock);
+	pos = sentry;
+	for (i = 0; i < nr_entries; i++) {
+		entry = &rb->entries[pos];
+		w_ctx = &entry->w_ctx;
+
+		/* Check if the lba has been overwritten */
+		ppa_l2p = pblk_trans_map_get(pblk, w_ctx->lba);
+		if (!pblk_ppa_comp(ppa_l2p, entry->cacheline))
+			w_ctx->lba = ADDR_EMPTY;
+
+		/* Mark up the entry as submittable again */
+		flags = READ_ONCE(w_ctx->flags);
+		flags |= PBLK_WRITTEN_DATA;
+		/* Release flags on write context. Protect from writes */
+		smp_store_release(&w_ctx->flags, flags);
+
+		/* Decrese the reference count to the line as we will
+		 * re-map these entries
+		 */
+		line = &pblk->lines[pblk_ppa_to_line(w_ctx->ppa)];
+		kref_put(&line->ref, pblk_line_put);
+
+		pos = (pos + 1) & (rb->nr_entries - 1);
+	}
+	spin_unlock(&pblk->trans_lock);
+}
+
+static void pblk_queue_resubmit(struct pblk *pblk, struct pblk_c_ctx *c_ctx)
+{
+	struct pblk_c_ctx *r_ctx;
+
+	r_ctx = kzalloc(sizeof(struct pblk_c_ctx), GFP_KERNEL);
+	if (!r_ctx)
+		return;
+
+	r_ctx->lun_bitmap = NULL;
+	r_ctx->sentry = c_ctx->sentry;
+	r_ctx->nr_valid = c_ctx->nr_valid;
+	r_ctx->nr_padded = c_ctx->nr_padded;
+
+	spin_lock(&pblk->resubmit_lock);
+	list_add_tail(&r_ctx->list, &pblk->resubmit_list);
+	spin_unlock(&pblk->resubmit_lock);
+
+#ifdef CONFIG_NVM_DEBUG
+	atomic_long_add(c_ctx->nr_valid, &pblk->recov_writes);
+#endif
+}
+
+static void pblk_submit_rec(struct work_struct *work)
+{
+	struct pblk_rec_ctx *recovery =
+			container_of(work, struct pblk_rec_ctx, ws_rec);
+	struct pblk *pblk = recovery->pblk;
+	struct nvm_rq *rqd = recovery->rqd;
+	struct pblk_c_ctx *c_ctx = nvm_rq_to_pdu(rqd);
+	struct ppa_addr *ppa_list;
+
+	pblk_log_write_err(pblk, rqd);
+
+	if (rqd->nr_ppas == 1)
+		ppa_list = &rqd->ppa_addr;
+	else
+		ppa_list = rqd->ppa_list;
+
+	pblk_map_remaining(pblk, ppa_list);
+	pblk_queue_resubmit(pblk, c_ctx);
+
+	pblk_up_rq(pblk, rqd->ppa_list, rqd->nr_ppas, c_ctx->lun_bitmap);
+	if (c_ctx->nr_padded)
+		pblk_bio_free_pages(pblk, rqd->bio, c_ctx->nr_valid,
+							c_ctx->nr_padded);
+	bio_put(rqd->bio);
+	pblk_free_rqd(pblk, rqd, PBLK_WRITE);
+	mempool_free(recovery, pblk->rec_pool);
+
+	atomic_dec(&pblk->inflight_io);
+}
+
+
 static void pblk_end_w_fail(struct pblk *pblk, struct nvm_rq *rqd)
 {
-	void *comp_bits = &rqd->ppa_status;
-	struct pblk_c_ctx *c_ctx = nvm_rq_to_pdu(rqd);
 	struct pblk_rec_ctx *recovery;
-	struct ppa_addr *ppa_list = rqd->ppa_list;
-	int nr_ppas = rqd->nr_ppas;
-	unsigned int c_entries;
-	int bit, ret;
-
-	if (unlikely(nr_ppas == 1))
-		ppa_list = &rqd->ppa_addr;
 
 	recovery = mempool_alloc(pblk->rec_pool, GFP_ATOMIC);
-
-	INIT_LIST_HEAD(&recovery->failed);
-
-	bit = -1;
-	while ((bit = find_next_bit(comp_bits, nr_ppas, bit + 1)) < nr_ppas) {
-		struct pblk_rb_entry *entry;
-		struct ppa_addr ppa;
-
-		/* Logic error */
-		if (bit > c_ctx->nr_valid) {
-			WARN_ONCE(1, "pblk: corrupted write request\n");
-			mempool_free(recovery, pblk->rec_pool);
-			goto out;
-		}
-
-		ppa = ppa_list[bit];
-		entry = pblk_rb_sync_scan_entry(&pblk->rwb, &ppa);
-		if (!entry) {
-			pr_err("pblk: could not scan entry on write failure\n");
-			mempool_free(recovery, pblk->rec_pool);
-			goto out;
-		}
-
-		/* The list is filled first and emptied afterwards. No need for
-		 * protecting it with a lock
-		 */
-		list_add_tail(&entry->index, &recovery->failed);
+	if (!recovery) {
+		pr_err("pblk: could not allocate recovery work\n");
+		return;
 	}
 
-	c_entries = find_first_bit(comp_bits, nr_ppas);
-	ret = pblk_recov_setup_rq(pblk, c_ctx, recovery, comp_bits, c_entries);
-	if (ret) {
-		pr_err("pblk: could not recover from write failure\n");
-		mempool_free(recovery, pblk->rec_pool);
-		goto out;
-	}
+	recovery->pblk = pblk;
+	recovery->rqd = rqd;
 
 	INIT_WORK(&recovery->ws_rec, pblk_submit_rec);
 	queue_work(pblk->close_wq, &recovery->ws_rec);
-
-out:
-	pblk_complete_write(pblk, rqd, c_ctx);
 }
 
 static void pblk_end_io_write(struct nvm_rq *rqd)
@@ -173,8 +254,8 @@ static void pblk_end_io_write(struct nvm_rq *rqd)
 	struct pblk_c_ctx *c_ctx = nvm_rq_to_pdu(rqd);
 
 	if (rqd->error) {
-		pblk_log_write_err(pblk, rqd);
-		return pblk_end_w_fail(pblk, rqd);
+		pblk_end_w_fail(pblk, rqd);
+		return;
 	}
 #ifdef CONFIG_NVM_DEBUG
 	else
@@ -266,31 +347,6 @@ static int pblk_setup_w_rq(struct pblk *pblk, struct nvm_rq *rqd,
 	return 0;
 }
 
-int pblk_setup_w_rec_rq(struct pblk *pblk, struct nvm_rq *rqd,
-			struct pblk_c_ctx *c_ctx)
-{
-	struct pblk_line_meta *lm = &pblk->lm;
-	unsigned long *lun_bitmap;
-	int ret;
-
-	lun_bitmap = kzalloc(lm->lun_bitmap_len, GFP_KERNEL);
-	if (!lun_bitmap)
-		return -ENOMEM;
-
-	c_ctx->lun_bitmap = lun_bitmap;
-
-	ret = pblk_alloc_w_rq(pblk, rqd, rqd->nr_ppas, pblk_end_io_write);
-	if (ret)
-		return ret;
-
-	pblk_map_rq(pblk, rqd, c_ctx->sentry, lun_bitmap, c_ctx->nr_valid, 0);
-
-	rqd->ppa_status = (u64)0;
-	rqd->flags = pblk_set_progr_mode(pblk, PBLK_WRITE);
-
-	return ret;
-}
-
 static int pblk_calc_secs_to_sync(struct pblk *pblk, unsigned int secs_avail,
 				  unsigned int secs_to_flush)
 {
@@ -339,6 +395,7 @@ int pblk_submit_meta_io(struct pblk *pblk, struct pblk_line *meta_line)
 	bio = pblk_bio_map_addr(pblk, data, rq_ppas, rq_len,
 					l_mg->emeta_alloc_type, GFP_KERNEL);
 	if (IS_ERR(bio)) {
+		pr_err("pblk: failed to map emeta io");
 		ret = PTR_ERR(bio);
 		goto fail_free_rqd;
 	}
@@ -515,27 +572,55 @@ static int pblk_submit_write(struct pblk *pblk)
 	unsigned int secs_avail, secs_to_sync, secs_to_com;
 	unsigned int secs_to_flush;
 	unsigned long pos;
+	unsigned int resubmit;
 
-	/* If there are no sectors in the cache, flushes (bios without data)
-	 * will be cleared on the cache threads
-	 */
-	secs_avail = pblk_rb_read_count(&pblk->rwb);
-	if (!secs_avail)
-		return 1;
-
-	secs_to_flush = pblk_rb_flush_point_count(&pblk->rwb);
-	if (!secs_to_flush && secs_avail < pblk->min_write_pgs)
-		return 1;
-
-	secs_to_sync = pblk_calc_secs_to_sync(pblk, secs_avail, secs_to_flush);
-	if (secs_to_sync > pblk->max_write_pgs) {
-		pr_err("pblk: bad buffer sync calculation\n");
-		return 1;
+	spin_lock(&pblk->resubmit_lock);
+	resubmit = !list_empty(&pblk->resubmit_list);
+	spin_unlock(&pblk->resubmit_lock);
+
+	/* Resubmit failed writes first */
+	if (resubmit) {
+		struct pblk_c_ctx *r_ctx;
+
+		spin_lock(&pblk->resubmit_lock);
+		r_ctx = list_first_entry(&pblk->resubmit_list,
+					struct pblk_c_ctx, list);
+		list_del(&r_ctx->list);
+		spin_unlock(&pblk->resubmit_lock);
+
+		secs_avail = r_ctx->nr_valid;
+		pos = r_ctx->sentry;
+
+		pblk_prepare_resubmit(pblk, pos, secs_avail);
+		secs_to_sync = pblk_calc_secs_to_sync(pblk, secs_avail,
+				secs_avail);
+
+		kfree(r_ctx);
+	} else {
+		/* If there are no sectors in the cache,
+		 * flushes (bios without data) will be cleared on
+		 * the cache threads
+		 */
+		secs_avail = pblk_rb_read_count(&pblk->rwb);
+		if (!secs_avail)
+			return 1;
+
+		secs_to_flush = pblk_rb_flush_point_count(&pblk->rwb);
+		if (!secs_to_flush && secs_avail < pblk->min_write_pgs)
+			return 1;
+
+		secs_to_sync = pblk_calc_secs_to_sync(pblk, secs_avail,
+					secs_to_flush);
+		if (secs_to_sync > pblk->max_write_pgs) {
+			pr_err("pblk: bad buffer sync calculation\n");
+			return 1;
+		}
+
+		secs_to_com = (secs_to_sync > secs_avail) ?
+			secs_avail : secs_to_sync;
+		pos = pblk_rb_read_commit(&pblk->rwb, secs_to_com);
 	}
 
-	secs_to_com = (secs_to_sync > secs_avail) ? secs_avail : secs_to_sync;
-	pos = pblk_rb_read_commit(&pblk->rwb, secs_to_com);
-
 	bio = bio_alloc(GFP_KERNEL, secs_to_sync);
 
 	bio->bi_iter.bi_sector = 0; /* internal bio */
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index 5a4daf0b949d..a75ffae53a0d 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -128,7 +128,6 @@ struct pblk_pad_rq {
 struct pblk_rec_ctx {
 	struct pblk *pblk;
 	struct nvm_rq *rqd;
-	struct list_head failed;
 	struct work_struct ws_rec;
 };
 
@@ -664,6 +663,9 @@ struct pblk {
 
 	struct list_head compl_list;
 
+	spinlock_t resubmit_lock;	 /* Resubmit list lock */
+	struct list_head resubmit_list; /* Resubmit list for failed writes*/
+
 	mempool_t *page_bio_pool;
 	mempool_t *gen_ws_pool;
 	mempool_t *rec_pool;
@@ -713,9 +715,6 @@ void pblk_rb_sync_l2p(struct pblk_rb *rb);
 unsigned int pblk_rb_read_to_bio(struct pblk_rb *rb, struct nvm_rq *rqd,
 				 unsigned int pos, unsigned int nr_entries,
 				 unsigned int count);
-unsigned int pblk_rb_read_to_bio_list(struct pblk_rb *rb, struct bio *bio,
-				      struct list_head *list,
-				      unsigned int max);
 int pblk_rb_copy_to_bio(struct pblk_rb *rb, struct bio *bio, sector_t lba,
 			struct ppa_addr ppa, int bio_iter, bool advanced_bio);
 unsigned int pblk_rb_read_commit(struct pblk_rb *rb, unsigned int entries);
@@ -848,13 +847,9 @@ int pblk_submit_read_gc(struct pblk *pblk, struct pblk_gc_rq *gc_rq);
 /*
  * pblk recovery
  */
-void pblk_submit_rec(struct work_struct *work);
 struct pblk_line *pblk_recov_l2p(struct pblk *pblk);
 int pblk_recov_pad(struct pblk *pblk);
 int pblk_recov_check_emeta(struct pblk *pblk, struct line_emeta *emeta);
-int pblk_recov_setup_rq(struct pblk *pblk, struct pblk_c_ctx *c_ctx,
-			struct pblk_rec_ctx *recovery, u64 *comp_bits,
-			unsigned int comp);
 
 /*
  * pblk gc
-- 
2.11.0

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [GIT PULL 13/20] lightnvm: pblk: garbage collect lines with failed writes
  2018-05-28  8:58 [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
                   ` (11 preceding siblings ...)
  2018-05-28  8:58 ` [GIT PULL 12/20] lightnvm: pblk: rework write error recovery path Matias Bjørling
@ 2018-05-28  8:58 ` Matias Bjørling
  2018-05-28  8:58 ` [GIT PULL 14/20] lightnvm: pblk: fix smeta write error path Matias Bjørling
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 37+ messages in thread
From: Matias Bjørling @ 2018-05-28  8:58 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, Hans Holmberg, Matias Bjørling

From: Hans Holmberg <hans.holmberg@cnexlabs.com>

Write failures should not happen under normal circumstances,
so in order to bring the chunk back into a known state as soon
as possible, evacuate all the valid data out of the line and let the
fw judge if the block can be written to in the next reset cycle.

Do this by introducing a new gc list for lines with failed writes,
and ensure that the rate limiter allocates a small portion of
the write bandwidth to get the job done.

The lba list is saved in memory for use during gc as we
cannot gurantee that the emeta data is readable if a write
error occurred.

Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-core.c  |  45 ++++++++++++++++++-
 drivers/lightnvm/pblk-gc.c    | 102 +++++++++++++++++++++++++++---------------
 drivers/lightnvm/pblk-init.c  |  46 ++++++++++++-------
 drivers/lightnvm/pblk-rl.c    |  29 ++++++++++--
 drivers/lightnvm/pblk-sysfs.c |  15 ++++++-
 drivers/lightnvm/pblk-write.c |   2 +
 drivers/lightnvm/pblk.h       |  25 +++++++++--
 7 files changed, 200 insertions(+), 64 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 5dae72e8b46b..263da2e43567 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -373,7 +373,13 @@ struct list_head *pblk_line_gc_list(struct pblk *pblk, struct pblk_line *line)
 
 	lockdep_assert_held(&line->lock);
 
-	if (!vsc) {
+	if (line->w_err_gc->has_write_err) {
+		if (line->gc_group != PBLK_LINEGC_WERR) {
+			line->gc_group = PBLK_LINEGC_WERR;
+			move_list = &l_mg->gc_werr_list;
+			pblk_rl_werr_line_in(&pblk->rl);
+		}
+	} else if (!vsc) {
 		if (line->gc_group != PBLK_LINEGC_FULL) {
 			line->gc_group = PBLK_LINEGC_FULL;
 			move_list = &l_mg->gc_full_list;
@@ -1589,8 +1595,13 @@ static void __pblk_line_put(struct pblk *pblk, struct pblk_line *line)
 	line->state = PBLK_LINESTATE_FREE;
 	line->gc_group = PBLK_LINEGC_NONE;
 	pblk_line_free(line);
+
+	if (line->w_err_gc->has_write_err) {
+		pblk_rl_werr_line_out(&pblk->rl);
+		line->w_err_gc->has_write_err = 0;
+	}
+
 	spin_unlock(&line->lock);
-
 	atomic_dec(&gc->pipeline_gc);
 
 	spin_lock(&l_mg->free_lock);
@@ -1753,11 +1764,34 @@ void pblk_line_close_meta(struct pblk *pblk, struct pblk_line *line)
 
 	spin_lock(&l_mg->close_lock);
 	spin_lock(&line->lock);
+
+	/* Update the in-memory start address for emeta, in case it has
+	 * shifted due to write errors
+	 */
+	if (line->emeta_ssec != line->cur_sec)
+		line->emeta_ssec = line->cur_sec;
+
 	list_add_tail(&line->list, &l_mg->emeta_list);
 	spin_unlock(&line->lock);
 	spin_unlock(&l_mg->close_lock);
 
 	pblk_line_should_sync_meta(pblk);
+
+
+}
+
+static void pblk_save_lba_list(struct pblk *pblk, struct pblk_line *line)
+{
+	struct pblk_line_meta *lm = &pblk->lm;
+	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
+	unsigned int lba_list_size = lm->emeta_len[2];
+	struct pblk_w_err_gc *w_err_gc = line->w_err_gc;
+	struct pblk_emeta *emeta = line->emeta;
+
+	w_err_gc->lba_list = pblk_malloc(lba_list_size,
+					 l_mg->emeta_alloc_type, GFP_KERNEL);
+	memcpy(w_err_gc->lba_list, emeta_to_lbas(pblk, emeta->buf),
+				lba_list_size);
 }
 
 void pblk_line_close_ws(struct work_struct *work)
@@ -1766,6 +1800,13 @@ void pblk_line_close_ws(struct work_struct *work)
 									ws);
 	struct pblk *pblk = line_ws->pblk;
 	struct pblk_line *line = line_ws->line;
+	struct pblk_w_err_gc *w_err_gc = line->w_err_gc;
+
+	/* Write errors makes the emeta start address stored in smeta invalid,
+	 * so keep a copy of the lba list until we've gc'd the line
+	 */
+	if (w_err_gc->has_write_err)
+		pblk_save_lba_list(pblk, line);
 
 	pblk_line_close(pblk, line);
 	mempool_free(line_ws, pblk->gen_ws_pool);
diff --git a/drivers/lightnvm/pblk-gc.c b/drivers/lightnvm/pblk-gc.c
index b0cc277bf972..df88f1bdd921 100644
--- a/drivers/lightnvm/pblk-gc.c
+++ b/drivers/lightnvm/pblk-gc.c
@@ -129,6 +129,53 @@ static void pblk_gc_line_ws(struct work_struct *work)
 	kfree(gc_rq_ws);
 }
 
+static __le64 *get_lba_list_from_emeta(struct pblk *pblk,
+				       struct pblk_line *line)
+{
+	struct line_emeta *emeta_buf;
+	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
+	struct pblk_line_meta *lm = &pblk->lm;
+	unsigned int lba_list_size = lm->emeta_len[2];
+	__le64 *lba_list;
+	int ret;
+
+	emeta_buf = pblk_malloc(lm->emeta_len[0],
+				l_mg->emeta_alloc_type, GFP_KERNEL);
+	if (!emeta_buf)
+		return NULL;
+
+	ret = pblk_line_read_emeta(pblk, line, emeta_buf);
+	if (ret) {
+		pr_err("pblk: line %d read emeta failed (%d)\n",
+				line->id, ret);
+		pblk_mfree(emeta_buf, l_mg->emeta_alloc_type);
+		return NULL;
+	}
+
+	/* If this read fails, it means that emeta is corrupted.
+	 * For now, leave the line untouched.
+	 * TODO: Implement a recovery routine that scans and moves
+	 * all sectors on the line.
+	 */
+
+	ret = pblk_recov_check_emeta(pblk, emeta_buf);
+	if (ret) {
+		pr_err("pblk: inconsistent emeta (line %d)\n",
+				line->id);
+		pblk_mfree(emeta_buf, l_mg->emeta_alloc_type);
+		return NULL;
+	}
+
+	lba_list = pblk_malloc(lba_list_size,
+			       l_mg->emeta_alloc_type, GFP_KERNEL);
+	if (lba_list)
+		memcpy(lba_list, emeta_to_lbas(pblk, emeta_buf), lba_list_size);
+
+	pblk_mfree(emeta_buf, l_mg->emeta_alloc_type);
+
+	return lba_list;
+}
+
 static void pblk_gc_line_prepare_ws(struct work_struct *work)
 {
 	struct pblk_line_ws *line_ws = container_of(work, struct pblk_line_ws,
@@ -138,46 +185,26 @@ static void pblk_gc_line_prepare_ws(struct work_struct *work)
 	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
 	struct pblk_line_meta *lm = &pblk->lm;
 	struct pblk_gc *gc = &pblk->gc;
-	struct line_emeta *emeta_buf;
 	struct pblk_line_ws *gc_rq_ws;
 	struct pblk_gc_rq *gc_rq;
 	__le64 *lba_list;
 	unsigned long *invalid_bitmap;
 	int sec_left, nr_secs, bit;
-	int ret;
 
 	invalid_bitmap = kmalloc(lm->sec_bitmap_len, GFP_KERNEL);
 	if (!invalid_bitmap)
 		goto fail_free_ws;
 
-	emeta_buf = pblk_malloc(lm->emeta_len[0], l_mg->emeta_alloc_type,
-								GFP_KERNEL);
-	if (!emeta_buf) {
-		pr_err("pblk: cannot use GC emeta\n");
-		goto fail_free_bitmap;
-	}
-
-	ret = pblk_line_read_emeta(pblk, line, emeta_buf);
-	if (ret) {
-		pr_err("pblk: line %d read emeta failed (%d)\n", line->id, ret);
-		goto fail_free_emeta;
-	}
-
-	/* If this read fails, it means that emeta is corrupted. For now, leave
-	 * the line untouched. TODO: Implement a recovery routine that scans and
-	 * moves all sectors on the line.
-	 */
-
-	ret = pblk_recov_check_emeta(pblk, emeta_buf);
-	if (ret) {
-		pr_err("pblk: inconsistent emeta (line %d)\n", line->id);
-		goto fail_free_emeta;
-	}
-
-	lba_list = emeta_to_lbas(pblk, emeta_buf);
-	if (!lba_list) {
-		pr_err("pblk: could not interpret emeta (line %d)\n", line->id);
-		goto fail_free_emeta;
+	if (line->w_err_gc->has_write_err) {
+		lba_list = line->w_err_gc->lba_list;
+		line->w_err_gc->lba_list = NULL;
+	} else {
+		lba_list = get_lba_list_from_emeta(pblk, line);
+		if (!lba_list) {
+			pr_err("pblk: could not interpret emeta (line %d)\n",
+					line->id);
+			goto fail_free_ws;
+		}
 	}
 
 	spin_lock(&line->lock);
@@ -187,14 +214,14 @@ static void pblk_gc_line_prepare_ws(struct work_struct *work)
 
 	if (sec_left < 0) {
 		pr_err("pblk: corrupted GC line (%d)\n", line->id);
-		goto fail_free_emeta;
+		goto fail_free_lba_list;
 	}
 
 	bit = -1;
 next_rq:
 	gc_rq = kmalloc(sizeof(struct pblk_gc_rq), GFP_KERNEL);
 	if (!gc_rq)
-		goto fail_free_emeta;
+		goto fail_free_lba_list;
 
 	nr_secs = 0;
 	do {
@@ -240,7 +267,7 @@ static void pblk_gc_line_prepare_ws(struct work_struct *work)
 		goto next_rq;
 
 out:
-	pblk_mfree(emeta_buf, l_mg->emeta_alloc_type);
+	pblk_mfree(lba_list, l_mg->emeta_alloc_type);
 	kfree(line_ws);
 	kfree(invalid_bitmap);
 
@@ -251,9 +278,8 @@ static void pblk_gc_line_prepare_ws(struct work_struct *work)
 
 fail_free_gc_rq:
 	kfree(gc_rq);
-fail_free_emeta:
-	pblk_mfree(emeta_buf, l_mg->emeta_alloc_type);
-fail_free_bitmap:
+fail_free_lba_list:
+	pblk_mfree(lba_list, l_mg->emeta_alloc_type);
 	kfree(invalid_bitmap);
 fail_free_ws:
 	kfree(line_ws);
@@ -349,12 +375,14 @@ static struct pblk_line *pblk_gc_get_victim_line(struct pblk *pblk,
 static bool pblk_gc_should_run(struct pblk_gc *gc, struct pblk_rl *rl)
 {
 	unsigned int nr_blocks_free, nr_blocks_need;
+	unsigned int werr_lines = atomic_read(&rl->werr_lines);
 
 	nr_blocks_need = pblk_rl_high_thrs(rl);
 	nr_blocks_free = pblk_rl_nr_free_blks(rl);
 
 	/* This is not critical, no need to take lock here */
-	return ((gc->gc_active) && (nr_blocks_need > nr_blocks_free));
+	return ((werr_lines > 0) ||
+		((gc->gc_active) && (nr_blocks_need > nr_blocks_free)));
 }
 
 void pblk_gc_free_full_lines(struct pblk *pblk)
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 6f06727afcf6..d65d2f972ccf 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -493,11 +493,17 @@ static void pblk_line_mg_free(struct pblk *pblk)
 	}
 }
 
-static void pblk_line_meta_free(struct pblk_line *line)
+static void pblk_line_meta_free(struct pblk_line_mgmt *l_mg,
+				struct pblk_line *line)
 {
+	struct pblk_w_err_gc *w_err_gc = line->w_err_gc;
+
 	kfree(line->blk_bitmap);
 	kfree(line->erase_bitmap);
 	kfree(line->chks);
+
+	pblk_mfree(w_err_gc->lba_list, l_mg->emeta_alloc_type);
+	kfree(w_err_gc);
 }
 
 static void pblk_lines_free(struct pblk *pblk)
@@ -511,7 +517,7 @@ static void pblk_lines_free(struct pblk *pblk)
 		line = &pblk->lines[i];
 
 		pblk_line_free(line);
-		pblk_line_meta_free(line);
+		pblk_line_meta_free(l_mg, line);
 	}
 	spin_unlock(&l_mg->free_lock);
 
@@ -813,20 +819,28 @@ static int pblk_alloc_line_meta(struct pblk *pblk, struct pblk_line *line)
 		return -ENOMEM;
 
 	line->erase_bitmap = kzalloc(lm->blk_bitmap_len, GFP_KERNEL);
-	if (!line->erase_bitmap) {
-		kfree(line->blk_bitmap);
-		return -ENOMEM;
-	}
+	if (!line->erase_bitmap)
+		goto free_blk_bitmap;
+
 
 	line->chks = kmalloc(lm->blk_per_line * sizeof(struct nvm_chk_meta),
 								GFP_KERNEL);
-	if (!line->chks) {
-		kfree(line->erase_bitmap);
-		kfree(line->blk_bitmap);
-		return -ENOMEM;
-	}
+	if (!line->chks)
+		goto free_erase_bitmap;
+
+	line->w_err_gc = kzalloc(sizeof(struct pblk_w_err_gc), GFP_KERNEL);
+	if (!line->w_err_gc)
+		goto free_chks;
 
 	return 0;
+
+free_chks:
+	kfree(line->chks);
+free_erase_bitmap:
+	kfree(line->erase_bitmap);
+free_blk_bitmap:
+	kfree(line->blk_bitmap);
+	return -ENOMEM;
 }
 
 static int pblk_line_mg_init(struct pblk *pblk)
@@ -851,12 +865,14 @@ static int pblk_line_mg_init(struct pblk *pblk)
 	INIT_LIST_HEAD(&l_mg->gc_mid_list);
 	INIT_LIST_HEAD(&l_mg->gc_low_list);
 	INIT_LIST_HEAD(&l_mg->gc_empty_list);
+	INIT_LIST_HEAD(&l_mg->gc_werr_list);
 
 	INIT_LIST_HEAD(&l_mg->emeta_list);
 
-	l_mg->gc_lists[0] = &l_mg->gc_high_list;
-	l_mg->gc_lists[1] = &l_mg->gc_mid_list;
-	l_mg->gc_lists[2] = &l_mg->gc_low_list;
+	l_mg->gc_lists[0] = &l_mg->gc_werr_list;
+	l_mg->gc_lists[1] = &l_mg->gc_high_list;
+	l_mg->gc_lists[2] = &l_mg->gc_mid_list;
+	l_mg->gc_lists[3] = &l_mg->gc_low_list;
 
 	spin_lock_init(&l_mg->free_lock);
 	spin_lock_init(&l_mg->close_lock);
@@ -1063,7 +1079,7 @@ static int pblk_lines_init(struct pblk *pblk)
 
 fail_free_lines:
 	while (--i >= 0)
-		pblk_line_meta_free(&pblk->lines[i]);
+		pblk_line_meta_free(l_mg, &pblk->lines[i]);
 	kfree(pblk->lines);
 fail_free_chunk_meta:
 	kfree(chunk_meta);
diff --git a/drivers/lightnvm/pblk-rl.c b/drivers/lightnvm/pblk-rl.c
index 883a7113b19d..6a0616a6fcaf 100644
--- a/drivers/lightnvm/pblk-rl.c
+++ b/drivers/lightnvm/pblk-rl.c
@@ -73,6 +73,16 @@ void pblk_rl_user_in(struct pblk_rl *rl, int nr_entries)
 	pblk_rl_kick_u_timer(rl);
 }
 
+void pblk_rl_werr_line_in(struct pblk_rl *rl)
+{
+	atomic_inc(&rl->werr_lines);
+}
+
+void pblk_rl_werr_line_out(struct pblk_rl *rl)
+{
+	atomic_dec(&rl->werr_lines);
+}
+
 void pblk_rl_gc_in(struct pblk_rl *rl, int nr_entries)
 {
 	atomic_add(nr_entries, &rl->rb_gc_cnt);
@@ -99,11 +109,21 @@ static void __pblk_rl_update_rates(struct pblk_rl *rl,
 {
 	struct pblk *pblk = container_of(rl, struct pblk, rl);
 	int max = rl->rb_budget;
+	int werr_gc_needed = atomic_read(&rl->werr_lines);
 
 	if (free_blocks >= rl->high) {
-		rl->rb_user_max = max;
-		rl->rb_gc_max = 0;
-		rl->rb_state = PBLK_RL_HIGH;
+		if (werr_gc_needed) {
+			/* Allocate a small budget for recovering
+			 * lines with write errors
+			 */
+			rl->rb_gc_max = 1 << rl->rb_windows_pw;
+			rl->rb_user_max = max - rl->rb_gc_max;
+			rl->rb_state = PBLK_RL_WERR;
+		} else {
+			rl->rb_user_max = max;
+			rl->rb_gc_max = 0;
+			rl->rb_state = PBLK_RL_OFF;
+		}
 	} else if (free_blocks < rl->high) {
 		int shift = rl->high_pw - rl->rb_windows_pw;
 		int user_windows = free_blocks >> shift;
@@ -124,7 +144,7 @@ static void __pblk_rl_update_rates(struct pblk_rl *rl,
 		rl->rb_state = PBLK_RL_LOW;
 	}
 
-	if (rl->rb_state == (PBLK_RL_MID | PBLK_RL_LOW))
+	if (rl->rb_state != PBLK_RL_OFF)
 		pblk_gc_should_start(pblk);
 	else
 		pblk_gc_should_stop(pblk);
@@ -221,6 +241,7 @@ void pblk_rl_init(struct pblk_rl *rl, int budget)
 	atomic_set(&rl->rb_user_cnt, 0);
 	atomic_set(&rl->rb_gc_cnt, 0);
 	atomic_set(&rl->rb_space, -1);
+	atomic_set(&rl->werr_lines, 0);
 
 	timer_setup(&rl->u_timer, pblk_rl_u_timer, 0);
 
diff --git a/drivers/lightnvm/pblk-sysfs.c b/drivers/lightnvm/pblk-sysfs.c
index e61909af23a5..88a0a7c407aa 100644
--- a/drivers/lightnvm/pblk-sysfs.c
+++ b/drivers/lightnvm/pblk-sysfs.c
@@ -173,6 +173,8 @@ static ssize_t pblk_sysfs_lines(struct pblk *pblk, char *page)
 	int free_line_cnt = 0, closed_line_cnt = 0, emeta_line_cnt = 0;
 	int d_line_cnt = 0, l_line_cnt = 0;
 	int gc_full = 0, gc_high = 0, gc_mid = 0, gc_low = 0, gc_empty = 0;
+	int gc_werr = 0;
+
 	int bad = 0, cor = 0;
 	int msecs = 0, cur_sec = 0, vsc = 0, sec_in_line = 0;
 	int map_weight = 0, meta_weight = 0;
@@ -237,6 +239,15 @@ static ssize_t pblk_sysfs_lines(struct pblk *pblk, char *page)
 		gc_empty++;
 	}
 
+	list_for_each_entry(line, &l_mg->gc_werr_list, list) {
+		if (line->type == PBLK_LINETYPE_DATA)
+			d_line_cnt++;
+		else if (line->type == PBLK_LINETYPE_LOG)
+			l_line_cnt++;
+		closed_line_cnt++;
+		gc_werr++;
+	}
+
 	list_for_each_entry(line, &l_mg->bad_list, list)
 		bad++;
 	list_for_each_entry(line, &l_mg->corrupt_list, list)
@@ -275,8 +286,8 @@ static ssize_t pblk_sysfs_lines(struct pblk *pblk, char *page)
 					l_mg->nr_lines);
 
 	sz += snprintf(page + sz, PAGE_SIZE - sz,
-		"GC: full:%d, high:%d, mid:%d, low:%d, empty:%d, queue:%d\n",
-			gc_full, gc_high, gc_mid, gc_low, gc_empty,
+		"GC: full:%d, high:%d, mid:%d, low:%d, empty:%d, werr: %d, queue:%d\n",
+			gc_full, gc_high, gc_mid, gc_low, gc_empty, gc_werr,
 			atomic_read(&pblk->gc.read_inflight_gc));
 
 	sz += snprintf(page + sz, PAGE_SIZE - sz,
diff --git a/drivers/lightnvm/pblk-write.c b/drivers/lightnvm/pblk-write.c
index f62e432f7c91..f33c2c3993f0 100644
--- a/drivers/lightnvm/pblk-write.c
+++ b/drivers/lightnvm/pblk-write.c
@@ -136,6 +136,7 @@ static void pblk_map_remaining(struct pblk *pblk, struct ppa_addr *ppa)
 		}
 	}
 
+	line->w_err_gc->has_write_err = 1;
 	spin_unlock(&line->lock);
 }
 
@@ -279,6 +280,7 @@ static void pblk_end_io_write_meta(struct nvm_rq *rqd)
 	if (rqd->error) {
 		pblk_log_write_err(pblk, rqd);
 		pr_err("pblk: metadata I/O failed. Line %d\n", line->id);
+		line->w_err_gc->has_write_err = 1;
 	}
 
 	sync = atomic_add_return(rqd->nr_ppas, &emeta->sync);
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index a75ffae53a0d..7fbdfdc809d3 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -89,12 +89,14 @@ struct pblk_sec_meta {
 /* The number of GC lists and the rate-limiter states go together. This way the
  * rate-limiter can dictate how much GC is needed based on resource utilization.
  */
-#define PBLK_GC_NR_LISTS 3
+#define PBLK_GC_NR_LISTS 4
 
 enum {
-	PBLK_RL_HIGH = 1,
-	PBLK_RL_MID = 2,
-	PBLK_RL_LOW = 3,
+	PBLK_RL_OFF = 0,
+	PBLK_RL_WERR = 1,
+	PBLK_RL_HIGH = 2,
+	PBLK_RL_MID = 3,
+	PBLK_RL_LOW = 4
 };
 
 #define pblk_dma_meta_size (sizeof(struct pblk_sec_meta) * PBLK_MAX_REQ_ADDRS)
@@ -278,6 +280,8 @@ struct pblk_rl {
 	int rb_user_active;
 	int rb_gc_active;
 
+	atomic_t werr_lines;	/* Number of write error lines that needs gc */
+
 	struct timer_list u_timer;
 
 	unsigned long long nr_secs;
@@ -311,6 +315,7 @@ enum {
 	PBLK_LINEGC_MID = 23,
 	PBLK_LINEGC_HIGH = 24,
 	PBLK_LINEGC_FULL = 25,
+	PBLK_LINEGC_WERR = 26
 };
 
 #define PBLK_MAGIC 0x70626c6b /*pblk*/
@@ -412,6 +417,11 @@ struct pblk_smeta {
 	struct line_smeta *buf;		/* smeta buffer in persistent format */
 };
 
+struct pblk_w_err_gc {
+	int has_write_err;
+	__le64 *lba_list;
+};
+
 struct pblk_line {
 	struct pblk *pblk;
 	unsigned int id;		/* Line number corresponds to the
@@ -457,6 +467,8 @@ struct pblk_line {
 
 	struct kref ref;		/* Write buffer L2P references */
 
+	struct pblk_w_err_gc *w_err_gc;	/* Write error gc recovery metadata */
+
 	spinlock_t lock;		/* Necessary for invalid_bitmap only */
 };
 
@@ -488,6 +500,8 @@ struct pblk_line_mgmt {
 	struct list_head gc_mid_list;	/* Full lines ready to GC, mid isc */
 	struct list_head gc_low_list;	/* Full lines ready to GC, low isc */
 
+	struct list_head gc_werr_list;  /* Write err recovery list */
+
 	struct list_head gc_full_list;	/* Full lines ready to GC, no valid */
 	struct list_head gc_empty_list;	/* Full lines close, all valid */
 
@@ -890,6 +904,9 @@ void pblk_rl_free_lines_dec(struct pblk_rl *rl, struct pblk_line *line,
 			    bool used);
 int pblk_rl_is_limit(struct pblk_rl *rl);
 
+void pblk_rl_werr_line_in(struct pblk_rl *rl);
+void pblk_rl_werr_line_out(struct pblk_rl *rl);
+
 /*
  * pblk sysfs
  */
-- 
2.11.0

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [GIT PULL 14/20] lightnvm: pblk: fix smeta write error path
  2018-05-28  8:58 [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
                   ` (12 preceding siblings ...)
  2018-05-28  8:58 ` [GIT PULL 13/20] lightnvm: pblk: garbage collect lines with failed writes Matias Bjørling
@ 2018-05-28  8:58 ` Matias Bjørling
  2018-05-28  8:58 ` [GIT PULL 15/20] lightnvm: proper error handling for pblk_bio_add_pages Matias Bjørling
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 37+ messages in thread
From: Matias Bjørling @ 2018-05-28  8:58 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, Hans Holmberg, Matias Bjørling

From: Hans Holmberg <hans.holmberg@cnexlabs.com>

Smeta write errors were previously ignored. Skip these
lines instead and throw them back on the free
list, so the chunks will go through a reset cycle
before we attempt to use the line again.

Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-core.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 263da2e43567..e43093e27084 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -849,9 +849,10 @@ static int pblk_line_submit_smeta_io(struct pblk *pblk, struct pblk_line *line,
 	atomic_dec(&pblk->inflight_io);
 
 	if (rqd.error) {
-		if (dir == PBLK_WRITE)
+		if (dir == PBLK_WRITE) {
 			pblk_log_write_err(pblk, &rqd);
-		else if (dir == PBLK_READ)
+			ret = 1;
+		} else if (dir == PBLK_READ)
 			pblk_log_read_err(pblk, &rqd);
 	}
 
@@ -1101,7 +1102,7 @@ static int pblk_line_init_bb(struct pblk *pblk, struct pblk_line *line,
 
 	if (init && pblk_line_submit_smeta_io(pblk, line, off, PBLK_WRITE)) {
 		pr_debug("pblk: line smeta I/O failed. Retry\n");
-		return 1;
+		return 0;
 	}
 
 	bitmap_copy(line->invalid_bitmap, line->map_bitmap, lm->sec_per_line);
-- 
2.11.0

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [GIT PULL 15/20] lightnvm: proper error handling for pblk_bio_add_pages
  2018-05-28  8:58 [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
                   ` (13 preceding siblings ...)
  2018-05-28  8:58 ` [GIT PULL 14/20] lightnvm: pblk: fix smeta write error path Matias Bjørling
@ 2018-05-28  8:58 ` Matias Bjørling
  2018-05-28  8:58 ` [GIT PULL 16/20] lightnvm: error handling when whole line is bad Matias Bjørling
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 37+ messages in thread
From: Matias Bjørling @ 2018-05-28  8:58 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, Igor Konopko, Marcin Dziegielewski,
	Matias Bjørling

From: Igor Konopko <igor.j.konopko@intel.com>

Currently in case of error caused by bio_pc_add_page in
pblk_bio_add_pages two issues occur when calling from
pblk_rb_read_to_bio(). First one is in pblk_bio_free_pages, since we
are trying to free pages not allocated from our mempool. Second one
is the warn from dma_pool_free, that we are trying to free NULL
pointer dma.

This commit fix both issues.

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
Signed-off-by: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-core.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index e43093e27084..a20b41c355c5 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -278,7 +278,9 @@ void pblk_free_rqd(struct pblk *pblk, struct nvm_rq *rqd, int type)
 		return;
 	}
 
-	nvm_dev_dma_free(dev->parent, rqd->meta_list, rqd->dma_meta_list);
+	if (rqd->meta_list)
+		nvm_dev_dma_free(dev->parent, rqd->meta_list,
+				rqd->dma_meta_list);
 	mempool_free(rqd, pool);
 }
 
@@ -316,7 +318,7 @@ int pblk_bio_add_pages(struct pblk *pblk, struct bio *bio, gfp_t flags,
 
 	return 0;
 err:
-	pblk_bio_free_pages(pblk, bio, 0, i - 1);
+	pblk_bio_free_pages(pblk, bio, (bio->bi_vcnt - i), i);
 	return -1;
 }
 
-- 
2.11.0

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [GIT PULL 16/20] lightnvm: error handling when whole line is bad
  2018-05-28  8:58 [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
                   ` (14 preceding siblings ...)
  2018-05-28  8:58 ` [GIT PULL 15/20] lightnvm: proper error handling for pblk_bio_add_pages Matias Bjørling
@ 2018-05-28  8:58 ` Matias Bjørling
  2018-05-28 10:59   ` Javier Gonzalez
  2018-05-28  8:58 ` [GIT PULL 17/20] lightnvm: fix partial read error path Matias Bjørling
                   ` (4 subsequent siblings)
  20 siblings, 1 reply; 37+ messages in thread
From: Matias Bjørling @ 2018-05-28  8:58 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, Igor Konopko, Marcin Dziegielewski,
	Matias Bjørling

From: Igor Konopko <igor.j.konopko@intel.com>

When all the blocks (chunks) in line are marked as bad (offline)
we shouldn't try to read smeta during init process.

Currently we are trying to do so by passing -1 as PPA address,
what causes multiple warnings, that we issuing IOs to out-of-bound
PPAs.

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
Signed-off-by: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
Updated title.
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-core.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index a20b41c355c5..e3e883547198 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -868,6 +868,11 @@ int pblk_line_read_smeta(struct pblk *pblk, struct pblk_line *line)
 {
 	u64 bpaddr = pblk_line_smeta_start(pblk, line);
 
+	if (bpaddr == -1) {
+		/* Whole line is bad - do not try to read smeta. */
+		return 1;
+	}
+
 	return pblk_line_submit_smeta_io(pblk, line, bpaddr, PBLK_READ_RECOV);
 }
 
-- 
2.11.0

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [GIT PULL 17/20] lightnvm: fix partial read error path
  2018-05-28  8:58 [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
                   ` (15 preceding siblings ...)
  2018-05-28  8:58 ` [GIT PULL 16/20] lightnvm: error handling when whole line is bad Matias Bjørling
@ 2018-05-28  8:58 ` Matias Bjørling
  2018-05-28  8:58 ` [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits equals to 0 Matias Bjørling
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 37+ messages in thread
From: Matias Bjørling @ 2018-05-28  8:58 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, Igor Konopko, Marcin Dziegielewski,
	Matias Bjørling

From: Igor Konopko <igor.j.konopko@intel.com>

When error occurs during bio_add_page on partial read path, pblk
tries to free pages twice.

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
Signed-off-by: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-read.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index a2e678de428f..fa7b60f852d9 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -256,7 +256,7 @@ static int pblk_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
 	new_bio = bio_alloc(GFP_KERNEL, nr_holes);
 
 	if (pblk_bio_add_pages(pblk, new_bio, GFP_KERNEL, nr_holes))
-		goto err;
+		goto err_add_pages;
 
 	if (nr_holes != new_bio->bi_vcnt) {
 		pr_err("pblk: malformed bio\n");
@@ -347,10 +347,10 @@ static int pblk_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
 	return NVM_IO_OK;
 
 err:
-	pr_err("pblk: failed to perform partial read\n");
-
 	/* Free allocated pages in new bio */
-	pblk_bio_free_pages(pblk, bio, 0, new_bio->bi_vcnt);
+	pblk_bio_free_pages(pblk, new_bio, 0, new_bio->bi_vcnt);
+err_add_pages:
+	pr_err("pblk: failed to perform partial read\n");
 	__pblk_end_io_read(pblk, rqd, false);
 	return NVM_IO_ERR;
 }
-- 
2.11.0

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits equals to 0
  2018-05-28  8:58 [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
                   ` (16 preceding siblings ...)
  2018-05-28  8:58 ` [GIT PULL 17/20] lightnvm: fix partial read error path Matias Bjørling
@ 2018-05-28  8:58 ` Matias Bjørling
  2018-05-28 11:02   ` Javier Gonzalez
  2018-05-28  8:58 ` [GIT PULL 19/20] lightnvm: pblk: add possibility to set write buffer size manually Matias Bjørling
                   ` (2 subsequent siblings)
  20 siblings, 1 reply; 37+ messages in thread
From: Matias Bjørling @ 2018-05-28  8:58 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, Marcin Dziegielewski, Igor Konopko,
	Matias Bjørling

From: Marcin Dziegielewski <marcin.dziegielewski@intel.com>

Some devices can expose mw_cunits equal to 0, it can cause creation
of too small write buffer and cause performance to drop on write
workloads.

To handle that, we use the default value for MLC and beacause it
covers both 1.2 and 2.0 OC specification, setting up mw_cunits
in nvme_nvm_setup_12 function isn't longer necessary.

Signed-off-by: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-init.c | 10 +++++++++-
 drivers/nvme/host/lightnvm.c |  1 -
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index d65d2f972ccf..0f277744266b 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -356,7 +356,15 @@ static int pblk_core_init(struct pblk *pblk)
 	atomic64_set(&pblk->nr_flush, 0);
 	pblk->nr_flush_rst = 0;
 
-	pblk->pgs_in_buffer = geo->mw_cunits * geo->all_luns;
+	if (geo->mw_cunits) {
+		pblk->pgs_in_buffer = geo->mw_cunits * geo->all_luns;
+	} else {
+		pblk->pgs_in_buffer = (geo->ws_opt << 3) * geo->all_luns;
+		/*
+		 * Some devices can expose mw_cunits equal to 0, so let's use
+		 * here default safe value for MLC.
+		 */
+	}
 
 	pblk->min_write_pgs = geo->ws_opt * (geo->csecs / PAGE_SIZE);
 	max_write_ppas = pblk->min_write_pgs * geo->all_luns;
diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
index 41279da799ed..c747792da915 100644
--- a/drivers/nvme/host/lightnvm.c
+++ b/drivers/nvme/host/lightnvm.c
@@ -338,7 +338,6 @@ static int nvme_nvm_setup_12(struct nvme_nvm_id12 *id,
 
 	geo->ws_min = sec_per_pg;
 	geo->ws_opt = sec_per_pg;
-	geo->mw_cunits = geo->ws_opt << 3;	/* default to MLC safe values */
 
 	/* Do not impose values for maximum number of open blocks as it is
 	 * unspecified in 1.2. Users of 1.2 must be aware of this and eventually
-- 
2.11.0

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [GIT PULL 19/20] lightnvm: pblk: add possibility to set write buffer size manually
  2018-05-28  8:58 [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
                   ` (17 preceding siblings ...)
  2018-05-28  8:58 ` [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits equals to 0 Matias Bjørling
@ 2018-05-28  8:58 ` Matias Bjørling
  2018-05-28  8:58 ` [GIT PULL 20/20] lightnvm: pblk: sync RB and RL states during GC Matias Bjørling
  2018-06-01 10:45 ` [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
  20 siblings, 0 replies; 37+ messages in thread
From: Matias Bjørling @ 2018-05-28  8:58 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, Marcin Dziegielewski, Igor Konopko,
	Matias Bjørling

From: Marcin Dziegielewski <marcin.dziegielewski@intel.com>

In some cases, users can want set write buffer size manually, e.g. to
adjust it to specific workload. This patch provides the possibility
to set write buffer size via module parameter feature.

Signed-off-by: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-init.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 0f277744266b..25aa1e73984f 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -20,6 +20,11 @@
 
 #include "pblk.h"
 
+unsigned int write_buffer_size;
+
+module_param(write_buffer_size, uint, 0644);
+MODULE_PARM_DESC(write_buffer_size, "number of entries in a write buffer");
+
 static struct kmem_cache *pblk_ws_cache, *pblk_rec_cache, *pblk_g_rq_cache,
 				*pblk_w_rq_cache;
 static DECLARE_RWSEM(pblk_lock);
@@ -172,10 +177,15 @@ static int pblk_rwb_init(struct pblk *pblk)
 	struct nvm_tgt_dev *dev = pblk->dev;
 	struct nvm_geo *geo = &dev->geo;
 	struct pblk_rb_entry *entries;
-	unsigned long nr_entries;
+	unsigned long nr_entries, buffer_size;
 	unsigned int power_size, power_seg_sz;
 
-	nr_entries = pblk_rb_calculate_size(pblk->pgs_in_buffer);
+	if (write_buffer_size && (write_buffer_size > pblk->pgs_in_buffer))
+		buffer_size = write_buffer_size;
+	else
+		buffer_size = pblk->pgs_in_buffer;
+
+	nr_entries = pblk_rb_calculate_size(buffer_size);
 
 	entries = vzalloc(nr_entries * sizeof(struct pblk_rb_entry));
 	if (!entries)
-- 
2.11.0

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [GIT PULL 20/20] lightnvm: pblk: sync RB and RL states during GC
  2018-05-28  8:58 [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
                   ` (18 preceding siblings ...)
  2018-05-28  8:58 ` [GIT PULL 19/20] lightnvm: pblk: add possibility to set write buffer size manually Matias Bjørling
@ 2018-05-28  8:58 ` Matias Bjørling
  2018-05-28 10:51   ` Javier Gonzalez
  2018-06-01 10:45 ` [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
  20 siblings, 1 reply; 37+ messages in thread
From: Matias Bjørling @ 2018-05-28  8:58 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, Igor Konopko, Marcin Dziegielewski,
	Matias Bjørling

From: Igor Konopko <igor.j.konopko@intel.com>

During sequential workloads we can met the case when almost all the
lines are fully written with data. In that case rate limiter will
significantly reduce the max number of requests for user IOs.

Unfortunately in the case when round buffer is flushed to drive and
the entries are not yet removed (which is ok, since there is still
enough free entries in round buffer for user IO) we hang on user
IO due to not enough entries in rate limiter. The reason is that
rate limiter user entries are decreased after freeing the round
buffer entries, which does not happen if there is still plenty of
space in round buffer.

This patch forces freeing the round buffer by calling
pblk_rb_sync_l2p and thus making new free entries in rate limiter,
when there is no enough of them for user IO.

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
Signed-off-by: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
Reworded description.
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-init.c | 2 ++
 drivers/lightnvm/pblk-rb.c   | 7 +++----
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 25aa1e73984f..9d7d9e3b8506 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -1159,7 +1159,9 @@ static void pblk_tear_down(struct pblk *pblk, bool graceful)
 		__pblk_pipeline_flush(pblk);
 	__pblk_pipeline_stop(pblk);
 	pblk_writer_stop(pblk);
+	spin_lock(&pblk->rwb.w_lock);
 	pblk_rb_sync_l2p(&pblk->rwb);
+	spin_unlock(&pblk->rwb.w_lock);
 	pblk_rl_free(&pblk->rl);
 
 	pr_debug("pblk: consistent tear down (graceful:%d)\n", graceful);
diff --git a/drivers/lightnvm/pblk-rb.c b/drivers/lightnvm/pblk-rb.c
index 1b74ec51a4ad..91824cd3e8d8 100644
--- a/drivers/lightnvm/pblk-rb.c
+++ b/drivers/lightnvm/pblk-rb.c
@@ -266,21 +266,18 @@ static int pblk_rb_update_l2p(struct pblk_rb *rb, unsigned int nr_entries,
  * Update the l2p entry for all sectors stored on the write buffer. This means
  * that all future lookups to the l2p table will point to a device address, not
  * to the cacheline in the write buffer.
+ * Caller must ensure that rb->w_lock is taken.
  */
 void pblk_rb_sync_l2p(struct pblk_rb *rb)
 {
 	unsigned int sync;
 	unsigned int to_update;
 
-	spin_lock(&rb->w_lock);
-
 	/* Protect from reads and writes */
 	sync = smp_load_acquire(&rb->sync);
 
 	to_update = pblk_rb_ring_count(sync, rb->l2p_update, rb->nr_entries);
 	__pblk_rb_update_l2p(rb, to_update);
-
-	spin_unlock(&rb->w_lock);
 }
 
 /*
@@ -462,6 +459,8 @@ int pblk_rb_may_write_user(struct pblk_rb *rb, struct bio *bio,
 	spin_lock(&rb->w_lock);
 	io_ret = pblk_rl_user_may_insert(&pblk->rl, nr_entries);
 	if (io_ret) {
+		/* Sync RB & L2P in order to update rate limiter values */
+		pblk_rb_sync_l2p(rb);
 		spin_unlock(&rb->w_lock);
 		return io_ret;
 	}
-- 
2.11.0

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [GIT PULL 20/20] lightnvm: pblk: sync RB and RL states during GC
  2018-05-28  8:58 ` [GIT PULL 20/20] lightnvm: pblk: sync RB and RL states during GC Matias Bjørling
@ 2018-05-28 10:51   ` Javier Gonzalez
  2018-05-29 13:07     ` Konopko, Igor J
  0 siblings, 1 reply; 37+ messages in thread
From: Javier Gonzalez @ 2018-05-28 10:51 UTC (permalink / raw)
  To: Matias Bjørling
  Cc: Jens Axboe, linux-block, linux-kernel, Konopko, Igor J,
	Marcin Dziegielewski

[-- Attachment #1: Type: text/plain, Size: 2514 bytes --]

Javier

I somehow missed these patches in the mailing list. Sorry for coming
with feedback this late. I'll look at my filters - in any case, would
you mind Cc'ing me in the future?

> On 28 May 2018, at 10.58, Matias Bjørling <mb@lightnvm.io> wrote:
> 
> From: Igor Konopko <igor.j.konopko@intel.com>
> 
> During sequential workloads we can met the case when almost all the
> lines are fully written with data. In that case rate limiter will
> significantly reduce the max number of requests for user IOs.

Do you mean random writes? On fully sequential, a line will either be
fully written, fully invalidated or on its way to be written. When
invalidating the line, then the whole line will be invalidated and GC
will free it without having to move valid data.

> 
> Unfortunately in the case when round buffer is flushed to drive and
> the entries are not yet removed (which is ok, since there is still
> enough free entries in round buffer for user IO) we hang on user
> IO due to not enough entries in rate limiter. The reason is that
> rate limiter user entries are decreased after freeing the round
> buffer entries, which does not happen if there is still plenty of
> space in round buffer.
> 
> This patch forces freeing the round buffer by calling
> pblk_rb_sync_l2p and thus making new free entries in rate limiter,
> when there is no enough of them for user IO.

I can see why you might have problems with very low OP due to the rate
limiter, but unfortunately this is not a good way of solving the
problem. When you do this, you basically make the L2P to point to the
device instead of pointing to the write cache, which in essence bypasses
mw_cuints. As a result, if a read comes in to one of the synced entries,
it will violate the device-host contract and most probably fail (for
sure fail on > SLC).

I think that the right way of solving this problem is separating the
write and GC buffers and then assigning tokens to them. The write thread
will then consume both buffers based on these tokens. In this case, user
I/O will have a buffer for itself, which will be guaranteed to advance
at the rate the rate-limiter is allowing it to. Note that the 2 buffers
can be a single buffer with a new set of pointers so that the lookup can
be done with a single bit.

I have been looking for time to implement this for a while. If you want
to give it a go, we can talk and I can give you some pointers on
potential issues I have thought about.

Javier

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [GIT PULL 16/20] lightnvm: error handling when whole line is bad
  2018-05-28  8:58 ` [GIT PULL 16/20] lightnvm: error handling when whole line is bad Matias Bjørling
@ 2018-05-28 10:59   ` Javier Gonzalez
  2018-05-29 13:15     ` Konopko, Igor J
  0 siblings, 1 reply; 37+ messages in thread
From: Javier Gonzalez @ 2018-05-28 10:59 UTC (permalink / raw)
  To: Matias Bjørling
  Cc: Jens Axboe, linux-block, LKML, Konopko, Igor J, Marcin Dziegielewski

> On 28 May 2018, at 10.58, Matias Bjørling <mb@lightnvm.io> wrote:
> 
> From: Igor Konopko <igor.j.konopko@intel.com>
> 
> When all the blocks (chunks) in line are marked as bad (offline)
> we shouldn't try to read smeta during init process.
> 
> Currently we are trying to do so by passing -1 as PPA address,
> what causes multiple warnings, that we issuing IOs to out-of-bound
> PPAs.
> 
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> Signed-off-by: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
> Updated title.
> Signed-off-by: Matias Bjørling <mb@lightnvm.io>
> ---
> drivers/lightnvm/pblk-core.c | 5 +++++
> 1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> index a20b41c355c5..e3e883547198 100644
> --- a/drivers/lightnvm/pblk-core.c
> +++ b/drivers/lightnvm/pblk-core.c
> @@ -868,6 +868,11 @@ int pblk_line_read_smeta(struct pblk *pblk, struct pblk_line *line)
> {
> 	u64 bpaddr = pblk_line_smeta_start(pblk, line);
> 
> +	if (bpaddr == -1) {
> +		/* Whole line is bad - do not try to read smeta. */
> +		return 1;
> +	}

This case cannot occur on the only user of the function
(pblk_recov_l2p()). On the previous check (pblk_line_was_written()), we
verify the state of the line and the position of the first good chunk. In
the case of a bad line (less chunks than a given threshold to allow
emeta), the recovery will not be carried out in the line.

Javier

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits equals to 0
  2018-05-28  8:58 ` [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits equals to 0 Matias Bjørling
@ 2018-05-28 11:02   ` Javier Gonzalez
  2018-06-04 10:09     ` Dziegielewski, Marcin
  0 siblings, 1 reply; 37+ messages in thread
From: Javier Gonzalez @ 2018-05-28 11:02 UTC (permalink / raw)
  To: Matias Bjørling
  Cc: Jens Axboe, linux-block, linux-kernel, Marcin Dziegielewski,
	Konopko, Igor J

[-- Attachment #1: Type: text/plain, Size: 2564 bytes --]

> On 28 May 2018, at 10.58, Matias Bjørling <mb@lightnvm.io> wrote:
> 
> From: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
> 
> Some devices can expose mw_cunits equal to 0, it can cause creation
> of too small write buffer and cause performance to drop on write
> workloads.
> 
> To handle that, we use the default value for MLC and beacause it
> covers both 1.2 and 2.0 OC specification, setting up mw_cunits
> in nvme_nvm_setup_12 function isn't longer necessary.
> 
> Signed-off-by: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> Signed-off-by: Matias Bjørling <mb@lightnvm.io>
> ---
> drivers/lightnvm/pblk-init.c | 10 +++++++++-
> drivers/nvme/host/lightnvm.c |  1 -
> 2 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
> index d65d2f972ccf..0f277744266b 100644
> --- a/drivers/lightnvm/pblk-init.c
> +++ b/drivers/lightnvm/pblk-init.c
> @@ -356,7 +356,15 @@ static int pblk_core_init(struct pblk *pblk)
> 	atomic64_set(&pblk->nr_flush, 0);
> 	pblk->nr_flush_rst = 0;
> 
> -	pblk->pgs_in_buffer = geo->mw_cunits * geo->all_luns;
> +	if (geo->mw_cunits) {
> +		pblk->pgs_in_buffer = geo->mw_cunits * geo->all_luns;
> +	} else {
> +		pblk->pgs_in_buffer = (geo->ws_opt << 3) * geo->all_luns;
> +		/*
> +		 * Some devices can expose mw_cunits equal to 0, so let's use
> +		 * here default safe value for MLC.
> +		 */
> +	}
> 
> 	pblk->min_write_pgs = geo->ws_opt * (geo->csecs / PAGE_SIZE);
> 	max_write_ppas = pblk->min_write_pgs * geo->all_luns;
> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
> index 41279da799ed..c747792da915 100644
> --- a/drivers/nvme/host/lightnvm.c
> +++ b/drivers/nvme/host/lightnvm.c
> @@ -338,7 +338,6 @@ static int nvme_nvm_setup_12(struct nvme_nvm_id12 *id,
> 
> 	geo->ws_min = sec_per_pg;
> 	geo->ws_opt = sec_per_pg;
> -	geo->mw_cunits = geo->ws_opt << 3;	/* default to MLC safe values */
> 
> 	/* Do not impose values for maximum number of open blocks as it is
> 	 * unspecified in 1.2. Users of 1.2 must be aware of this and eventually
> --
> 2.11.0

By doing this, 1.2 future users (beyond pblk), will fail to have a valid
mw_cunits value. It's ok to deal with the 0 case in pblk, but I believe
that we should have the default value for 1.2 either way.

A more generic way of doing this would be to have a default value for
2.0 too, in case mw_cunits is reported as 0.

Javier


[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [GIT PULL 20/20] lightnvm: pblk: sync RB and RL states during GC
  2018-05-28 10:51   ` Javier Gonzalez
@ 2018-05-29 13:07     ` Konopko, Igor J
  2018-05-29 17:58       ` Javier Gonzalez
  0 siblings, 1 reply; 37+ messages in thread
From: Konopko, Igor J @ 2018-05-29 13:07 UTC (permalink / raw)
  To: Javier Gonzalez, Matias Bjorling
  Cc: Jens Axboe, linux-block, linux-kernel, Dziegielewski, Marcin

> From: Javier Gonzalez [mailto:javier@cnexlabs.com]
> Do you mean random writes? On fully sequential, a line will either be
> fully written, fully invalidated or on its way to be written. When
> invalidating the line, then the whole line will be invalidated and GC
> will free it without having to move valid data.

I meant sequential writes, since this is the easiest way to reach rl->rb_state = PBLK_RL_LOW. When we updating this values inside __pblk_rl_update_rates(), most of the times rl->rb_user_cnt will became equal to rl->rb_user_max - which means that we cannot handle any more user IOs. In that case pblk_rl_user_may_insert() will return false, no one will call pblk_rb_update_l2p(), rl->rb_user_cnt will not be decreased, so user IOs will stuck for forever.

> I can see why you might have problems with very low OP due to the rate
> limiter, but unfortunately this is not a good way of solving the
> problem. When you do this, you basically make the L2P to point to the
> device instead of pointing to the write cache, which in essence bypasses
> mw_cuints. As a result, if a read comes in to one of the synced entries,
> it will violate the device-host contract and most probably fail (for
> sure fail on > SLC).

What about using on that path some modified version of pblk_rb_sync_l2p() which will synchronize all the RB entries except last mw_cunits number of elements?

Also you wrote about mw_cuints definitely makes sense, but even without my changes I believe that we can lead into such a situation - especially for pblk with small number of LUNs assigned under  write IOs with high sector count. Pblk_rb_update_l2p() does not explicitly takes mw_cunints this value into consideration right now.

> I think that the right way of solving this problem is separating the
> write and GC buffers and then assigning tokens to them. The write thread
> will then consume both buffers based on these tokens. In this case, user
> I/O will have a buffer for itself, which will be guaranteed to advance
> at the rate the rate-limiter is allowing it to. Note that the 2 buffers
> can be a single buffer with a new set of pointers so that the lookup can
> be done with a single bit.
> 
> I have been looking for time to implement this for a while. If you want
> to give it a go, we can talk and I can give you some pointers on
> potential issues I have thought about.

I believe this is interesting option - we can discuss this for one of next releases.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [GIT PULL 16/20] lightnvm: error handling when whole line is bad
  2018-05-28 10:59   ` Javier Gonzalez
@ 2018-05-29 13:15     ` Konopko, Igor J
  2018-05-29 18:29       ` Javier Gonzalez
  0 siblings, 1 reply; 37+ messages in thread
From: Konopko, Igor J @ 2018-05-29 13:15 UTC (permalink / raw)
  To: Javier Gonzalez, Matias Bjorling
  Cc: Jens Axboe, linux-block, LKML, Dziegielewski, Marcin

> From: Javier Gonzalez [mailto:javier@cnexlabs.com]
> This case cannot occur on the only user of the function
> (pblk_recov_l2p()). On the previous check (pblk_line_was_written()), we
> verify the state of the line and the position of the first good chunk. In
> the case of a bad line (less chunks than a given threshold to allow
> emeta), the recovery will not be carried out in the line.
You are right. It looks that during my testing there was some inconsistency between chunks state table which is verified inside pblk_line_was_written() and blk_bitmap which was read from emeta and is verified in pblk_line_smeta_start(). I'm living decision to maintainers whether we should keep this sanity check or not - it really just pass gracefully the result from pblk_line_smeta_start() where similar sanity check is present.

Igor

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [GIT PULL 20/20] lightnvm: pblk: sync RB and RL states during GC
  2018-05-29 13:07     ` Konopko, Igor J
@ 2018-05-29 17:58       ` Javier Gonzalez
  0 siblings, 0 replies; 37+ messages in thread
From: Javier Gonzalez @ 2018-05-29 17:58 UTC (permalink / raw)
  To: Konopko, Igor J
  Cc: Matias Bjørling, Jens Axboe, linux-block, linux-kernel,
	Dziegielewski, Marcin

[-- Attachment #1: Type: text/plain, Size: 3626 bytes --]


> On 29 May 2018, at 15.07, Konopko, Igor J <igor.j.konopko@intel.com> wrote:
> 
>> From: Javier Gonzalez [mailto:javier@cnexlabs.com]
>> Do you mean random writes? On fully sequential, a line will either be
>> fully written, fully invalidated or on its way to be written. When
>> invalidating the line, then the whole line will be invalidated and GC
>> will free it without having to move valid data.
> 
> I meant sequential writes, since this is the easiest way to reach
> rl->rb_state = PBLK_RL_LOW. When we updating this values inside
> __pblk_rl_update_rates(), most of the times rl->rb_user_cnt will
> became equal to rl->rb_user_max - which means that we cannot handle
> any more user IOs. In that case pblk_rl_user_may_insert() will return
> false, no one will call pblk_rb_update_l2p(), rl->rb_user_cnt will not
> be decreased, so user IOs will stuck for forever.
> 

What OP are you using? Even with a 5%, full lines should start being
freed as they are completely invalid. But fair enough, if you decide to
optimize for a guaranteed sequential workload where the OP is only to
support grown bad blocks, you will run into this issue.

>> I can see why you might have problems with very low OP due to the rate
>> limiter, but unfortunately this is not a good way of solving the
>> problem. When you do this, you basically make the L2P to point to the
>> device instead of pointing to the write cache, which in essence bypasses
>> mw_cuints. As a result, if a read comes in to one of the synced entries,
>> it will violate the device-host contract and most probably fail (for
>> sure fail on > SLC).
> 
> What about using on that path some modified version of
> pblk_rb_sync_l2p() which will synchronize all the RB entries except
> last mw_cunits number of elements?
> 

Typically, the size of the buffer is the closest upper power of two to
nr_luns * mw_cuints. So in essence, we only update the L2P as we wrap
up. You could go down to match nr_luns * mw_cuints exactly, which
depending on the media geometry can gain you some extra user entries.

> Also you wrote about mw_cuints definitely makes sense, but even
> without my changes I believe that we can lead into such a situation -
> especially for pblk with small number of LUNs assigned under write IOs
> with high sector count. Pblk_rb_update_l2p() does not explicitly takes
> mw_cunints this value into consideration right now.
> 

As mentioned, the size of the buffer is always over nr_luns * mw_cuints
and we only update when we wrap up - this is, when the mem pointer
(which writes new data to the write buffer), wraps and starts writing
new data. So this case is guaranteed not to happen. In fact, this
particular case is what inspired the design of the write buffer and the
current 5 pointers to extend the typical head and tail.

>> I think that the right way of solving this problem is separating the
>> write and GC buffers and then assigning tokens to them. The write thread
>> will then consume both buffers based on these tokens. In this case, user
>> I/O will have a buffer for itself, which will be guaranteed to advance
>> at the rate the rate-limiter is allowing it to. Note that the 2 buffers
>> can be a single buffer with a new set of pointers so that the lookup can
>> be done with a single bit.
>> 
>> I have been looking for time to implement this for a while. If you want
>> to give it a go, we can talk and I can give you some pointers on
>> potential issues I have thought about.
> 
> I believe this is interesting option - we can discuss this for one of
> next releases.

Sure. Ping me if you want to discuss in more detail.

Javier

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [GIT PULL 16/20] lightnvm: error handling when whole line is bad
  2018-05-29 13:15     ` Konopko, Igor J
@ 2018-05-29 18:29       ` Javier Gonzalez
  0 siblings, 0 replies; 37+ messages in thread
From: Javier Gonzalez @ 2018-05-29 18:29 UTC (permalink / raw)
  To: Konopko, Igor J
  Cc: Matias Bjørling, Jens Axboe, linux-block, LKML,
	Dziegielewski, Marcin

[-- Attachment #1: Type: text/plain, Size: 1150 bytes --]

> On 29 May 2018, at 15.15, Konopko, Igor J <igor.j.konopko@intel.com> wrote:
> 
>> From: Javier Gonzalez [mailto:javier@cnexlabs.com]
>> This case cannot occur on the only user of the function
>> (pblk_recov_l2p()). On the previous check (pblk_line_was_written()), we
>> verify the state of the line and the position of the first good chunk. In
>> the case of a bad line (less chunks than a given threshold to allow
>> emeta), the recovery will not be carried out in the line.
> You are right. It looks that during my testing there was some
> inconsistency between chunks state table which is verified inside
> pblk_line_was_written() and blk_bitmap which was read from emeta and
> is verified in pblk_line_smeta_start(). I'm living decision to
> maintainers whether we should keep this sanity check or not - it
> really just pass gracefully the result from pblk_line_smeta_start()
> where similar sanity check is present.
> 

Let's avoid an extra check now that there is no users for it in the
current flow. If we have a new use for pblk_line_smeta_start() on a flow
that does cannot offer the same guarantees, we can add it at that point.

Javier

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [GIT PULL 00/20] lightnvm updates for 4.18
  2018-05-28  8:58 [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
                   ` (19 preceding siblings ...)
  2018-05-28  8:58 ` [GIT PULL 20/20] lightnvm: pblk: sync RB and RL states during GC Matias Bjørling
@ 2018-06-01 10:45 ` Matias Bjørling
  2018-06-01 12:36   ` Jens Axboe
  20 siblings, 1 reply; 37+ messages in thread
From: Matias Bjørling @ 2018-06-01 10:45 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, Javier González, Konopko, Igor J,
	Marcin Dziegielewski

On 05/28/2018 10:58 AM, Matias Bjørling wrote:
> Hi Jens,
> 
> Please pick up the following patches.
> 
>   - Hans reworked the write error recovery path in pblk.
>   - Igor added extra error handling for lines, and fixed a bug in the
>     pblk ringbuffer during GC.
>   - Javier refactored the pblk code a bit, added extra error
>     handling, and added checks to verify that data returned from drive
>     is appropriate.
>   - Marcin added some extra logic to manage the write buffer. Now
>     MW_CUNITS can be zero and the size of write buffer can be changed
>     at module load time.
> 
> Thanks,
> Matias
> 
> Hans Holmberg (3):
>    lightnvm: pblk: rework write error recovery path
>    lightnvm: pblk: garbage collect lines with failed writes
>    lightnvm: pblk: fix smeta write error path
> 
> Igor Konopko (4):
>    lightnvm: proper error handling for pblk_bio_add_pages
>    lightnvm: error handling when whole line is bad
>    lightnvm: fix partial read error path
>    lightnvm: pblk: sync RB and RL states during GC
> 
> Javier González (11):
>    lightnvm: pblk: fail gracefully on line alloc. failure
>    lightnvm: pblk: recheck for bad lines at runtime
>    lightnvm: pblk: check read lba on gc path
>    lightnvn: pblk: improve error msg on corrupted LBAs
>    lightnvm: pblk: warn in case of corrupted write buffer
>    lightnvm: pblk: return NVM_ error on failed submission
>    lightnvm: pblk: remove unnecessary indirection
>    lightnvm: pblk: remove unnecessary argument
>    lightnvm: pblk: check for chunk size before allocating it
>    lightnvn: pass flag on graceful teardown to targets
>    lightnvm: pblk: remove dead function
> 
> Marcin Dziegielewski (2):
>    lightnvm: pblk: handle case when mw_cunits equals to 0
>    lightnvm: pblk: add possibility to set write buffer size manually
> 
>   drivers/lightnvm/core.c          |  10 +-
>   drivers/lightnvm/pblk-core.c     | 149 +++++++++++++++------
>   drivers/lightnvm/pblk-gc.c       | 112 ++++++++++------
>   drivers/lightnvm/pblk-init.c     | 112 +++++++++++-----
>   drivers/lightnvm/pblk-map.c      |  33 +++--
>   drivers/lightnvm/pblk-rb.c       |  51 +------
>   drivers/lightnvm/pblk-read.c     |  83 +++++++++---
>   drivers/lightnvm/pblk-recovery.c |  91 -------------
>   drivers/lightnvm/pblk-rl.c       |  29 +++-
>   drivers/lightnvm/pblk-sysfs.c    |  15 ++-
>   drivers/lightnvm/pblk-write.c    | 281 +++++++++++++++++++++++++--------------
>   drivers/lightnvm/pblk.h          |  43 +++---
>   drivers/nvme/host/lightnvm.c     |   1 -
>   include/linux/lightnvm.h         |   2 +-
>   14 files changed, 604 insertions(+), 408 deletions(-)
> 

Hi Jens,

Javier had some comments to 16, 18, and 20. The rest is ready to go. 
Would you like me to resend the patches?

Thank you!

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [GIT PULL 00/20] lightnvm updates for 4.18
  2018-06-01 10:45 ` [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
@ 2018-06-01 12:36   ` Jens Axboe
  0 siblings, 0 replies; 37+ messages in thread
From: Jens Axboe @ 2018-06-01 12:36 UTC (permalink / raw)
  To: Matias Bjørling
  Cc: axboe, linux-block, linux-kernel, Javier González, Konopko,
	Igor J, Marcin Dziegielewski

On Jun 1, 2018, at 4:45 AM, Matias Bjørling <mb@lightnvm.io> wrote:
> 
>> On 05/28/2018 10:58 AM, Matias Bjørling wrote:
>> Hi Jens,
>> Please pick up the following patches.
>>  - Hans reworked the write error recovery path in pblk.
>>  - Igor added extra error handling for lines, and fixed a bug in the
>>    pblk ringbuffer during GC.
>>  - Javier refactored the pblk code a bit, added extra error
>>    handling, and added checks to verify that data returned from drive
>>    is appropriate.
>>  - Marcin added some extra logic to manage the write buffer. Now
>>    MW_CUNITS can be zero and the size of write buffer can be changed
>>    at module load time.
>> Thanks,
>> Matias
>> Hans Holmberg (3):
>>   lightnvm: pblk: rework write error recovery path
>>   lightnvm: pblk: garbage collect lines with failed writes
>>   lightnvm: pblk: fix smeta write error path
>> Igor Konopko (4):
>>   lightnvm: proper error handling for pblk_bio_add_pages
>>   lightnvm: error handling when whole line is bad
>>   lightnvm: fix partial read error path
>>   lightnvm: pblk: sync RB and RL states during GC
>> Javier González (11):
>>   lightnvm: pblk: fail gracefully on line alloc. failure
>>   lightnvm: pblk: recheck for bad lines at runtime
>>   lightnvm: pblk: check read lba on gc path
>>   lightnvn: pblk: improve error msg on corrupted LBAs
>>   lightnvm: pblk: warn in case of corrupted write buffer
>>   lightnvm: pblk: return NVM_ error on failed submission
>>   lightnvm: pblk: remove unnecessary indirection
>>   lightnvm: pblk: remove unnecessary argument
>>   lightnvm: pblk: check for chunk size before allocating it
>>   lightnvn: pass flag on graceful teardown to targets
>>   lightnvm: pblk: remove dead function
>> Marcin Dziegielewski (2):
>>   lightnvm: pblk: handle case when mw_cunits equals to 0
>>   lightnvm: pblk: add possibility to set write buffer size manually
>>  drivers/lightnvm/core.c          |  10 +-
>>  drivers/lightnvm/pblk-core.c     | 149 +++++++++++++++------
>>  drivers/lightnvm/pblk-gc.c       | 112 ++++++++++------
>>  drivers/lightnvm/pblk-init.c     | 112 +++++++++++-----
>>  drivers/lightnvm/pblk-map.c      |  33 +++--
>>  drivers/lightnvm/pblk-rb.c       |  51 +------
>>  drivers/lightnvm/pblk-read.c     |  83 +++++++++---
>>  drivers/lightnvm/pblk-recovery.c |  91 -------------
>>  drivers/lightnvm/pblk-rl.c       |  29 +++-
>>  drivers/lightnvm/pblk-sysfs.c    |  15 ++-
>>  drivers/lightnvm/pblk-write.c    | 281 +++++++++++++++++++++++++--------------
>>  drivers/lightnvm/pblk.h          |  43 +++---
>>  drivers/nvme/host/lightnvm.c     |   1 -
>>  include/linux/lightnvm.h         |   2 +-
>>  14 files changed, 604 insertions(+), 408 deletions(-)
> 
> Hi Jens,
> 
> Javier had some comments to 16, 18, and 20. The rest is ready to go. Would you like me to resend the patches?

Yes please. 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits equals to 0
  2018-05-28 11:02   ` Javier Gonzalez
@ 2018-06-04 10:09     ` Dziegielewski, Marcin
  2018-06-04 10:21       ` Javier Gonzalez
  0 siblings, 1 reply; 37+ messages in thread
From: Dziegielewski, Marcin @ 2018-06-04 10:09 UTC (permalink / raw)
  To: Javier Gonzalez, Matias Bjorling
  Cc: Jens Axboe, linux-block, linux-kernel, Konopko, Igor J


Frist of all I want to say sorry for late response - I was on holiday.

> From: Javier Gonzalez [mailto:javier@cnexlabs.com]
> Sent: Monday, May 28, 2018 1:03 PM
> To: Matias Bjørling <mb@lightnvm.io>
> Cc: Jens Axboe <axboe@fb.com>; linux-block@vger.kernel.org; linux-
> kernel@vger.kernel.org; Dziegielewski, Marcin
> <marcin.dziegielewski@intel.com>; Konopko, Igor J
> <igor.j.konopko@intel.com>
> Subject: Re: [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits
> equals to 0
> 
> > On 28 May 2018, at 10.58, Matias Bjørling <mb@lightnvm.io> wrote:
> >
> > From: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
> >
> > Some devices can expose mw_cunits equal to 0, it can cause creation of
> > too small write buffer and cause performance to drop on write
> > workloads.
> >
> > To handle that, we use the default value for MLC and beacause it
> > covers both 1.2 and 2.0 OC specification, setting up mw_cunits in
> > nvme_nvm_setup_12 function isn't longer necessary.
> >
> > Signed-off-by: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
> > Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> > Signed-off-by: Matias Bjørling <mb@lightnvm.io>
> > ---
> > drivers/lightnvm/pblk-init.c | 10 +++++++++-
> > drivers/nvme/host/lightnvm.c |  1 -
> > 2 files changed, 9 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/lightnvm/pblk-init.c
> > b/drivers/lightnvm/pblk-init.c index d65d2f972ccf..0f277744266b 100644
> > --- a/drivers/lightnvm/pblk-init.c
> > +++ b/drivers/lightnvm/pblk-init.c
> > @@ -356,7 +356,15 @@ static int pblk_core_init(struct pblk *pblk)
> > 	atomic64_set(&pblk->nr_flush, 0);
> > 	pblk->nr_flush_rst = 0;
> >
> > -	pblk->pgs_in_buffer = geo->mw_cunits * geo->all_luns;
> > +	if (geo->mw_cunits) {
> > +		pblk->pgs_in_buffer = geo->mw_cunits * geo->all_luns;
> > +	} else {
> > +		pblk->pgs_in_buffer = (geo->ws_opt << 3) * geo->all_luns;
> > +		/*
> > +		 * Some devices can expose mw_cunits equal to 0, so let's
> use
> > +		 * here default safe value for MLC.
> > +		 */
> > +	}
> >
> > 	pblk->min_write_pgs = geo->ws_opt * (geo->csecs / PAGE_SIZE);
> > 	max_write_ppas = pblk->min_write_pgs * geo->all_luns; diff --git
> > a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c index
> > 41279da799ed..c747792da915 100644
> > --- a/drivers/nvme/host/lightnvm.c
> > +++ b/drivers/nvme/host/lightnvm.c
> > @@ -338,7 +338,6 @@ static int nvme_nvm_setup_12(struct
> nvme_nvm_id12
> > *id,
> >
> > 	geo->ws_min = sec_per_pg;
> > 	geo->ws_opt = sec_per_pg;
> > -	geo->mw_cunits = geo->ws_opt << 3;	/* default to MLC safe values
> */
> >
> > 	/* Do not impose values for maximum number of open blocks as it is
> > 	 * unspecified in 1.2. Users of 1.2 must be aware of this and
> > eventually
> > --
> > 2.11.0
> 
> By doing this, 1.2 future users (beyond pblk), will fail to have a valid
> mw_cunits value. It's ok to deal with the 0 case in pblk, but I believe that we
> should have the default value for 1.2 either way.

I'm not sure. From my understanding, setting of default value was workaround for pblk case, am I right ?. In my opinion any user of 1.2 spec should be aware that there is not mw_cunit value. From my point of view, leaving here 0 (and decision what do with it to lightnvm user) is more safer way, but maybe I'm wrong. I believe that it is topic to wider discussion with maintainers.

> 
> A more generic way of doing this would be to have a default value for
> 2.0 too, in case mw_cunits is reported as 0.

Since 0 is correct value and users can make different decisions based on it, I think we shouldn't overwrite it by default value. Is it make sense?
> 
> Javier

Thanks,
Marcin 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits equals to 0
  2018-06-04 10:09     ` Dziegielewski, Marcin
@ 2018-06-04 10:21       ` Javier Gonzalez
  2018-06-04 11:11         ` Dziegielewski, Marcin
  0 siblings, 1 reply; 37+ messages in thread
From: Javier Gonzalez @ 2018-06-04 10:21 UTC (permalink / raw)
  To: Dziegielewski, Marcin
  Cc: Matias Bjørling, Jens Axboe, linux-block, linux-kernel,
	Konopko, Igor J

> On 4 Jun 2018, at 12.09, Dziegielewski, Marcin <marcin.dziegielewski@intel.com> wrote:
> 
> 
> Frist of all I want to say sorry for late response - I was on holiday.
> 
>> From: Javier Gonzalez [mailto:javier@cnexlabs.com]
>> Sent: Monday, May 28, 2018 1:03 PM
>> To: Matias Bjørling <mb@lightnvm.io>
>> Cc: Jens Axboe <axboe@fb.com>; linux-block@vger.kernel.org; linux-
>> kernel@vger.kernel.org; Dziegielewski, Marcin
>> <marcin.dziegielewski@intel.com>; Konopko, Igor J
>> <igor.j.konopko@intel.com>
>> Subject: Re: [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits
>> equals to 0
>> 
>>> On 28 May 2018, at 10.58, Matias Bjørling <mb@lightnvm.io> wrote:
>>> 
>>> From: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
>>> 
>>> Some devices can expose mw_cunits equal to 0, it can cause creation of
>>> too small write buffer and cause performance to drop on write
>>> workloads.
>>> 
>>> To handle that, we use the default value for MLC and beacause it
>>> covers both 1.2 and 2.0 OC specification, setting up mw_cunits in
>>> nvme_nvm_setup_12 function isn't longer necessary.
>>> 
>>> Signed-off-by: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>> Signed-off-by: Matias Bjørling <mb@lightnvm.io>
>>> ---
>>> drivers/lightnvm/pblk-init.c | 10 +++++++++-
>>> drivers/nvme/host/lightnvm.c |  1 -
>>> 2 files changed, 9 insertions(+), 2 deletions(-)
>>> 
>>> diff --git a/drivers/lightnvm/pblk-init.c
>>> b/drivers/lightnvm/pblk-init.c index d65d2f972ccf..0f277744266b 100644
>>> --- a/drivers/lightnvm/pblk-init.c
>>> +++ b/drivers/lightnvm/pblk-init.c
>>> @@ -356,7 +356,15 @@ static int pblk_core_init(struct pblk *pblk)
>>> 	atomic64_set(&pblk->nr_flush, 0);
>>> 	pblk->nr_flush_rst = 0;
>>> 
>>> -	pblk->pgs_in_buffer = geo->mw_cunits * geo->all_luns;
>>> +	if (geo->mw_cunits) {
>>> +		pblk->pgs_in_buffer = geo->mw_cunits * geo->all_luns;
>>> +	} else {
>>> +		pblk->pgs_in_buffer = (geo->ws_opt << 3) * geo->all_luns;
>>> +		/*
>>> +		 * Some devices can expose mw_cunits equal to 0, so let's
>> use
>>> +		 * here default safe value for MLC.
>>> +		 */
>>> +	}
>>> 
>>> 	pblk->min_write_pgs = geo->ws_opt * (geo->csecs / PAGE_SIZE);
>>> 	max_write_ppas = pblk->min_write_pgs * geo->all_luns; diff --git
>>> a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c index
>>> 41279da799ed..c747792da915 100644
>>> --- a/drivers/nvme/host/lightnvm.c
>>> +++ b/drivers/nvme/host/lightnvm.c
>>> @@ -338,7 +338,6 @@ static int nvme_nvm_setup_12(struct
>> nvme_nvm_id12
>>> *id,
>>> 
>>> 	geo->ws_min = sec_per_pg;
>>> 	geo->ws_opt = sec_per_pg;
>>> -	geo->mw_cunits = geo->ws_opt << 3;	/* default to MLC safe values
>> */
>>> /* Do not impose values for maximum number of open blocks as it is
>>> 	 * unspecified in 1.2. Users of 1.2 must be aware of this and
>>> eventually
>>> --
>>> 2.11.0
>> 
>> By doing this, 1.2 future users (beyond pblk), will fail to have a valid
>> mw_cunits value. It's ok to deal with the 0 case in pblk, but I believe that we
>> should have the default value for 1.2 either way.
> 
> I'm not sure. From my understanding, setting of default value was
> workaround for pblk case, am I right ?.

The default value covers the MLC case directly at the lightnvm layer, as
opposed to doing it directly in pblk. Since pblk is the only user now,
you can argue that all changes in the lightnvm layer are to solve pblk
issues, but the idea is that the geometry should be generic.

> In my opinion any user of 1.2
> spec should be aware that there is not mw_cunit value. From my point
> of view, leaving here 0 (and decision what do with it to lightnvm
> user) is more safer way, but maybe I'm wrong. I believe that it is
> topic to wider discussion with maintainers.
> 

1.2 and 2.0 have different geometries, but when we designed the common
nvm_geo structure, the idea was to abstract both specs and allow the
upper layers to use the geometry transparently. 

Specifically in pblk, I would prefer to keep it in such a way that we don't
need to media specific policies (e.g., set default values for MLC
memories), as a general design principle. We already do some geometry
version checks to avoid dereferencing unnecessary pointers on the fast
path, which I would eventually like to remove.

>> A more generic way of doing this would be to have a default value for
>> 2.0 too, in case mw_cunits is reported as 0.
> 
> Since 0 is correct value and users can make different decisions based
> on it, I think we shouldn't overwrite it by default value. Is it make
> sense?

Here I meant at a pblk level - I should have specified it. At the
geometry level, we should not change it. 

The case I am thinking is if mw_cuints repoints 0, but ws_min > 0. In
this case, we still need a host side buffer to serve < ws_min I/Os, even
though the device does not require the buffer to guarantee reads.

>> Javier
> 
> Thanks,
> Marcin 

Javier

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits equals to 0
  2018-06-04 10:21       ` Javier Gonzalez
@ 2018-06-04 11:11         ` Dziegielewski, Marcin
  2018-06-04 11:15           ` Javier Gonzalez
  0 siblings, 1 reply; 37+ messages in thread
From: Dziegielewski, Marcin @ 2018-06-04 11:11 UTC (permalink / raw)
  To: Javier Gonzalez
  Cc: Matias Bjorling, Jens Axboe, linux-block, linux-kernel, Konopko, Igor J


> -----Original Message-----
> From: Javier Gonzalez [mailto:javier@cnexlabs.com]
> Sent: Monday, June 4, 2018 12:22 PM
> To: Dziegielewski, Marcin <marcin.dziegielewski@intel.com>
> Cc: Matias Bjørling <mb@lightnvm.io>; Jens Axboe <axboe@fb.com>; linux-
> block@vger.kernel.org; linux-kernel@vger.kernel.org; Konopko, Igor J
> <igor.j.konopko@intel.com>
> Subject: Re: [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits
> equals to 0
> 
> > On 4 Jun 2018, at 12.09, Dziegielewski, Marcin
> <marcin.dziegielewski@intel.com> wrote:
> >
> >
> > Frist of all I want to say sorry for late response - I was on holiday.
> >
> >> From: Javier Gonzalez [mailto:javier@cnexlabs.com]
> >> Sent: Monday, May 28, 2018 1:03 PM
> >> To: Matias Bjørling <mb@lightnvm.io>
> >> Cc: Jens Axboe <axboe@fb.com>; linux-block@vger.kernel.org; linux-
> >> kernel@vger.kernel.org; Dziegielewski, Marcin
> >> <marcin.dziegielewski@intel.com>; Konopko, Igor J
> >> <igor.j.konopko@intel.com>
> >> Subject: Re: [GIT PULL 18/20] lightnvm: pblk: handle case when
> >> mw_cunits equals to 0
> >>
> >>> On 28 May 2018, at 10.58, Matias Bjørling <mb@lightnvm.io> wrote:
> >>>
> >>> From: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
> >>>
> >>> Some devices can expose mw_cunits equal to 0, it can cause creation
> >>> of too small write buffer and cause performance to drop on write
> >>> workloads.
> >>>
> >>> To handle that, we use the default value for MLC and beacause it
> >>> covers both 1.2 and 2.0 OC specification, setting up mw_cunits in
> >>> nvme_nvm_setup_12 function isn't longer necessary.
> >>>
> >>> Signed-off-by: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
> >>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> >>> Signed-off-by: Matias Bjørling <mb@lightnvm.io>
> >>> ---
> >>> drivers/lightnvm/pblk-init.c | 10 +++++++++-
> >>> drivers/nvme/host/lightnvm.c |  1 -
> >>> 2 files changed, 9 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/lightnvm/pblk-init.c
> >>> b/drivers/lightnvm/pblk-init.c index d65d2f972ccf..0f277744266b
> >>> 100644
> >>> --- a/drivers/lightnvm/pblk-init.c
> >>> +++ b/drivers/lightnvm/pblk-init.c
> >>> @@ -356,7 +356,15 @@ static int pblk_core_init(struct pblk *pblk)
> >>> 	atomic64_set(&pblk->nr_flush, 0);
> >>> 	pblk->nr_flush_rst = 0;
> >>>
> >>> -	pblk->pgs_in_buffer = geo->mw_cunits * geo->all_luns;
> >>> +	if (geo->mw_cunits) {
> >>> +		pblk->pgs_in_buffer = geo->mw_cunits * geo->all_luns;
> >>> +	} else {
> >>> +		pblk->pgs_in_buffer = (geo->ws_opt << 3) * geo->all_luns;
> >>> +		/*
> >>> +		 * Some devices can expose mw_cunits equal to 0, so let's
> >> use
> >>> +		 * here default safe value for MLC.
> >>> +		 */
> >>> +	}
> >>>
> >>> 	pblk->min_write_pgs = geo->ws_opt * (geo->csecs / PAGE_SIZE);
> >>> 	max_write_ppas = pblk->min_write_pgs * geo->all_luns; diff --git
> >>> a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c index
> >>> 41279da799ed..c747792da915 100644
> >>> --- a/drivers/nvme/host/lightnvm.c
> >>> +++ b/drivers/nvme/host/lightnvm.c
> >>> @@ -338,7 +338,6 @@ static int nvme_nvm_setup_12(struct
> >> nvme_nvm_id12
> >>> *id,
> >>>
> >>> 	geo->ws_min = sec_per_pg;
> >>> 	geo->ws_opt = sec_per_pg;
> >>> -	geo->mw_cunits = geo->ws_opt << 3;	/* default to MLC safe values
> >> */
> >>> /* Do not impose values for maximum number of open blocks as it is
> >>> 	 * unspecified in 1.2. Users of 1.2 must be aware of this and
> >>> eventually
> >>> --
> >>> 2.11.0
> >>
> >> By doing this, 1.2 future users (beyond pblk), will fail to have a
> >> valid mw_cunits value. It's ok to deal with the 0 case in pblk, but I
> >> believe that we should have the default value for 1.2 either way.
> >
> > I'm not sure. From my understanding, setting of default value was
> > workaround for pblk case, am I right ?.
> 
> The default value covers the MLC case directly at the lightnvm layer, as
> opposed to doing it directly in pblk. Since pblk is the only user now, you can
> argue that all changes in the lightnvm layer are to solve pblk issues, but the
> idea is that the geometry should be generic.
> 
> > In my opinion any user of 1.2
> > spec should be aware that there is not mw_cunit value. From my point
> > of view, leaving here 0 (and decision what do with it to lightnvm
> > user) is more safer way, but maybe I'm wrong. I believe that it is
> > topic to wider discussion with maintainers.
> >
> 
> 1.2 and 2.0 have different geometries, but when we designed the common
> nvm_geo structure, the idea was to abstract both specs and allow the upper
> layers to use the geometry transparently.
> 
> Specifically in pblk, I would prefer to keep it in such a way that we don't need
> to media specific policies (e.g., set default values for MLC memories), as a
> general design principle. We already do some geometry version checks to
> avoid dereferencing unnecessary pointers on the fast path, which I would
> eventually like to remove.
> 

Ok, now I understand your point of view and agree with that, I will prepare second version of this patch without this change. Thanks for the clarification.

> >> A more generic way of doing this would be to have a default value for
> >> 2.0 too, in case mw_cunits is reported as 0.
> >
> > Since 0 is correct value and users can make different decisions based
> > on it, I think we shouldn't overwrite it by default value. Is it make
> > sense?
> 
> Here I meant at a pblk level - I should have specified it. At the geometry
> level, we should not change it.
> 
> The case I am thinking is if mw_cuints repoints 0, but ws_min > 0. In this case,
> we still need a host side buffer to serve < ws_min I/Os, even though the
> device does not require the buffer to guarantee reads.

Oh, ok now we are on the same page. In this patch I was trying to address such case. Do you have other idea how to do it or here are you thinking only on value of default variable?

> 
> >> Javier
> >
> > Thanks,
> > Marcin
> 
> Javier
Thanks,
Marcin

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits equals to 0
  2018-06-04 11:11         ` Dziegielewski, Marcin
@ 2018-06-04 11:15           ` Javier Gonzalez
  2018-06-04 17:17             ` Dziegielewski, Marcin
  0 siblings, 1 reply; 37+ messages in thread
From: Javier Gonzalez @ 2018-06-04 11:15 UTC (permalink / raw)
  To: Dziegielewski, Marcin
  Cc: Matias Bjørling, Jens Axboe, linux-block, linux-kernel,
	Konopko, Igor J


> On 4 Jun 2018, at 13.11, Dziegielewski, Marcin <marcin.dziegielewski@intel.com> wrote:
> 
>> -----Original Message-----
>> From: Javier Gonzalez [mailto:javier@cnexlabs.com]
>> Sent: Monday, June 4, 2018 12:22 PM
>> To: Dziegielewski, Marcin <marcin.dziegielewski@intel.com>
>> Cc: Matias Bjørling <mb@lightnvm.io>; Jens Axboe <axboe@fb.com>; linux-
>> block@vger.kernel.org; linux-kernel@vger.kernel.org; Konopko, Igor J
>> <igor.j.konopko@intel.com>
>> Subject: Re: [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits
>> equals to 0
>> 
>>> On 4 Jun 2018, at 12.09, Dziegielewski, Marcin
>> <marcin.dziegielewski@intel.com> wrote:
>>> Frist of all I want to say sorry for late response - I was on holiday.
>>> 
>>>> From: Javier Gonzalez [mailto:javier@cnexlabs.com]
>>>> Sent: Monday, May 28, 2018 1:03 PM
>>>> To: Matias Bjørling <mb@lightnvm.io>
>>>> Cc: Jens Axboe <axboe@fb.com>; linux-block@vger.kernel.org; linux-
>>>> kernel@vger.kernel.org; Dziegielewski, Marcin
>>>> <marcin.dziegielewski@intel.com>; Konopko, Igor J
>>>> <igor.j.konopko@intel.com>
>>>> Subject: Re: [GIT PULL 18/20] lightnvm: pblk: handle case when
>>>> mw_cunits equals to 0
>>>> 
>>>>> On 28 May 2018, at 10.58, Matias Bjørling <mb@lightnvm.io> wrote:
>>>>> 
>>>>> From: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
>>>>> 
>>>>> Some devices can expose mw_cunits equal to 0, it can cause creation
>>>>> of too small write buffer and cause performance to drop on write
>>>>> workloads.
>>>>> 
>>>>> To handle that, we use the default value for MLC and beacause it
>>>>> covers both 1.2 and 2.0 OC specification, setting up mw_cunits in
>>>>> nvme_nvm_setup_12 function isn't longer necessary.
>>>>> 
>>>>> Signed-off-by: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
>>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>>>> Signed-off-by: Matias Bjørling <mb@lightnvm.io>
>>>>> ---
>>>>> drivers/lightnvm/pblk-init.c | 10 +++++++++-
>>>>> drivers/nvme/host/lightnvm.c |  1 -
>>>>> 2 files changed, 9 insertions(+), 2 deletions(-)
>>>>> 
>>>>> diff --git a/drivers/lightnvm/pblk-init.c
>>>>> b/drivers/lightnvm/pblk-init.c index d65d2f972ccf..0f277744266b
>>>>> 100644
>>>>> --- a/drivers/lightnvm/pblk-init.c
>>>>> +++ b/drivers/lightnvm/pblk-init.c
>>>>> @@ -356,7 +356,15 @@ static int pblk_core_init(struct pblk *pblk)
>>>>> 	atomic64_set(&pblk->nr_flush, 0);
>>>>> 	pblk->nr_flush_rst = 0;
>>>>> 
>>>>> -	pblk->pgs_in_buffer = geo->mw_cunits * geo->all_luns;
>>>>> +	if (geo->mw_cunits) {
>>>>> +		pblk->pgs_in_buffer = geo->mw_cunits * geo->all_luns;
>>>>> +	} else {
>>>>> +		pblk->pgs_in_buffer = (geo->ws_opt << 3) * geo->all_luns;
>>>>> +		/*
>>>>> +		 * Some devices can expose mw_cunits equal to 0, so let's
>>>> use
>>>>> +		 * here default safe value for MLC.
>>>>> +		 */
>>>>> +	}
>>>>> 
>>>>> 	pblk->min_write_pgs = geo->ws_opt * (geo->csecs / PAGE_SIZE);
>>>>> 	max_write_ppas = pblk->min_write_pgs * geo->all_luns; diff --git
>>>>> a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c index
>>>>> 41279da799ed..c747792da915 100644
>>>>> --- a/drivers/nvme/host/lightnvm.c
>>>>> +++ b/drivers/nvme/host/lightnvm.c
>>>>> @@ -338,7 +338,6 @@ static int nvme_nvm_setup_12(struct
>>>> nvme_nvm_id12
>>>>> *id,
>>>>> 
>>>>> 	geo->ws_min = sec_per_pg;
>>>>> 	geo->ws_opt = sec_per_pg;
>>>>> -	geo->mw_cunits = geo->ws_opt << 3;	/* default to MLC safe values
>>>> */
>>>>> /* Do not impose values for maximum number of open blocks as it is
>>>>> 	 * unspecified in 1.2. Users of 1.2 must be aware of this and
>>>>> eventually
>>>>> --
>>>>> 2.11.0
>>>> 
>>>> By doing this, 1.2 future users (beyond pblk), will fail to have a
>>>> valid mw_cunits value. It's ok to deal with the 0 case in pblk, but I
>>>> believe that we should have the default value for 1.2 either way.
>>> 
>>> I'm not sure. From my understanding, setting of default value was
>>> workaround for pblk case, am I right ?.
>> 
>> The default value covers the MLC case directly at the lightnvm layer, as
>> opposed to doing it directly in pblk. Since pblk is the only user now, you can
>> argue that all changes in the lightnvm layer are to solve pblk issues, but the
>> idea is that the geometry should be generic.
>> 
>>> In my opinion any user of 1.2
>>> spec should be aware that there is not mw_cunit value. From my point
>>> of view, leaving here 0 (and decision what do with it to lightnvm
>>> user) is more safer way, but maybe I'm wrong. I believe that it is
>>> topic to wider discussion with maintainers.
>> 
>> 1.2 and 2.0 have different geometries, but when we designed the common
>> nvm_geo structure, the idea was to abstract both specs and allow the upper
>> layers to use the geometry transparently.
>> 
>> Specifically in pblk, I would prefer to keep it in such a way that we don't need
>> to media specific policies (e.g., set default values for MLC memories), as a
>> general design principle. We already do some geometry version checks to
>> avoid dereferencing unnecessary pointers on the fast path, which I would
>> eventually like to remove.
> 
> Ok, now I understand your point of view and agree with that, I will
> prepare second version of this patch without this change.

Sounds good.

> Thanks for
> the clarification.
> 

Sure :)

>>>> A more generic way of doing this would be to have a default value for
>>>> 2.0 too, in case mw_cunits is reported as 0.
>>> 
>>> Since 0 is correct value and users can make different decisions based
>>> on it, I think we shouldn't overwrite it by default value. Is it make
>>> sense?
>> 
>> Here I meant at a pblk level - I should have specified it. At the geometry
>> level, we should not change it.
>> 
>> The case I am thinking is if mw_cuints repoints 0, but ws_min > 0. In this case,
>> we still need a host side buffer to serve < ws_min I/Os, even though the
>> device does not require the buffer to guarantee reads.
> 
> Oh, ok now we are on the same page. In this patch I was trying to
> address such case. Do you have other idea how to do it or here are you
> thinking only on value of default variable?

If doing this, I guess that something in the line of what you did with
increasing the size of the write buffer via a module parameter. For
example, checking if the size of the write buffer based on mw_cuints is
enough to cover ws_min, which normally would only be an issue when
mw_cuints == 0 or when the number of PUs used for the pblk instance is
very small and mw_cuints < nr_luns * ws_min.

> 
>>>> Javier
>>> 
>>> Thanks,
>>> Marcin
>> 
>> Javier
> Thanks,
> Marcin

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits equals to 0
  2018-06-04 11:15           ` Javier Gonzalez
@ 2018-06-04 17:17             ` Dziegielewski, Marcin
  2018-06-05  7:12               ` Javier Gonzalez
  0 siblings, 1 reply; 37+ messages in thread
From: Dziegielewski, Marcin @ 2018-06-04 17:17 UTC (permalink / raw)
  To: Javier Gonzalez
  Cc: Matias Bjorling, Jens Axboe, linux-block, linux-kernel, Konopko, Igor J

> From: Javier Gonzalez [mailto:javier@cnexlabs.com]
> Sent: Monday, June 4, 2018 1:16 PM
> To: Dziegielewski, Marcin <marcin.dziegielewski@intel.com>
> Cc: Matias Bjørling <mb@lightnvm.io>; Jens Axboe <axboe@fb.com>; linux-
> block@vger.kernel.org; linux-kernel@vger.kernel.org; Konopko, Igor J
> <igor.j.konopko@intel.com>
> Subject: Re: [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits
> equals to 0
> 
> 
> > On 4 Jun 2018, at 13.11, Dziegielewski, Marcin
> <marcin.dziegielewski@intel.com> wrote:
> >
> >> -----Original Message-----
> >> From: Javier Gonzalez [mailto:javier@cnexlabs.com]
> >> Sent: Monday, June 4, 2018 12:22 PM
> >> To: Dziegielewski, Marcin <marcin.dziegielewski@intel.com>
> >> Cc: Matias Bjørling <mb@lightnvm.io>; Jens Axboe <axboe@fb.com>;
> >> linux- block@vger.kernel.org; linux-kernel@vger.kernel.org; Konopko,
> >> Igor J <igor.j.konopko@intel.com>
> >> Subject: Re: [GIT PULL 18/20] lightnvm: pblk: handle case when
> >> mw_cunits equals to 0
> >>
> >>> On 4 Jun 2018, at 12.09, Dziegielewski, Marcin
> >> <marcin.dziegielewski@intel.com> wrote:
> >>> Frist of all I want to say sorry for late response - I was on holiday.
> >>>
> >>>> From: Javier Gonzalez [mailto:javier@cnexlabs.com]
> >>>> Sent: Monday, May 28, 2018 1:03 PM
> >>>> To: Matias Bjørling <mb@lightnvm.io>
> >>>> Cc: Jens Axboe <axboe@fb.com>; linux-block@vger.kernel.org; linux-
> >>>> kernel@vger.kernel.org; Dziegielewski, Marcin
> >>>> <marcin.dziegielewski@intel.com>; Konopko, Igor J
> >>>> <igor.j.konopko@intel.com>
> >>>> Subject: Re: [GIT PULL 18/20] lightnvm: pblk: handle case when
> >>>> mw_cunits equals to 0
> >>>>
> >>>>> On 28 May 2018, at 10.58, Matias Bjørling <mb@lightnvm.io> wrote:
> >>>>>
> >>>>> From: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
> >>>>>
> >>>>> Some devices can expose mw_cunits equal to 0, it can cause
> >>>>> creation of too small write buffer and cause performance to drop
> >>>>> on write workloads.
> >>>>>
> >>>>> To handle that, we use the default value for MLC and beacause it
> >>>>> covers both 1.2 and 2.0 OC specification, setting up mw_cunits in
> >>>>> nvme_nvm_setup_12 function isn't longer necessary.
> >>>>>
> >>>>> Signed-off-by: Marcin Dziegielewski
> >>>>> <marcin.dziegielewski@intel.com>
> >>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> >>>>> Signed-off-by: Matias Bjørling <mb@lightnvm.io>
> >>>>> ---
> >>>>> drivers/lightnvm/pblk-init.c | 10 +++++++++-
> >>>>> drivers/nvme/host/lightnvm.c |  1 -
> >>>>> 2 files changed, 9 insertions(+), 2 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/lightnvm/pblk-init.c
> >>>>> b/drivers/lightnvm/pblk-init.c index d65d2f972ccf..0f277744266b
> >>>>> 100644
> >>>>> --- a/drivers/lightnvm/pblk-init.c
> >>>>> +++ b/drivers/lightnvm/pblk-init.c
> >>>>> @@ -356,7 +356,15 @@ static int pblk_core_init(struct pblk *pblk)
> >>>>> 	atomic64_set(&pblk->nr_flush, 0);
> >>>>> 	pblk->nr_flush_rst = 0;
> >>>>>
> >>>>> -	pblk->pgs_in_buffer = geo->mw_cunits * geo->all_luns;
> >>>>> +	if (geo->mw_cunits) {
> >>>>> +		pblk->pgs_in_buffer = geo->mw_cunits * geo-
> >all_luns;
> >>>>> +	} else {
> >>>>> +		pblk->pgs_in_buffer = (geo->ws_opt << 3) * geo-
> >all_luns;
> >>>>> +		/*
> >>>>> +		 * Some devices can expose mw_cunits equal to 0, so
> let's
> >>>> use
> >>>>> +		 * here default safe value for MLC.
> >>>>> +		 */
> >>>>> +	}
> >>>>>
> >>>>> 	pblk->min_write_pgs = geo->ws_opt * (geo->csecs / PAGE_SIZE);
> >>>>> 	max_write_ppas = pblk->min_write_pgs * geo->all_luns; diff --git
> >>>>> a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
> >>>>> index
> >>>>> 41279da799ed..c747792da915 100644
> >>>>> --- a/drivers/nvme/host/lightnvm.c
> >>>>> +++ b/drivers/nvme/host/lightnvm.c
> >>>>> @@ -338,7 +338,6 @@ static int nvme_nvm_setup_12(struct
> >>>> nvme_nvm_id12
> >>>>> *id,
> >>>>>
> >>>>> 	geo->ws_min = sec_per_pg;
> >>>>> 	geo->ws_opt = sec_per_pg;
> >>>>> -	geo->mw_cunits = geo->ws_opt << 3;	/* default to MLC
> safe values
> >>>> */
> >>>>> /* Do not impose values for maximum number of open blocks as it is
> >>>>> 	 * unspecified in 1.2. Users of 1.2 must be aware of this and
> >>>>> eventually
> >>>>> --
> >>>>> 2.11.0
> >>>>
> >>>> By doing this, 1.2 future users (beyond pblk), will fail to have a
> >>>> valid mw_cunits value. It's ok to deal with the 0 case in pblk, but
> >>>> I believe that we should have the default value for 1.2 either way.
> >>>
> >>> I'm not sure. From my understanding, setting of default value was
> >>> workaround for pblk case, am I right ?.
> >>
> >> The default value covers the MLC case directly at the lightnvm layer,
> >> as opposed to doing it directly in pblk. Since pblk is the only user
> >> now, you can argue that all changes in the lightnvm layer are to
> >> solve pblk issues, but the idea is that the geometry should be generic.
> >>
> >>> In my opinion any user of 1.2
> >>> spec should be aware that there is not mw_cunit value. From my point
> >>> of view, leaving here 0 (and decision what do with it to lightnvm
> >>> user) is more safer way, but maybe I'm wrong. I believe that it is
> >>> topic to wider discussion with maintainers.
> >>
> >> 1.2 and 2.0 have different geometries, but when we designed the
> >> common nvm_geo structure, the idea was to abstract both specs and
> >> allow the upper layers to use the geometry transparently.
> >>
> >> Specifically in pblk, I would prefer to keep it in such a way that we
> >> don't need to media specific policies (e.g., set default values for
> >> MLC memories), as a general design principle. We already do some
> >> geometry version checks to avoid dereferencing unnecessary pointers
> >> on the fast path, which I would eventually like to remove.
> >
> > Ok, now I understand your point of view and agree with that, I will
> > prepare second version of this patch without this change.
> 
> Sounds good.
> 
> > Thanks for
> > the clarification.
> >
> 
> Sure :)
> 
> >>>> A more generic way of doing this would be to have a default value
> >>>> for
> >>>> 2.0 too, in case mw_cunits is reported as 0.
> >>>
> >>> Since 0 is correct value and users can make different decisions
> >>> based on it, I think we shouldn't overwrite it by default value. Is
> >>> it make sense?
> >>
> >> Here I meant at a pblk level - I should have specified it. At the
> >> geometry level, we should not change it.
> >>
> >> The case I am thinking is if mw_cuints repoints 0, but ws_min > 0. In
> >> this case, we still need a host side buffer to serve < ws_min I/Os,
> >> even though the device does not require the buffer to guarantee reads.
> >
> > Oh, ok now we are on the same page. In this patch I was trying to
> > address such case. Do you have other idea how to do it or here are you
> > thinking only on value of default variable?
> 
> If doing this, I guess that something in the line of what you did with
> increasing the size of the write buffer via a module parameter. For example,
> checking if the size of the write buffer based on mw_cuints is enough to
> cover ws_min, which normally would only be an issue when mw_cuints == 0
> or when the number of PUs used for the pblk instance is very small and
> mw_cuints < nr_luns * ws_min.


I see here two cases:
- when mw_cunits > 0  buffer size should have number of entries at least  max(mw_cunits, ws_min) * nr_luns and here we are taking care of both cases mw_cunits > ws_min and mw_cunits < ws_min.
- when mw_cunit == 0  buffer size should have number of entries at least  ws_min * nr _luns and we can use the same puseudocode as above.

Do you see any other case? Could you clarify second case mentioned by you or maybe did you mean opposite case? If yes, I believe that above pseudo code will handle such case too.

> 
> >
> >>>> Javier
> >>>
> >>> Thanks,
> >>> Marcin
> >>
> >> Javier
> > Thanks,
> > Marcin
Thanks!,
Marcin

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits equals to 0
  2018-06-04 17:17             ` Dziegielewski, Marcin
@ 2018-06-05  7:12               ` Javier Gonzalez
  2018-06-05  9:18                 ` Dziegielewski, Marcin
  0 siblings, 1 reply; 37+ messages in thread
From: Javier Gonzalez @ 2018-06-05  7:12 UTC (permalink / raw)
  To: Dziegielewski, Marcin
  Cc: Matias Bjørling, Jens Axboe, linux-block, linux-kernel,
	Konopko, Igor J

> On 4 Jun 2018, at 19.17, Dziegielewski, Marcin <marcin.dziegielewski@intel.com> wrote:
> 
>> From: Javier Gonzalez [mailto:javier@cnexlabs.com]
>> Sent: Monday, June 4, 2018 1:16 PM
>> To: Dziegielewski, Marcin <marcin.dziegielewski@intel.com>
>> Cc: Matias Bjørling <mb@lightnvm.io>; Jens Axboe <axboe@fb.com>; linux-
>> block@vger.kernel.org; linux-kernel@vger.kernel.org; Konopko, Igor J
>> <igor.j.konopko@intel.com>
>> Subject: Re: [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits
>> equals to 0
>> 
>> 
>>> On 4 Jun 2018, at 13.11, Dziegielewski, Marcin
>> <marcin.dziegielewski@intel.com> wrote:
>>>> -----Original Message-----
>>>> From: Javier Gonzalez [mailto:javier@cnexlabs.com]
>>>> Sent: Monday, June 4, 2018 12:22 PM
>>>> To: Dziegielewski, Marcin <marcin.dziegielewski@intel.com>
>>>> Cc: Matias Bjørling <mb@lightnvm.io>; Jens Axboe <axboe@fb.com>;
>>>> linux- block@vger.kernel.org; linux-kernel@vger.kernel.org; Konopko,
>>>> Igor J <igor.j.konopko@intel.com>
>>>> Subject: Re: [GIT PULL 18/20] lightnvm: pblk: handle case when
>>>> mw_cunits equals to 0
>>>> 
>>>>> On 4 Jun 2018, at 12.09, Dziegielewski, Marcin
>>>> <marcin.dziegielewski@intel.com> wrote:
>>>>> Frist of all I want to say sorry for late response - I was on holiday.
>>>>> 
>>>>>> From: Javier Gonzalez [mailto:javier@cnexlabs.com]
>>>>>> Sent: Monday, May 28, 2018 1:03 PM
>>>>>> To: Matias Bjørling <mb@lightnvm.io>
>>>>>> Cc: Jens Axboe <axboe@fb.com>; linux-block@vger.kernel.org; linux-
>>>>>> kernel@vger.kernel.org; Dziegielewski, Marcin
>>>>>> <marcin.dziegielewski@intel.com>; Konopko, Igor J
>>>>>> <igor.j.konopko@intel.com>
>>>>>> Subject: Re: [GIT PULL 18/20] lightnvm: pblk: handle case when
>>>>>> mw_cunits equals to 0
>>>>>> 
>>>>>>> On 28 May 2018, at 10.58, Matias Bjørling <mb@lightnvm.io> wrote:
>>>>>>> 
>>>>>>> From: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
>>>>>>> 
>>>>>>> Some devices can expose mw_cunits equal to 0, it can cause
>>>>>>> creation of too small write buffer and cause performance to drop
>>>>>>> on write workloads.
>>>>>>> 
>>>>>>> To handle that, we use the default value for MLC and beacause it
>>>>>>> covers both 1.2 and 2.0 OC specification, setting up mw_cunits in
>>>>>>> nvme_nvm_setup_12 function isn't longer necessary.
>>>>>>> 
>>>>>>> Signed-off-by: Marcin Dziegielewski
>>>>>>> <marcin.dziegielewski@intel.com>
>>>>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>>>>>> Signed-off-by: Matias Bjørling <mb@lightnvm.io>
>>>>>>> ---
>>>>>>> drivers/lightnvm/pblk-init.c | 10 +++++++++-
>>>>>>> drivers/nvme/host/lightnvm.c |  1 -
>>>>>>> 2 files changed, 9 insertions(+), 2 deletions(-)
>>>>>>> 
>>>>>>> diff --git a/drivers/lightnvm/pblk-init.c
>>>>>>> b/drivers/lightnvm/pblk-init.c index d65d2f972ccf..0f277744266b
>>>>>>> 100644
>>>>>>> --- a/drivers/lightnvm/pblk-init.c
>>>>>>> +++ b/drivers/lightnvm/pblk-init.c
>>>>>>> @@ -356,7 +356,15 @@ static int pblk_core_init(struct pblk *pblk)
>>>>>>> 	atomic64_set(&pblk->nr_flush, 0);
>>>>>>> 	pblk->nr_flush_rst = 0;
>>>>>>> 
>>>>>>> -	pblk->pgs_in_buffer = geo->mw_cunits * geo->all_luns;
>>>>>>> +	if (geo->mw_cunits) {
>>>>>>> +		pblk->pgs_in_buffer = geo->mw_cunits * geo-
>>> all_luns;
>>>>>>> +	} else {
>>>>>>> +		pblk->pgs_in_buffer = (geo->ws_opt << 3) * geo-
>>> all_luns;
>>>>>>> +		/*
>>>>>>> +		 * Some devices can expose mw_cunits equal to 0, so
>> let's
>>>>>> use
>>>>>>> +		 * here default safe value for MLC.
>>>>>>> +		 */
>>>>>>> +	}
>>>>>>> 
>>>>>>> 	pblk->min_write_pgs = geo->ws_opt * (geo->csecs / PAGE_SIZE);
>>>>>>> 	max_write_ppas = pblk->min_write_pgs * geo->all_luns; diff --git
>>>>>>> a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
>>>>>>> index
>>>>>>> 41279da799ed..c747792da915 100644
>>>>>>> --- a/drivers/nvme/host/lightnvm.c
>>>>>>> +++ b/drivers/nvme/host/lightnvm.c
>>>>>>> @@ -338,7 +338,6 @@ static int nvme_nvm_setup_12(struct
>>>>>> nvme_nvm_id12
>>>>>>> *id,
>>>>>>> 
>>>>>>> 	geo->ws_min = sec_per_pg;
>>>>>>> 	geo->ws_opt = sec_per_pg;
>>>>>>> -	geo->mw_cunits = geo->ws_opt << 3;	/* default to MLC
>> safe values
>>>>>> */
>>>>>>> /* Do not impose values for maximum number of open blocks as it is
>>>>>>> 	 * unspecified in 1.2. Users of 1.2 must be aware of this and
>>>>>>> eventually
>>>>>>> --
>>>>>>> 2.11.0
>>>>>> 
>>>>>> By doing this, 1.2 future users (beyond pblk), will fail to have a
>>>>>> valid mw_cunits value. It's ok to deal with the 0 case in pblk, but
>>>>>> I believe that we should have the default value for 1.2 either way.
>>>>> 
>>>>> I'm not sure. From my understanding, setting of default value was
>>>>> workaround for pblk case, am I right ?.
>>>> 
>>>> The default value covers the MLC case directly at the lightnvm layer,
>>>> as opposed to doing it directly in pblk. Since pblk is the only user
>>>> now, you can argue that all changes in the lightnvm layer are to
>>>> solve pblk issues, but the idea is that the geometry should be generic.
>>>> 
>>>>> In my opinion any user of 1.2
>>>>> spec should be aware that there is not mw_cunit value. From my point
>>>>> of view, leaving here 0 (and decision what do with it to lightnvm
>>>>> user) is more safer way, but maybe I'm wrong. I believe that it is
>>>>> topic to wider discussion with maintainers.
>>>> 
>>>> 1.2 and 2.0 have different geometries, but when we designed the
>>>> common nvm_geo structure, the idea was to abstract both specs and
>>>> allow the upper layers to use the geometry transparently.
>>>> 
>>>> Specifically in pblk, I would prefer to keep it in such a way that we
>>>> don't need to media specific policies (e.g., set default values for
>>>> MLC memories), as a general design principle. We already do some
>>>> geometry version checks to avoid dereferencing unnecessary pointers
>>>> on the fast path, which I would eventually like to remove.
>>> 
>>> Ok, now I understand your point of view and agree with that, I will
>>> prepare second version of this patch without this change.
>> 
>> Sounds good.
>> 
>>> Thanks for
>>> the clarification.
>> 
>> Sure :)
>> 
>>>>>> A more generic way of doing this would be to have a default value
>>>>>> for
>>>>>> 2.0 too, in case mw_cunits is reported as 0.
>>>>> 
>>>>> Since 0 is correct value and users can make different decisions
>>>>> based on it, I think we shouldn't overwrite it by default value. Is
>>>>> it make sense?
>>>> 
>>>> Here I meant at a pblk level - I should have specified it. At the
>>>> geometry level, we should not change it.
>>>> 
>>>> The case I am thinking is if mw_cuints repoints 0, but ws_min > 0. In
>>>> this case, we still need a host side buffer to serve < ws_min I/Os,
>>>> even though the device does not require the buffer to guarantee reads.
>>> 
>>> Oh, ok now we are on the same page. In this patch I was trying to
>>> address such case. Do you have other idea how to do it or here are you
>>> thinking only on value of default variable?
>> 
>> If doing this, I guess that something in the line of what you did with
>> increasing the size of the write buffer via a module parameter. For example,
>> checking if the size of the write buffer based on mw_cuints is enough to
>> cover ws_min, which normally would only be an issue when mw_cuints == 0
>> or when the number of PUs used for the pblk instance is very small and
>> mw_cuints < nr_luns * ws_min.
> 
> 
> I see here two cases:
> - when mw_cunits > 0 buffer size should have number of entries at
> least max(mw_cunits, ws_min) * nr_luns and here we are taking care of
> both cases mw_cunits > ws_min and mw_cunits < ws_min.
> - when mw_cunit == 0 buffer size should have number of entries at
> least ws_min * nr _luns and we can use the same puseudocode as above.
> 

Agree.

> Do you see any other case? Could you clarify second case mentioned by
> you or maybe did you mean opposite case? If yes, I believe that above
> pseudo code will handle such case too.
> 

Yes, it is the same case.

One thing to consider is whether the buffer should at least be ws_opt *
nr_luns for performance reasons. Since the write thread will always try
to send ws_opt, in the case that ws_opt > ws_min, then a buffer size of
ws_min * nr_luns will not make use of the whole parallelism exposed by
the device.

Therefore, I would probably go for ws_opt * nr_luns as the default value
when mw_cuints * nr_luns < ws_opt * nr_luns (which covers mw_cuints ==
0), and then keep ws_min * nr_luns as the minimum requirement when
setting the buffer size manually.

Does this cover your use case?

>>>>>> Javier
>>>>> 
>>>>> Thanks,
>>>>> Marcin
>>>> 
>>>> Javier
>>> Thanks,
>>> Marcin
> Thanks!,
> Marcin

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits equals to 0
  2018-06-05  7:12               ` Javier Gonzalez
@ 2018-06-05  9:18                 ` Dziegielewski, Marcin
  0 siblings, 0 replies; 37+ messages in thread
From: Dziegielewski, Marcin @ 2018-06-05  9:18 UTC (permalink / raw)
  To: Javier Gonzalez
  Cc: Matias Bjorling, Jens Axboe, linux-block, linux-kernel, Konopko, Igor J

> From: Javier Gonzalez [mailto:javier@cnexlabs.com]
> Sent: Tuesday, June 5, 2018 9:12 AM
> To: Dziegielewski, Marcin <marcin.dziegielewski@intel.com>
> Cc: Matias Bjørling <mb@lightnvm.io>; Jens Axboe <axboe@fb.com>; linux-
> block@vger.kernel.org; linux-kernel@vger.kernel.org; Konopko, Igor J
> <igor.j.konopko@intel.com>
> Subject: Re: [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits
> equals to 0
> 
> > On 4 Jun 2018, at 19.17, Dziegielewski, Marcin
> <marcin.dziegielewski@intel.com> wrote:
> >
> >> From: Javier Gonzalez [mailto:javier@cnexlabs.com]
> >> Sent: Monday, June 4, 2018 1:16 PM
> >> To: Dziegielewski, Marcin <marcin.dziegielewski@intel.com>
> >> Cc: Matias Bjørling <mb@lightnvm.io>; Jens Axboe <axboe@fb.com>;
> >> linux- block@vger.kernel.org; linux-kernel@vger.kernel.org; Konopko,
> >> Igor J <igor.j.konopko@intel.com>
> >> Subject: Re: [GIT PULL 18/20] lightnvm: pblk: handle case when
> >> mw_cunits equals to 0
> >>
> >>
> >>> On 4 Jun 2018, at 13.11, Dziegielewski, Marcin
> >> <marcin.dziegielewski@intel.com> wrote:
> >>>> -----Original Message-----
> >>>> From: Javier Gonzalez [mailto:javier@cnexlabs.com]
> >>>> Sent: Monday, June 4, 2018 12:22 PM
> >>>> To: Dziegielewski, Marcin <marcin.dziegielewski@intel.com>
> >>>> Cc: Matias Bjørling <mb@lightnvm.io>; Jens Axboe <axboe@fb.com>;
> >>>> linux- block@vger.kernel.org; linux-kernel@vger.kernel.org;
> >>>> Konopko, Igor J <igor.j.konopko@intel.com>
> >>>> Subject: Re: [GIT PULL 18/20] lightnvm: pblk: handle case when
> >>>> mw_cunits equals to 0
> >>>>
> >>>>> On 4 Jun 2018, at 12.09, Dziegielewski, Marcin
> >>>> <marcin.dziegielewski@intel.com> wrote:
> >>>>> Frist of all I want to say sorry for late response - I was on holiday.
> >>>>>
> >>>>>> From: Javier Gonzalez [mailto:javier@cnexlabs.com]
> >>>>>> Sent: Monday, May 28, 2018 1:03 PM
> >>>>>> To: Matias Bjørling <mb@lightnvm.io>
> >>>>>> Cc: Jens Axboe <axboe@fb.com>; linux-block@vger.kernel.org;
> >>>>>> linux- kernel@vger.kernel.org; Dziegielewski, Marcin
> >>>>>> <marcin.dziegielewski@intel.com>; Konopko, Igor J
> >>>>>> <igor.j.konopko@intel.com>
> >>>>>> Subject: Re: [GIT PULL 18/20] lightnvm: pblk: handle case when
> >>>>>> mw_cunits equals to 0
> >>>>>>
> >>>>>>> On 28 May 2018, at 10.58, Matias Bjørling <mb@lightnvm.io> wrote:
> >>>>>>>
> >>>>>>> From: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
> >>>>>>>
> >>>>>>> Some devices can expose mw_cunits equal to 0, it can cause
> >>>>>>> creation of too small write buffer and cause performance to drop
> >>>>>>> on write workloads.
> >>>>>>>
> >>>>>>> To handle that, we use the default value for MLC and beacause it
> >>>>>>> covers both 1.2 and 2.0 OC specification, setting up mw_cunits
> >>>>>>> in
> >>>>>>> nvme_nvm_setup_12 function isn't longer necessary.
> >>>>>>>
> >>>>>>> Signed-off-by: Marcin Dziegielewski
> >>>>>>> <marcin.dziegielewski@intel.com>
> >>>>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> >>>>>>> Signed-off-by: Matias Bjørling <mb@lightnvm.io>
> >>>>>>> ---
> >>>>>>> drivers/lightnvm/pblk-init.c | 10 +++++++++-
> >>>>>>> drivers/nvme/host/lightnvm.c |  1 -
> >>>>>>> 2 files changed, 9 insertions(+), 2 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/drivers/lightnvm/pblk-init.c
> >>>>>>> b/drivers/lightnvm/pblk-init.c index d65d2f972ccf..0f277744266b
> >>>>>>> 100644
> >>>>>>> --- a/drivers/lightnvm/pblk-init.c
> >>>>>>> +++ b/drivers/lightnvm/pblk-init.c
> >>>>>>> @@ -356,7 +356,15 @@ static int pblk_core_init(struct pblk *pblk)
> >>>>>>> 	atomic64_set(&pblk->nr_flush, 0);
> >>>>>>> 	pblk->nr_flush_rst = 0;
> >>>>>>>
> >>>>>>> -	pblk->pgs_in_buffer = geo->mw_cunits * geo->all_luns;
> >>>>>>> +	if (geo->mw_cunits) {
> >>>>>>> +		pblk->pgs_in_buffer = geo->mw_cunits * geo-
> >>> all_luns;
> >>>>>>> +	} else {
> >>>>>>> +		pblk->pgs_in_buffer = (geo->ws_opt << 3) * geo-
> >>> all_luns;
> >>>>>>> +		/*
> >>>>>>> +		 * Some devices can expose mw_cunits equal to 0, so
> >> let's
> >>>>>> use
> >>>>>>> +		 * here default safe value for MLC.
> >>>>>>> +		 */
> >>>>>>> +	}
> >>>>>>>
> >>>>>>> 	pblk->min_write_pgs = geo->ws_opt * (geo->csecs /
> PAGE_SIZE);
> >>>>>>> 	max_write_ppas = pblk->min_write_pgs * geo->all_luns; diff
> >>>>>>> --git a/drivers/nvme/host/lightnvm.c
> >>>>>>> b/drivers/nvme/host/lightnvm.c index
> >>>>>>> 41279da799ed..c747792da915 100644
> >>>>>>> --- a/drivers/nvme/host/lightnvm.c
> >>>>>>> +++ b/drivers/nvme/host/lightnvm.c
> >>>>>>> @@ -338,7 +338,6 @@ static int nvme_nvm_setup_12(struct
> >>>>>> nvme_nvm_id12
> >>>>>>> *id,
> >>>>>>>
> >>>>>>> 	geo->ws_min = sec_per_pg;
> >>>>>>> 	geo->ws_opt = sec_per_pg;
> >>>>>>> -	geo->mw_cunits = geo->ws_opt << 3;	/* default to MLC
> >> safe values
> >>>>>> */
> >>>>>>> /* Do not impose values for maximum number of open blocks as it
> is
> >>>>>>> 	 * unspecified in 1.2. Users of 1.2 must be aware of this and
> >>>>>>> eventually
> >>>>>>> --
> >>>>>>> 2.11.0
> >>>>>>
> >>>>>> By doing this, 1.2 future users (beyond pblk), will fail to have
> >>>>>> a valid mw_cunits value. It's ok to deal with the 0 case in pblk,
> >>>>>> but I believe that we should have the default value for 1.2 either
> way.
> >>>>>
> >>>>> I'm not sure. From my understanding, setting of default value was
> >>>>> workaround for pblk case, am I right ?.
> >>>>
> >>>> The default value covers the MLC case directly at the lightnvm
> >>>> layer, as opposed to doing it directly in pblk. Since pblk is the
> >>>> only user now, you can argue that all changes in the lightnvm layer
> >>>> are to solve pblk issues, but the idea is that the geometry should be
> generic.
> >>>>
> >>>>> In my opinion any user of 1.2
> >>>>> spec should be aware that there is not mw_cunit value. From my
> >>>>> point of view, leaving here 0 (and decision what do with it to
> >>>>> lightnvm
> >>>>> user) is more safer way, but maybe I'm wrong. I believe that it is
> >>>>> topic to wider discussion with maintainers.
> >>>>
> >>>> 1.2 and 2.0 have different geometries, but when we designed the
> >>>> common nvm_geo structure, the idea was to abstract both specs and
> >>>> allow the upper layers to use the geometry transparently.
> >>>>
> >>>> Specifically in pblk, I would prefer to keep it in such a way that
> >>>> we don't need to media specific policies (e.g., set default values
> >>>> for MLC memories), as a general design principle. We already do
> >>>> some geometry version checks to avoid dereferencing unnecessary
> >>>> pointers on the fast path, which I would eventually like to remove.
> >>>
> >>> Ok, now I understand your point of view and agree with that, I will
> >>> prepare second version of this patch without this change.
> >>
> >> Sounds good.
> >>
> >>> Thanks for
> >>> the clarification.
> >>
> >> Sure :)
> >>
> >>>>>> A more generic way of doing this would be to have a default value
> >>>>>> for
> >>>>>> 2.0 too, in case mw_cunits is reported as 0.
> >>>>>
> >>>>> Since 0 is correct value and users can make different decisions
> >>>>> based on it, I think we shouldn't overwrite it by default value.
> >>>>> Is it make sense?
> >>>>
> >>>> Here I meant at a pblk level - I should have specified it. At the
> >>>> geometry level, we should not change it.
> >>>>
> >>>> The case I am thinking is if mw_cuints repoints 0, but ws_min > 0.
> >>>> In this case, we still need a host side buffer to serve < ws_min
> >>>> I/Os, even though the device does not require the buffer to guarantee
> reads.
> >>>
> >>> Oh, ok now we are on the same page. In this patch I was trying to
> >>> address such case. Do you have other idea how to do it or here are
> >>> you thinking only on value of default variable?
> >>
> >> If doing this, I guess that something in the line of what you did
> >> with increasing the size of the write buffer via a module parameter.
> >> For example, checking if the size of the write buffer based on
> >> mw_cuints is enough to cover ws_min, which normally would only be an
> >> issue when mw_cuints == 0 or when the number of PUs used for the pblk
> >> instance is very small and mw_cuints < nr_luns * ws_min.
> >
> >
> > I see here two cases:
> > - when mw_cunits > 0 buffer size should have number of entries at
> > least max(mw_cunits, ws_min) * nr_luns and here we are taking care of
> > both cases mw_cunits > ws_min and mw_cunits < ws_min.
> > - when mw_cunit == 0 buffer size should have number of entries at
> > least ws_min * nr _luns and we can use the same puseudocode as above.
> >
> 
> Agree.
> 
> > Do you see any other case? Could you clarify second case mentioned by
> > you or maybe did you mean opposite case? If yes, I believe that above
> > pseudo code will handle such case too.
> >
> 
> Yes, it is the same case.
> 
> One thing to consider is whether the buffer should at least be ws_opt *
> nr_luns for performance reasons. Since the write thread will always try to
> send ws_opt, in the case that ws_opt > ws_min, then a buffer size of
> ws_min * nr_luns will not make use of the whole parallelism exposed by the
> device.
> 

Agree. After I had sent last email I also thought  that should be ws_opt instead of ws_min.

> Therefore, I would probably go for ws_opt * nr_luns as the default value
> when mw_cuints * nr_luns < ws_opt * nr_luns (which covers mw_cuints ==
> 0), and then keep ws_min * nr_luns as the minimum requirement when
> setting the buffer size manually.

Sounds good.
> 
> Does this cover your use case?
> 

Yes, I think that cover all cases related to this topic. I will prepare patch in the afternoon.

Many thanks for perfect cooperation!
Marcin
> >>>>>> Javier
> >>>>>
> >>>>> Thanks,
> >>>>> Marcin
> >>>>
> >>>> Javier
> >>> Thanks,
> >>> Marcin
> > Thanks!,
> > Marcin

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2018-06-05  9:18 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-28  8:58 [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
2018-05-28  8:58 ` [GIT PULL 01/20] lightnvm: pblk: fail gracefully on line alloc. failure Matias Bjørling
2018-05-28  8:58 ` [GIT PULL 02/20] lightnvm: pblk: recheck for bad lines at runtime Matias Bjørling
2018-05-28  8:58 ` [GIT PULL 03/20] lightnvm: pblk: check read lba on gc path Matias Bjørling
2018-05-28  8:58 ` [GIT PULL 04/20] lightnvm: pblk: improve error msg on corrupted LBAs Matias Bjørling
2018-05-28  8:58 ` [GIT PULL 05/20] lightnvm: pblk: warn in case of corrupted write buffer Matias Bjørling
2018-05-28  8:58 ` [GIT PULL 06/20] lightnvm: pblk: return NVM_ error on failed submission Matias Bjørling
2018-05-28  8:58 ` [GIT PULL 07/20] lightnvm: pblk: remove unnecessary indirection Matias Bjørling
2018-05-28  8:58 ` [GIT PULL 08/20] lightnvm: pblk: remove unnecessary argument Matias Bjørling
2018-05-28  8:58 ` [GIT PULL 09/20] lightnvm: pblk: check for chunk size before allocating it Matias Bjørling
2018-05-28  8:58 ` [GIT PULL 10/20] lightnvm: pass flag on graceful teardown to targets Matias Bjørling
2018-05-28  8:58 ` [GIT PULL 11/20] lightnvm: pblk: remove dead function Matias Bjørling
2018-05-28  8:58 ` [GIT PULL 12/20] lightnvm: pblk: rework write error recovery path Matias Bjørling
2018-05-28  8:58 ` [GIT PULL 13/20] lightnvm: pblk: garbage collect lines with failed writes Matias Bjørling
2018-05-28  8:58 ` [GIT PULL 14/20] lightnvm: pblk: fix smeta write error path Matias Bjørling
2018-05-28  8:58 ` [GIT PULL 15/20] lightnvm: proper error handling for pblk_bio_add_pages Matias Bjørling
2018-05-28  8:58 ` [GIT PULL 16/20] lightnvm: error handling when whole line is bad Matias Bjørling
2018-05-28 10:59   ` Javier Gonzalez
2018-05-29 13:15     ` Konopko, Igor J
2018-05-29 18:29       ` Javier Gonzalez
2018-05-28  8:58 ` [GIT PULL 17/20] lightnvm: fix partial read error path Matias Bjørling
2018-05-28  8:58 ` [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits equals to 0 Matias Bjørling
2018-05-28 11:02   ` Javier Gonzalez
2018-06-04 10:09     ` Dziegielewski, Marcin
2018-06-04 10:21       ` Javier Gonzalez
2018-06-04 11:11         ` Dziegielewski, Marcin
2018-06-04 11:15           ` Javier Gonzalez
2018-06-04 17:17             ` Dziegielewski, Marcin
2018-06-05  7:12               ` Javier Gonzalez
2018-06-05  9:18                 ` Dziegielewski, Marcin
2018-05-28  8:58 ` [GIT PULL 19/20] lightnvm: pblk: add possibility to set write buffer size manually Matias Bjørling
2018-05-28  8:58 ` [GIT PULL 20/20] lightnvm: pblk: sync RB and RL states during GC Matias Bjørling
2018-05-28 10:51   ` Javier Gonzalez
2018-05-29 13:07     ` Konopko, Igor J
2018-05-29 17:58       ` Javier Gonzalez
2018-06-01 10:45 ` [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
2018-06-01 12:36   ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).