LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH 5.4 00/27] 5.4.141-rc1 review
@ 2021-08-13 15:06 Greg Kroah-Hartman
  2021-08-13 15:06 ` [PATCH 5.4 01/27] KVM: SVM: Fix off-by-one indexing when nullifying last used SEV VMCB Greg Kroah-Hartman
                   ` (31 more replies)
  0 siblings, 32 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2021-08-13 15:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, torvalds, akpm, linux, shuah, patches,
	lkft-triage, pavel, jonathanh, f.fainelli, stable

This is the start of the stable review cycle for the 5.4.141 release.
There are 27 patches in this series, all will be posted as a response
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Sun, 15 Aug 2021 15:05:12 +0000.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
	https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.141-rc1.gz
or in the git tree and branch at:
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Linux 5.4.141-rc1

Nikolay Borisov <nborisov@suse.com>
    btrfs: don't flush from btrfs_delayed_inode_reserve_metadata

Nikolay Borisov <nborisov@suse.com>
    btrfs: export and rename qgroup_reserve_meta

Qu Wenruo <wqu@suse.com>
    btrfs: qgroup: don't commit transaction when we already hold the handle

YueHaibing <yuehaibing@huawei.com>
    net: xilinx_emaclite: Do not print real IOMEM pointer

Filipe Manana <fdmanana@suse.com>
    btrfs: fix lockdep splat when enabling and disabling qgroups

Qu Wenruo <wqu@suse.com>
    btrfs: qgroup: remove ASYNC_COMMIT mechanism in favor of reserve retry-after-EDQUOT

Qu Wenruo <wqu@suse.com>
    btrfs: transaction: Cleanup unused TRANS_STATE_BLOCKED

Qu Wenruo <wqu@suse.com>
    btrfs: qgroup: try to flush qgroup space when we get -EDQUOT

Qu Wenruo <wqu@suse.com>
    btrfs: qgroup: allow to unreserve range without releasing other ranges

Nikolay Borisov <nborisov@suse.com>
    btrfs: make btrfs_qgroup_reserve_data take btrfs_inode

Nikolay Borisov <nborisov@suse.com>
    btrfs: make qgroup_free_reserved_data take btrfs_inode

Miklos Szeredi <mszeredi@redhat.com>
    ovl: prevent private clone if bind mount is not allowed

Pali Rohár <pali@kernel.org>
    ppp: Fix generating ppp unit id when ifname is not specified

Luke D Jones <luke@ljones.dev>
    ALSA: hda: Add quirk for ASUS Flow x13

Longfang Liu <liulongfang@huawei.com>
    USB:ehci:fix Kunpeng920 ehci hardware problem

Lai Jiangshan <laijs@linux.alibaba.com>
    KVM: X86: MMU: Use the correct inherited permissions to get shadow page

Wesley Cheng <wcheng@codeaurora.org>
    usb: dwc3: gadget: Avoid runtime resume if disabling pullup

Wesley Cheng <wcheng@codeaurora.org>
    usb: dwc3: gadget: Disable gadget IRQ during pullup disable

Wesley Cheng <wcheng@codeaurora.org>
    usb: dwc3: gadget: Clear DEP flags after stop transfers in ep disable

Wesley Cheng <wcheng@codeaurora.org>
    usb: dwc3: gadget: Prevent EP queuing while stopping transfers

Wesley Cheng <wcheng@codeaurora.org>
    usb: dwc3: gadget: Restart DWC3 gadget when enabling pullup

Wesley Cheng <wcheng@codeaurora.org>
    usb: dwc3: gadget: Allow runtime suspend if UDC unbinded

Wesley Cheng <wcheng@codeaurora.org>
    usb: dwc3: Stop active transfers before halting the controller

Masami Hiramatsu <mhiramat@kernel.org>
    tracing: Reject string operand in the histogram expression

Alexandre Courbot <gnurou@gmail.com>
    media: v4l2-mem2mem: always consider OUTPUT queue during poll

Sumit Garg <sumit.garg@linaro.org>
    tee: Correct inappropriate usage of TEE_SHM_DMA_BUF flag

Sean Christopherson <seanjc@google.com>
    KVM: SVM: Fix off-by-one indexing when nullifying last used SEV VMCB


-------------

Diffstat:

 Documentation/virt/kvm/mmu.txt                |   4 +-
 Makefile                                      |   4 +-
 arch/x86/kvm/paging_tmpl.h                    |  14 +-
 arch/x86/kvm/svm.c                            |   2 +-
 drivers/media/v4l2-core/v4l2-mem2mem.c        |   6 +-
 drivers/net/ethernet/xilinx/xilinx_emaclite.c |   5 +-
 drivers/net/ppp/ppp_generic.c                 |  19 +-
 drivers/tee/optee/call.c                      |   2 +-
 drivers/tee/optee/core.c                      |   3 +-
 drivers/tee/optee/rpc.c                       |   5 +-
 drivers/tee/optee/shm_pool.c                  |   8 +-
 drivers/tee/tee_shm.c                         |   4 +-
 drivers/usb/dwc3/ep0.c                        |   2 +-
 drivers/usb/dwc3/gadget.c                     | 118 +++++++--
 drivers/usb/host/ehci-pci.c                   |   3 +
 fs/btrfs/ctree.h                              |  13 +-
 fs/btrfs/delalloc-space.c                     |   2 +-
 fs/btrfs/delayed-inode.c                      |   3 +-
 fs/btrfs/disk-io.c                            |   4 +-
 fs/btrfs/file.c                               |   7 +-
 fs/btrfs/inode.c                              |   2 +-
 fs/btrfs/qgroup.c                             | 328 +++++++++++++++++++-------
 fs/btrfs/qgroup.h                             |   5 +-
 fs/btrfs/transaction.c                        |  16 +-
 fs/btrfs/transaction.h                        |  15 --
 fs/namespace.c                                |  42 ++--
 include/linux/tee_drv.h                       |   1 +
 kernel/trace/trace_events_hist.c              |  20 +-
 sound/pci/hda/patch_realtek.c                 |   1 +
 29 files changed, 468 insertions(+), 190 deletions(-)



^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 5.4 01/27] KVM: SVM: Fix off-by-one indexing when nullifying last used SEV VMCB
  2021-08-13 15:06 [PATCH 5.4 00/27] 5.4.141-rc1 review Greg Kroah-Hartman
@ 2021-08-13 15:06 ` Greg Kroah-Hartman
  2021-08-13 15:07 ` [PATCH 5.4 02/27] tee: Correct inappropriate usage of TEE_SHM_DMA_BUF flag Greg Kroah-Hartman
                   ` (30 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2021-08-13 15:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Tom Lendacky, Brijesh Singh,
	Sean Christopherson, Paolo Bonzini, Sasha Levin

From: Sean Christopherson <seanjc@google.com>

[ Upstream commit 179c6c27bf487273652efc99acd3ba512a23c137 ]

Use the raw ASID, not ASID-1, when nullifying the last used VMCB when
freeing an SEV ASID.  The consumer, pre_sev_run(), indexes the array by
the raw ASID, thus KVM could get a false negative when checking for a
different VMCB if KVM manages to reallocate the same ASID+VMCB combo for
a new VM.

Note, this cannot cause a functional issue _in the current code_, as
pre_sev_run() also checks which pCPU last did VMRUN for the vCPU, and
last_vmentry_cpu is initialized to -1 during vCPU creation, i.e. is
guaranteed to mismatch on the first VMRUN.  However, prior to commit
8a14fe4f0c54 ("kvm: x86: Move last_cpu into kvm_vcpu_arch as
last_vmentry_cpu"), SVM tracked pCPU on its own and zero-initialized the
last_cpu variable.  Thus it's theoretically possible that older versions
of KVM could miss a TLB flush if the first VMRUN is on pCPU0 and the ASID
and VMCB exactly match those of a prior VM.

Fixes: 70cd94e60c73 ("KVM: SVM: VMRUN should use associated ASID when SEV is enabled")
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/x86/kvm/svm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 7341d22ed04f..2a958dcc80f2 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1783,7 +1783,7 @@ static void __sev_asid_free(int asid)
 
 	for_each_possible_cpu(cpu) {
 		sd = per_cpu(svm_data, cpu);
-		sd->sev_vmcbs[pos] = NULL;
+		sd->sev_vmcbs[asid] = NULL;
 	}
 }
 
-- 
2.30.2




^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 5.4 02/27] tee: Correct inappropriate usage of TEE_SHM_DMA_BUF flag
  2021-08-13 15:06 [PATCH 5.4 00/27] 5.4.141-rc1 review Greg Kroah-Hartman
  2021-08-13 15:06 ` [PATCH 5.4 01/27] KVM: SVM: Fix off-by-one indexing when nullifying last used SEV VMCB Greg Kroah-Hartman
@ 2021-08-13 15:07 ` Greg Kroah-Hartman
  2021-08-13 15:07 ` [PATCH 5.4 03/27] media: v4l2-mem2mem: always consider OUTPUT queue during poll Greg Kroah-Hartman
                   ` (29 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2021-08-13 15:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Sumit Garg, Tyler Hicks,
	Jens Wiklander, Sasha Levin

From: Sumit Garg <sumit.garg@linaro.org>

[ Upstream commit 376e4199e327a5cf29b8ec8fb0f64f3d8b429819 ]

Currently TEE_SHM_DMA_BUF flag has been inappropriately used to not
register shared memory allocated for private usage by underlying TEE
driver: OP-TEE in this case. So rather add a new flag as TEE_SHM_PRIV
that can be utilized by underlying TEE drivers for private allocation
and usage of shared memory.

With this corrected, allow tee_shm_alloc_kernel_buf() to allocate a
shared memory region without the backing of dma-buf.

Cc: stable@vger.kernel.org
Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
Co-developed-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Reviewed-by: Jens Wiklander <jens.wiklander@linaro.org>
Reviewed-by: Sumit Garg <sumit.garg@linaro.org>
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/tee/optee/call.c     | 2 +-
 drivers/tee/optee/core.c     | 3 ++-
 drivers/tee/optee/rpc.c      | 5 +++--
 drivers/tee/optee/shm_pool.c | 8 ++++++--
 drivers/tee/tee_shm.c        | 4 ++--
 include/linux/tee_drv.h      | 1 +
 6 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/tee/optee/call.c b/drivers/tee/optee/call.c
index 4b5069f88d78..3a54455d9ddf 100644
--- a/drivers/tee/optee/call.c
+++ b/drivers/tee/optee/call.c
@@ -181,7 +181,7 @@ static struct tee_shm *get_msg_arg(struct tee_context *ctx, size_t num_params,
 	struct optee_msg_arg *ma;
 
 	shm = tee_shm_alloc(ctx, OPTEE_MSG_GET_ARG_SIZE(num_params),
-			    TEE_SHM_MAPPED);
+			    TEE_SHM_MAPPED | TEE_SHM_PRIV);
 	if (IS_ERR(shm))
 		return shm;
 
diff --git a/drivers/tee/optee/core.c b/drivers/tee/optee/core.c
index 432dd38921dd..4bb4c8f28cbd 100644
--- a/drivers/tee/optee/core.c
+++ b/drivers/tee/optee/core.c
@@ -254,7 +254,8 @@ static void optee_release(struct tee_context *ctx)
 	if (!ctxdata)
 		return;
 
-	shm = tee_shm_alloc(ctx, sizeof(struct optee_msg_arg), TEE_SHM_MAPPED);
+	shm = tee_shm_alloc(ctx, sizeof(struct optee_msg_arg),
+			    TEE_SHM_MAPPED | TEE_SHM_PRIV);
 	if (!IS_ERR(shm)) {
 		arg = tee_shm_get_va(shm, 0);
 		/*
diff --git a/drivers/tee/optee/rpc.c b/drivers/tee/optee/rpc.c
index b4ade54d1f28..aecf62016e7b 100644
--- a/drivers/tee/optee/rpc.c
+++ b/drivers/tee/optee/rpc.c
@@ -220,7 +220,7 @@ static void handle_rpc_func_cmd_shm_alloc(struct tee_context *ctx,
 		shm = cmd_alloc_suppl(ctx, sz);
 		break;
 	case OPTEE_MSG_RPC_SHM_TYPE_KERNEL:
-		shm = tee_shm_alloc(ctx, sz, TEE_SHM_MAPPED);
+		shm = tee_shm_alloc(ctx, sz, TEE_SHM_MAPPED | TEE_SHM_PRIV);
 		break;
 	default:
 		arg->ret = TEEC_ERROR_BAD_PARAMETERS;
@@ -405,7 +405,8 @@ void optee_handle_rpc(struct tee_context *ctx, struct optee_rpc_param *param,
 
 	switch (OPTEE_SMC_RETURN_GET_RPC_FUNC(param->a0)) {
 	case OPTEE_SMC_RPC_FUNC_ALLOC:
-		shm = tee_shm_alloc(ctx, param->a1, TEE_SHM_MAPPED);
+		shm = tee_shm_alloc(ctx, param->a1,
+				    TEE_SHM_MAPPED | TEE_SHM_PRIV);
 		if (!IS_ERR(shm) && !tee_shm_get_pa(shm, 0, &pa)) {
 			reg_pair_from_64(&param->a1, &param->a2, pa);
 			reg_pair_from_64(&param->a4, &param->a5,
diff --git a/drivers/tee/optee/shm_pool.c b/drivers/tee/optee/shm_pool.c
index da06ce9b9313..c41a9a501a6e 100644
--- a/drivers/tee/optee/shm_pool.c
+++ b/drivers/tee/optee/shm_pool.c
@@ -27,7 +27,11 @@ static int pool_op_alloc(struct tee_shm_pool_mgr *poolm,
 	shm->paddr = page_to_phys(page);
 	shm->size = PAGE_SIZE << order;
 
-	if (shm->flags & TEE_SHM_DMA_BUF) {
+	/*
+	 * Shared memory private to the OP-TEE driver doesn't need
+	 * to be registered with OP-TEE.
+	 */
+	if (!(shm->flags & TEE_SHM_PRIV)) {
 		unsigned int nr_pages = 1 << order, i;
 		struct page **pages;
 
@@ -60,7 +64,7 @@ err:
 static void pool_op_free(struct tee_shm_pool_mgr *poolm,
 			 struct tee_shm *shm)
 {
-	if (shm->flags & TEE_SHM_DMA_BUF)
+	if (!(shm->flags & TEE_SHM_PRIV))
 		optee_shm_unregister(shm->ctx, shm);
 
 	free_pages((unsigned long)shm->kaddr, get_order(shm->size));
diff --git a/drivers/tee/tee_shm.c b/drivers/tee/tee_shm.c
index 1b4b4a1ba91d..d6491e973fa4 100644
--- a/drivers/tee/tee_shm.c
+++ b/drivers/tee/tee_shm.c
@@ -117,7 +117,7 @@ static struct tee_shm *__tee_shm_alloc(struct tee_context *ctx,
 		return ERR_PTR(-EINVAL);
 	}
 
-	if ((flags & ~(TEE_SHM_MAPPED | TEE_SHM_DMA_BUF))) {
+	if ((flags & ~(TEE_SHM_MAPPED | TEE_SHM_DMA_BUF | TEE_SHM_PRIV))) {
 		dev_err(teedev->dev.parent, "invalid shm flags 0x%x", flags);
 		return ERR_PTR(-EINVAL);
 	}
@@ -233,7 +233,7 @@ EXPORT_SYMBOL_GPL(tee_shm_priv_alloc);
  */
 struct tee_shm *tee_shm_alloc_kernel_buf(struct tee_context *ctx, size_t size)
 {
-	return tee_shm_alloc(ctx, size, TEE_SHM_MAPPED | TEE_SHM_DMA_BUF);
+	return tee_shm_alloc(ctx, size, TEE_SHM_MAPPED);
 }
 EXPORT_SYMBOL_GPL(tee_shm_alloc_kernel_buf);
 
diff --git a/include/linux/tee_drv.h b/include/linux/tee_drv.h
index 91677f2fa2e8..cd15c1b7fae0 100644
--- a/include/linux/tee_drv.h
+++ b/include/linux/tee_drv.h
@@ -26,6 +26,7 @@
 #define TEE_SHM_REGISTER	BIT(3)  /* Memory registered in secure world */
 #define TEE_SHM_USER_MAPPED	BIT(4)  /* Memory mapped in user space */
 #define TEE_SHM_POOL		BIT(5)  /* Memory allocated from pool */
+#define TEE_SHM_PRIV		BIT(7)  /* Memory private to TEE driver */
 
 struct device;
 struct tee_device;
-- 
2.30.2




^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 5.4 03/27] media: v4l2-mem2mem: always consider OUTPUT queue during poll
  2021-08-13 15:06 [PATCH 5.4 00/27] 5.4.141-rc1 review Greg Kroah-Hartman
  2021-08-13 15:06 ` [PATCH 5.4 01/27] KVM: SVM: Fix off-by-one indexing when nullifying last used SEV VMCB Greg Kroah-Hartman
  2021-08-13 15:07 ` [PATCH 5.4 02/27] tee: Correct inappropriate usage of TEE_SHM_DMA_BUF flag Greg Kroah-Hartman
@ 2021-08-13 15:07 ` Greg Kroah-Hartman
  2021-08-13 15:07 ` [PATCH 5.4 04/27] tracing: Reject string operand in the histogram expression Greg Kroah-Hartman
                   ` (28 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2021-08-13 15:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Alexandre Courbot, Ezequiel Garcia,
	Hans Verkuil, Mauro Carvalho Chehab, Lecopzer Chen

From: Alexandre Courbot <gnurou@gmail.com>

commit 566463afdbc43c7744c5a1b89250fc808df03833 upstream.

If poll() is called on a m2m device with the EPOLLOUT event after the
last buffer of the CAPTURE queue is dequeued, any buffer available on
OUTPUT queue will never be signaled because v4l2_m2m_poll_for_data()
starts by checking whether dst_q->last_buffer_dequeued is set and
returns EPOLLIN in this case, without looking at the state of the OUTPUT
queue.

Fix this by not early returning so we keep checking the state of the
OUTPUT queue afterwards.

Signed-off-by: Alexandre Courbot <gnurou@gmail.com>
Reviewed-by: Ezequiel Garcia <ezequiel@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Lecopzer Chen <lecopzer.chen@mediatek.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/media/v4l2-core/v4l2-mem2mem.c |    6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

--- a/drivers/media/v4l2-core/v4l2-mem2mem.c
+++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
@@ -635,10 +635,8 @@ static __poll_t v4l2_m2m_poll_for_data(s
 		 * If the last buffer was dequeued from the capture queue,
 		 * return immediately. DQBUF will return -EPIPE.
 		 */
-		if (dst_q->last_buffer_dequeued) {
-			spin_unlock_irqrestore(&dst_q->done_lock, flags);
-			return EPOLLIN | EPOLLRDNORM;
-		}
+		if (dst_q->last_buffer_dequeued)
+			rc |= EPOLLIN | EPOLLRDNORM;
 	}
 	spin_unlock_irqrestore(&dst_q->done_lock, flags);
 



^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 5.4 04/27] tracing: Reject string operand in the histogram expression
  2021-08-13 15:06 [PATCH 5.4 00/27] 5.4.141-rc1 review Greg Kroah-Hartman
                   ` (2 preceding siblings ...)
  2021-08-13 15:07 ` [PATCH 5.4 03/27] media: v4l2-mem2mem: always consider OUTPUT queue during poll Greg Kroah-Hartman
@ 2021-08-13 15:07 ` Greg Kroah-Hartman
  2021-08-13 15:07 ` [PATCH 5.4 05/27] usb: dwc3: Stop active transfers before halting the controller Greg Kroah-Hartman
                   ` (27 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2021-08-13 15:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Masami Hiramatsu, Steven Rostedt (VMware)

From: Masami Hiramatsu <mhiramat@kernel.org>

commit a9d10ca4986571bffc19778742d508cc8dd13e02 upstream.

Since the string type can not be the target of the addition / subtraction
operation, it must be rejected. Without this fix, the string type silently
converted to digits.

Link: https://lkml.kernel.org/r/162742654278.290973.1523000673366456634.stgit@devnote2

Cc: stable@vger.kernel.org
Fixes: 100719dcef447 ("tracing: Add simple expression support to hist triggers")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 kernel/trace/trace_events_hist.c |   20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -66,7 +66,8 @@
 	C(INVALID_SUBSYS_EVENT,	"Invalid subsystem or event name"),	\
 	C(INVALID_REF_KEY,	"Using variable references in keys not supported"), \
 	C(VAR_NOT_FOUND,	"Couldn't find variable"),		\
-	C(FIELD_NOT_FOUND,	"Couldn't find field"),
+	C(FIELD_NOT_FOUND,	"Couldn't find field"),			\
+	C(INVALID_STR_OPERAND,	"String type can not be an operand in expression"),
 
 #undef C
 #define C(a, b)		HIST_ERR_##a
@@ -3038,6 +3039,13 @@ static struct hist_field *parse_unary(st
 		ret = PTR_ERR(operand1);
 		goto free;
 	}
+	if (operand1->flags & HIST_FIELD_FL_STRING) {
+		/* String type can not be the operand of unary operator. */
+		hist_err(file->tr, HIST_ERR_INVALID_STR_OPERAND, errpos(str));
+		destroy_hist_field(operand1, 0);
+		ret = -EINVAL;
+		goto free;
+	}
 
 	expr->flags |= operand1->flags &
 		(HIST_FIELD_FL_TIMESTAMP | HIST_FIELD_FL_TIMESTAMP_USECS);
@@ -3139,6 +3147,11 @@ static struct hist_field *parse_expr(str
 		operand1 = NULL;
 		goto free;
 	}
+	if (operand1->flags & HIST_FIELD_FL_STRING) {
+		hist_err(file->tr, HIST_ERR_INVALID_STR_OPERAND, errpos(operand1_str));
+		ret = -EINVAL;
+		goto free;
+	}
 
 	/* rest of string could be another expression e.g. b+c in a+b+c */
 	operand_flags = 0;
@@ -3148,6 +3161,11 @@ static struct hist_field *parse_expr(str
 		operand2 = NULL;
 		goto free;
 	}
+	if (operand2->flags & HIST_FIELD_FL_STRING) {
+		hist_err(file->tr, HIST_ERR_INVALID_STR_OPERAND, errpos(str));
+		ret = -EINVAL;
+		goto free;
+	}
 
 	ret = check_expr_operands(file->tr, operand1, operand2);
 	if (ret)



^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 5.4 05/27] usb: dwc3: Stop active transfers before halting the controller
  2021-08-13 15:06 [PATCH 5.4 00/27] 5.4.141-rc1 review Greg Kroah-Hartman
                   ` (3 preceding siblings ...)
  2021-08-13 15:07 ` [PATCH 5.4 04/27] tracing: Reject string operand in the histogram expression Greg Kroah-Hartman
@ 2021-08-13 15:07 ` Greg Kroah-Hartman
  2021-08-13 15:07 ` [PATCH 5.4 06/27] usb: dwc3: gadget: Allow runtime suspend if UDC unbinded Greg Kroah-Hartman
                   ` (26 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2021-08-13 15:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, stable@vger.kernel.org, Wesley Cheng,
	Thinh Nguyen, Felipe Balbi, Wesley Cheng

From: Wesley Cheng <wcheng@codeaurora.org>

[ Upstream commit ae7e86108b12351028fa7e8796a59f9b2d9e1774 ]

In the DWC3 databook, for a device initiated disconnect or bus reset, the
driver is required to send dependxfer commands for any pending transfers.
In addition, before the controller can move to the halted state, the SW
needs to acknowledge any pending events.  If the controller is not halted
properly, there is a chance the controller will continue accessing stale or
freed TRBs and buffers.

Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
Reviewed-by: Thinh Nguyen <thinhn@synopsys.com>
Signed-off-by: Felipe Balbi <balbi@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/usb/dwc3/ep0.c    |    2 -
 drivers/usb/dwc3/gadget.c |   66 +++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 66 insertions(+), 2 deletions(-)

--- a/drivers/usb/dwc3/ep0.c
+++ b/drivers/usb/dwc3/ep0.c
@@ -197,7 +197,7 @@ int dwc3_gadget_ep0_queue(struct usb_ep
 	int				ret;
 
 	spin_lock_irqsave(&dwc->lock, flags);
-	if (!dep->endpoint.desc) {
+	if (!dep->endpoint.desc || !dwc->pullups_connected) {
 		dev_err(dwc->dev, "%s: can't queue to disabled endpoint\n",
 				dep->name);
 		ret = -ESHUTDOWN;
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -1511,7 +1511,7 @@ static int __dwc3_gadget_ep_queue(struct
 {
 	struct dwc3		*dwc = dep->dwc;
 
-	if (!dep->endpoint.desc) {
+	if (!dep->endpoint.desc || !dwc->pullups_connected) {
 		dev_err(dwc->dev, "%s: can't queue to disabled endpoint\n",
 				dep->name);
 		return -ESHUTDOWN;
@@ -1931,6 +1931,21 @@ static int dwc3_gadget_set_selfpowered(s
 	return 0;
 }
 
+static void dwc3_stop_active_transfers(struct dwc3 *dwc)
+{
+	u32 epnum;
+
+	for (epnum = 2; epnum < dwc->num_eps; epnum++) {
+		struct dwc3_ep *dep;
+
+		dep = dwc->eps[epnum];
+		if (!dep)
+			continue;
+
+		dwc3_remove_requests(dwc, dep);
+	}
+}
+
 static int dwc3_gadget_run_stop(struct dwc3 *dwc, int is_on, int suspend)
 {
 	u32			reg;
@@ -1976,6 +1991,9 @@ static int dwc3_gadget_run_stop(struct d
 	return 0;
 }
 
+static void dwc3_gadget_disable_irq(struct dwc3 *dwc);
+static void __dwc3_gadget_stop(struct dwc3 *dwc);
+
 static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
 {
 	struct dwc3		*dwc = gadget_to_dwc(g);
@@ -1999,7 +2017,46 @@ static int dwc3_gadget_pullup(struct usb
 		}
 	}
 
+	/*
+	 * Synchronize any pending event handling before executing the controller
+	 * halt routine.
+	 */
+	if (!is_on) {
+		dwc3_gadget_disable_irq(dwc);
+		synchronize_irq(dwc->irq_gadget);
+	}
+
 	spin_lock_irqsave(&dwc->lock, flags);
+
+	if (!is_on) {
+		u32 count;
+
+		/*
+		 * In the Synopsis DesignWare Cores USB3 Databook Rev. 3.30a
+		 * Section 4.1.8 Table 4-7, it states that for a device-initiated
+		 * disconnect, the SW needs to ensure that it sends "a DEPENDXFER
+		 * command for any active transfers" before clearing the RunStop
+		 * bit.
+		 */
+		dwc3_stop_active_transfers(dwc);
+		__dwc3_gadget_stop(dwc);
+
+		/*
+		 * In the Synopsis DesignWare Cores USB3 Databook Rev. 3.30a
+		 * Section 1.3.4, it mentions that for the DEVCTRLHLT bit, the
+		 * "software needs to acknowledge the events that are generated
+		 * (by writing to GEVNTCOUNTn) while it is waiting for this bit
+		 * to be set to '1'."
+		 */
+		count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0));
+		count &= DWC3_GEVNTCOUNT_MASK;
+		if (count > 0) {
+			dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), count);
+			dwc->ev_buf->lpos = (dwc->ev_buf->lpos + count) %
+						dwc->ev_buf->length;
+		}
+	}
+
 	ret = dwc3_gadget_run_stop(dwc, is_on, false);
 	spin_unlock_irqrestore(&dwc->lock, flags);
 
@@ -3038,6 +3095,13 @@ static void dwc3_gadget_reset_interrupt(
 	}
 
 	dwc3_reset_gadget(dwc);
+	/*
+	 * In the Synopsis DesignWare Cores USB3 Databook Rev. 3.30a
+	 * Section 4.1.2 Table 4-2, it states that during a USB reset, the SW
+	 * needs to ensure that it sends "a DEPENDXFER command for any active
+	 * transfers."
+	 */
+	dwc3_stop_active_transfers(dwc);
 
 	reg = dwc3_readl(dwc->regs, DWC3_DCTL);
 	reg &= ~DWC3_DCTL_TSTCTRL_MASK;



^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 5.4 06/27] usb: dwc3: gadget: Allow runtime suspend if UDC unbinded
  2021-08-13 15:06 [PATCH 5.4 00/27] 5.4.141-rc1 review Greg Kroah-Hartman
                   ` (4 preceding siblings ...)
  2021-08-13 15:07 ` [PATCH 5.4 05/27] usb: dwc3: Stop active transfers before halting the controller Greg Kroah-Hartman
@ 2021-08-13 15:07 ` Greg Kroah-Hartman
  2021-08-13 15:07 ` [PATCH 5.4 07/27] usb: dwc3: gadget: Restart DWC3 gadget when enabling pullup Greg Kroah-Hartman
                   ` (25 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2021-08-13 15:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, stable@vger.kernel.org, Wesley Cheng,
	Wesley Cheng

From: Wesley Cheng <wcheng@codeaurora.org>

[ Upstream commit 77adb8bdf4227257e26b7ff67272678e66a0b250 ]

The DWC3 runtime suspend routine checks for the USB connected parameter to
determine if the controller can enter into a low power state.  The
connected state is only set to false after receiving a disconnect event.
However, in the case of a device initiated disconnect (i.e. UDC unbind),
the controller is halted and a disconnect event is never generated.  Set
the connected flag to false if issuing a device initiated disconnect to
allow the controller to be suspended.

Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
Link: https://lore.kernel.org/r/1609283136-22140-2-git-send-email-wcheng@codeaurora.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/usb/dwc3/gadget.c |   13 +++++++++++++
 1 file changed, 13 insertions(+)

--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -2018,6 +2018,17 @@ static int dwc3_gadget_pullup(struct usb
 	}
 
 	/*
+	 * Check the return value for successful resume, or error.  For a
+	 * successful resume, the DWC3 runtime PM resume routine will handle
+	 * the run stop sequence, so avoid duplicate operations here.
+	 */
+	ret = pm_runtime_get_sync(dwc->dev);
+	if (!ret || ret < 0) {
+		pm_runtime_put(dwc->dev);
+		return 0;
+	}
+
+	/*
 	 * Synchronize any pending event handling before executing the controller
 	 * halt routine.
 	 */
@@ -2055,10 +2066,12 @@ static int dwc3_gadget_pullup(struct usb
 			dwc->ev_buf->lpos = (dwc->ev_buf->lpos + count) %
 						dwc->ev_buf->length;
 		}
+		dwc->connected = false;
 	}
 
 	ret = dwc3_gadget_run_stop(dwc, is_on, false);
 	spin_unlock_irqrestore(&dwc->lock, flags);
+	pm_runtime_put(dwc->dev);
 
 	return ret;
 }



^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 5.4 07/27] usb: dwc3: gadget: Restart DWC3 gadget when enabling pullup
  2021-08-13 15:06 [PATCH 5.4 00/27] 5.4.141-rc1 review Greg Kroah-Hartman
                   ` (5 preceding siblings ...)
  2021-08-13 15:07 ` [PATCH 5.4 06/27] usb: dwc3: gadget: Allow runtime suspend if UDC unbinded Greg Kroah-Hartman
@ 2021-08-13 15:07 ` Greg Kroah-Hartman
  2021-08-13 15:07 ` [PATCH 5.4 08/27] usb: dwc3: gadget: Prevent EP queuing while stopping transfers Greg Kroah-Hartman
                   ` (24 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2021-08-13 15:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, stable@vger.kernel.org, Wesley Cheng,
	Michael Tretter, Wesley Cheng

From: Wesley Cheng <wcheng@codeaurora.org>

[ Upstream commit a1383b3537a7bea1c213baa7878ccc4ecf4413b5 ]

usb_gadget_deactivate/usb_gadget_activate does not execute the UDC start
operation, which may leave EP0 disabled and event IRQs disabled when
re-activating the function. Move the enabling/disabling of USB EP0 and
device event IRQs to be performed in the pullup routine.

Fixes: ae7e86108b12 ("usb: dwc3: Stop active transfers before halting the controller")
Tested-by: Michael Tretter <m.tretter@pengutronix.de>
Cc: stable <stable@vger.kernel.org>
Reported-by: Michael Tretter <m.tretter@pengutronix.de>
Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
Link: https://lore.kernel.org/r/1609282837-21666-1-git-send-email-wcheng@codeaurora.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/usb/dwc3/gadget.c |   14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -1993,6 +1993,7 @@ static int dwc3_gadget_run_stop(struct d
 
 static void dwc3_gadget_disable_irq(struct dwc3 *dwc);
 static void __dwc3_gadget_stop(struct dwc3 *dwc);
+static int __dwc3_gadget_start(struct dwc3 *dwc);
 
 static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
 {
@@ -2067,6 +2068,8 @@ static int dwc3_gadget_pullup(struct usb
 						dwc->ev_buf->length;
 		}
 		dwc->connected = false;
+	} else {
+		__dwc3_gadget_start(dwc);
 	}
 
 	ret = dwc3_gadget_run_stop(dwc, is_on, false);
@@ -2244,10 +2247,6 @@ static int dwc3_gadget_start(struct usb_
 	}
 
 	dwc->gadget_driver	= driver;
-
-	if (pm_runtime_active(dwc->dev))
-		__dwc3_gadget_start(dwc);
-
 	spin_unlock_irqrestore(&dwc->lock, flags);
 
 	return 0;
@@ -2273,13 +2272,6 @@ static int dwc3_gadget_stop(struct usb_g
 	unsigned long		flags;
 
 	spin_lock_irqsave(&dwc->lock, flags);
-
-	if (pm_runtime_suspended(dwc->dev))
-		goto out;
-
-	__dwc3_gadget_stop(dwc);
-
-out:
 	dwc->gadget_driver	= NULL;
 	spin_unlock_irqrestore(&dwc->lock, flags);
 



^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 5.4 08/27] usb: dwc3: gadget: Prevent EP queuing while stopping transfers
  2021-08-13 15:06 [PATCH 5.4 00/27] 5.4.141-rc1 review Greg Kroah-Hartman
                   ` (6 preceding siblings ...)
  2021-08-13 15:07 ` [PATCH 5.4 07/27] usb: dwc3: gadget: Restart DWC3 gadget when enabling pullup Greg Kroah-Hartman
@ 2021-08-13 15:07 ` Greg Kroah-Hartman
  2021-08-13 15:07 ` [PATCH 5.4 09/27] usb: dwc3: gadget: Clear DEP flags after stop transfers in ep disable Greg Kroah-Hartman
                   ` (23 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2021-08-13 15:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, stable@vger.kernel.org, Wesley Cheng,
	Wesley Cheng

From: Wesley Cheng <wcheng@codeaurora.org>

[ Upstream commit f09ddcfcb8c569675066337adac2ac205113471f ]

In the situations where the DWC3 gadget stops active transfers, once
calling the dwc3_gadget_giveback(), there is a chance where a function
driver can queue a new USB request in between the time where the dwc3
lock has been released and re-aquired.  This occurs after we've already
issued an ENDXFER command.  When the stop active transfers continues
to remove USB requests from all dep lists, the newly added request will
also be removed, while controller still has an active TRB for it.
This can lead to the controller accessing an unmapped memory address.

Fix this by ensuring parameters to prevent EP queuing are set before
calling the stop active transfers API.

Fixes: ae7e86108b12 ("usb: dwc3: Stop active transfers before halting the controller")
Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
Link: https://lore.kernel.org/r/1615507142-23097-1-git-send-email-wcheng@codeaurora.org
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/usb/dwc3/gadget.c |   11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -746,8 +746,6 @@ static int __dwc3_gadget_ep_disable(stru
 
 	trace_dwc3_gadget_ep_disable(dep);
 
-	dwc3_remove_requests(dwc, dep);
-
 	/* make sure HW endpoint isn't stalled */
 	if (dep->flags & DWC3_EP_STALL)
 		__dwc3_gadget_ep_set_halt(dep, 0, false);
@@ -766,6 +764,8 @@ static int __dwc3_gadget_ep_disable(stru
 		dep->endpoint.desc = NULL;
 	}
 
+	dwc3_remove_requests(dwc, dep);
+
 	return 0;
 }
 
@@ -1511,7 +1511,7 @@ static int __dwc3_gadget_ep_queue(struct
 {
 	struct dwc3		*dwc = dep->dwc;
 
-	if (!dep->endpoint.desc || !dwc->pullups_connected) {
+	if (!dep->endpoint.desc || !dwc->pullups_connected || !dwc->connected) {
 		dev_err(dwc->dev, "%s: can't queue to disabled endpoint\n",
 				dep->name);
 		return -ESHUTDOWN;
@@ -2043,6 +2043,7 @@ static int dwc3_gadget_pullup(struct usb
 	if (!is_on) {
 		u32 count;
 
+		dwc->connected = false;
 		/*
 		 * In the Synopsis DesignWare Cores USB3 Databook Rev. 3.30a
 		 * Section 4.1.8 Table 4-7, it states that for a device-initiated
@@ -2067,7 +2068,6 @@ static int dwc3_gadget_pullup(struct usb
 			dwc->ev_buf->lpos = (dwc->ev_buf->lpos + count) %
 						dwc->ev_buf->length;
 		}
-		dwc->connected = false;
 	} else {
 		__dwc3_gadget_start(dwc);
 	}
@@ -3057,8 +3057,6 @@ static void dwc3_gadget_reset_interrupt(
 {
 	u32			reg;
 
-	dwc->connected = true;
-
 	/*
 	 * Ideally, dwc3_reset_gadget() would trigger the function
 	 * drivers to stop any active transfers through ep disable.
@@ -3107,6 +3105,7 @@ static void dwc3_gadget_reset_interrupt(
 	 * transfers."
 	 */
 	dwc3_stop_active_transfers(dwc);
+	dwc->connected = true;
 
 	reg = dwc3_readl(dwc->regs, DWC3_DCTL);
 	reg &= ~DWC3_DCTL_TSTCTRL_MASK;



^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 5.4 09/27] usb: dwc3: gadget: Clear DEP flags after stop transfers in ep disable
  2021-08-13 15:06 [PATCH 5.4 00/27] 5.4.141-rc1 review Greg Kroah-Hartman
                   ` (7 preceding siblings ...)
  2021-08-13 15:07 ` [PATCH 5.4 08/27] usb: dwc3: gadget: Prevent EP queuing while stopping transfers Greg Kroah-Hartman
@ 2021-08-13 15:07 ` Greg Kroah-Hartman
  2021-08-13 15:07 ` [PATCH 5.4 10/27] usb: dwc3: gadget: Disable gadget IRQ during pullup disable Greg Kroah-Hartman
                   ` (22 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2021-08-13 15:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, stable@vger.kernel.org, Wesley Cheng,
	Marek Szyprowski, Andy Shevchenko, Wesley Cheng

From: Wesley Cheng <wcheng@codeaurora.org>

[ Upstream commit 5aef629704ad4d983ecf5c8a25840f16e45b6d59 ]

Ensure that dep->flags are cleared until after stop active transfers
is completed.  Otherwise, the ENDXFER command will not be executed
during ep disable.

Fixes: f09ddcfcb8c5 ("usb: dwc3: gadget: Prevent EP queuing while stopping transfers")
Cc: stable <stable@vger.kernel.org>
Reported-and-tested-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
Link: https://lore.kernel.org/r/1616610664-16495-1-git-send-email-wcheng@codeaurora.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/usb/dwc3/gadget.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -754,10 +754,6 @@ static int __dwc3_gadget_ep_disable(stru
 	reg &= ~DWC3_DALEPENA_EP(dep->number);
 	dwc3_writel(dwc->regs, DWC3_DALEPENA, reg);
 
-	dep->stream_capable = false;
-	dep->type = 0;
-	dep->flags = 0;
-
 	/* Clear out the ep descriptors for non-ep0 */
 	if (dep->number > 1) {
 		dep->endpoint.comp_desc = NULL;
@@ -766,6 +762,10 @@ static int __dwc3_gadget_ep_disable(stru
 
 	dwc3_remove_requests(dwc, dep);
 
+	dep->stream_capable = false;
+	dep->type = 0;
+	dep->flags = 0;
+
 	return 0;
 }
 



^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 5.4 10/27] usb: dwc3: gadget: Disable gadget IRQ during pullup disable
  2021-08-13 15:06 [PATCH 5.4 00/27] 5.4.141-rc1 review Greg Kroah-Hartman
                   ` (8 preceding siblings ...)
  2021-08-13 15:07 ` [PATCH 5.4 09/27] usb: dwc3: gadget: Clear DEP flags after stop transfers in ep disable Greg Kroah-Hartman
@ 2021-08-13 15:07 ` Greg Kroah-Hartman
  2021-08-13 15:07 ` [PATCH 5.4 11/27] usb: dwc3: gadget: Avoid runtime resume if disabling pullup Greg Kroah-Hartman
                   ` (21 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2021-08-13 15:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, stable@vger.kernel.org, Wesley Cheng,
	Wesley Cheng

From: Wesley Cheng <wcheng@codeaurora.org>

[ Upstream commit 8212937305f84ef73ea81036dafb80c557583d4b ]

Current sequence utilizes dwc3_gadget_disable_irq() alongside
synchronize_irq() to ensure that no further DWC3 events are generated.
However, the dwc3_gadget_disable_irq() API only disables device
specific events.  Endpoint events can still be generated.  Briefly
disable the interrupt line, so that the cleanup code can run to
prevent device and endpoint events. (i.e. __dwc3_gadget_stop() and
dwc3_stop_active_transfers() respectively)

Without doing so, it can lead to both the interrupt handler and the
pullup disable routine both writing to the GEVNTCOUNT register, which
will cause an incorrect count being read from future interrupts.

Fixes: ae7e86108b12 ("usb: dwc3: Stop active transfers before halting the controller")
Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
Link: https://lore.kernel.org/r/1621571037-1424-1-git-send-email-wcheng@codeaurora.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/usb/dwc3/gadget.c |   11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -2030,13 +2030,10 @@ static int dwc3_gadget_pullup(struct usb
 	}
 
 	/*
-	 * Synchronize any pending event handling before executing the controller
-	 * halt routine.
+	 * Synchronize and disable any further event handling while controller
+	 * is being enabled/disabled.
 	 */
-	if (!is_on) {
-		dwc3_gadget_disable_irq(dwc);
-		synchronize_irq(dwc->irq_gadget);
-	}
+	disable_irq(dwc->irq_gadget);
 
 	spin_lock_irqsave(&dwc->lock, flags);
 
@@ -2074,6 +2071,8 @@ static int dwc3_gadget_pullup(struct usb
 
 	ret = dwc3_gadget_run_stop(dwc, is_on, false);
 	spin_unlock_irqrestore(&dwc->lock, flags);
+	enable_irq(dwc->irq_gadget);
+
 	pm_runtime_put(dwc->dev);
 
 	return ret;



^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 5.4 11/27] usb: dwc3: gadget: Avoid runtime resume if disabling pullup
  2021-08-13 15:06 [PATCH 5.4 00/27] 5.4.141-rc1 review Greg Kroah-Hartman
                   ` (9 preceding siblings ...)
  2021-08-13 15:07 ` [PATCH 5.4 10/27] usb: dwc3: gadget: Disable gadget IRQ during pullup disable Greg Kroah-Hartman
@ 2021-08-13 15:07 ` Greg Kroah-Hartman
  2021-08-13 15:07 ` [PATCH 5.4 12/27] KVM: X86: MMU: Use the correct inherited permissions to get shadow page Greg Kroah-Hartman
                   ` (20 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2021-08-13 15:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, stable@vger.kernel.org, Wesley Cheng,
	Felipe Balbi, Wesley Cheng

From: Wesley Cheng <wcheng@codeaurora.org>

[ Upstream commit cb10f68ad8150f243964b19391711aaac5e8ff42 ]

If the device is already in the runtime suspended state, any call to
the pullup routine will issue a runtime resume on the DWC3 core
device.  If the USB gadget is disabling the pullup, then avoid having
to issue a runtime resume, as DWC3 gadget has already been
halted/stopped.

This fixes an issue where the following condition occurs:

usb_gadget_remove_driver()
-->usb_gadget_disconnect()
 -->dwc3_gadget_pullup(0)
  -->pm_runtime_get_sync() -> ret = 0
  -->pm_runtime_put() [async]
-->usb_gadget_udc_stop()
 -->dwc3_gadget_stop()
  -->dwc->gadget_driver = NULL
...

dwc3_suspend_common()
-->dwc3_gadget_suspend()
 -->DWC3 halt/stop routine skipped, driver_data == NULL

This leads to a situation where the DWC3 gadget is not properly
stopped, as the runtime resume would have re-enabled EP0 and event
interrupts, and since we avoided the DWC3 gadget suspend, these
resources were never disabled.

Fixes: 77adb8bdf422 ("usb: dwc3: gadget: Allow runtime suspend if UDC unbinded")
Cc: stable <stable@vger.kernel.org>
Acked-by: Felipe Balbi <balbi@kernel.org>
Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
Link: https://lore.kernel.org/r/1628058245-30692-1-git-send-email-wcheng@codeaurora.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/usb/dwc3/gadget.c |   11 +++++++++++
 1 file changed, 11 insertions(+)

--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -2019,6 +2019,17 @@ static int dwc3_gadget_pullup(struct usb
 	}
 
 	/*
+	 * Avoid issuing a runtime resume if the device is already in the
+	 * suspended state during gadget disconnect.  DWC3 gadget was already
+	 * halted/stopped during runtime suspend.
+	 */
+	if (!is_on) {
+		pm_runtime_barrier(dwc->dev);
+		if (pm_runtime_suspended(dwc->dev))
+			return 0;
+	}
+
+	/*
 	 * Check the return value for successful resume, or error.  For a
 	 * successful resume, the DWC3 runtime PM resume routine will handle
 	 * the run stop sequence, so avoid duplicate operations here.



^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 5.4 12/27] KVM: X86: MMU: Use the correct inherited permissions to get shadow page
  2021-08-13 15:06 [PATCH 5.4 00/27] 5.4.141-rc1 review Greg Kroah-Hartman
                   ` (10 preceding siblings ...)
  2021-08-13 15:07 ` [PATCH 5.4 11/27] usb: dwc3: gadget: Avoid runtime resume if disabling pullup Greg Kroah-Hartman
@ 2021-08-13 15:07 ` Greg Kroah-Hartman
  2021-08-13 15:07 ` [PATCH 5.4 13/27] USB:ehci:fix Kunpeng920 ehci hardware problem Greg Kroah-Hartman
                   ` (19 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2021-08-13 15:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Lai Jiangshan, Paolo Bonzini, Ovidiu Panait

From: Lai Jiangshan <laijs@linux.alibaba.com>

commit b1bd5cba3306691c771d558e94baa73e8b0b96b7 upstream.

When computing the access permissions of a shadow page, use the effective
permissions of the walk up to that point, i.e. the logic AND of its parents'
permissions.  Two guest PxE entries that point at the same table gfn need to
be shadowed with different shadow pages if their parents' permissions are
different.  KVM currently uses the effective permissions of the last
non-leaf entry for all non-leaf entries.  Because all non-leaf SPTEs have
full ("uwx") permissions, and the effective permissions are recorded only
in role.access and merged into the leaves, this can lead to incorrect
reuse of a shadow page and eventually to a missing guest protection page
fault.

For example, here is a shared pagetable:

   pgd[]   pud[]        pmd[]            virtual address pointers
                     /->pmd1(u--)->pte1(uw-)->page1 <- ptr1 (u--)
        /->pud1(uw-)--->pmd2(uw-)->pte2(uw-)->page2 <- ptr2 (uw-)
   pgd-|           (shared pmd[] as above)
        \->pud2(u--)--->pmd1(u--)->pte1(uw-)->page1 <- ptr3 (u--)
                     \->pmd2(uw-)->pte2(uw-)->page2 <- ptr4 (u--)

  pud1 and pud2 point to the same pmd table, so:
  - ptr1 and ptr3 points to the same page.
  - ptr2 and ptr4 points to the same page.

(pud1 and pud2 here are pud entries, while pmd1 and pmd2 here are pmd entries)

- First, the guest reads from ptr1 first and KVM prepares a shadow
  page table with role.access=u--, from ptr1's pud1 and ptr1's pmd1.
  "u--" comes from the effective permissions of pgd, pud1 and
  pmd1, which are stored in pt->access.  "u--" is used also to get
  the pagetable for pud1, instead of "uw-".

- Then the guest writes to ptr2 and KVM reuses pud1 which is present.
  The hypervisor set up a shadow page for ptr2 with pt->access is "uw-"
  even though the pud1 pmd (because of the incorrect argument to
  kvm_mmu_get_page in the previous step) has role.access="u--".

- Then the guest reads from ptr3.  The hypervisor reuses pud1's
  shadow pmd for pud2, because both use "u--" for their permissions.
  Thus, the shadow pmd already includes entries for both pmd1 and pmd2.

- At last, the guest writes to ptr4.  This causes no vmexit or pagefault,
  because pud1's shadow page structures included an "uw-" page even though
  its role.access was "u--".

Any kind of shared pagetable might have the similar problem when in
virtual machine without TDP enabled if the permissions are different
from different ancestors.

In order to fix the problem, we change pt->access to be an array, and
any access in it will not include permissions ANDed from child ptes.

The test code is: https://lore.kernel.org/kvm/20210603050537.19605-1-jiangshanlai@gmail.com/
Remember to test it with TDP disabled.

The problem had existed long before the commit 41074d07c78b ("KVM: MMU:
Fix inherited permissions for emulated guest pte updates"), and it
is hard to find which is the culprit.  So there is no fixes tag here.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20210603052455.21023-1-jiangshanlai@gmail.com>
Cc: stable@vger.kernel.org
Fixes: cea0f0e7ea54 ("[PATCH] KVM: MMU: Shadow page table caching")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[OP: - apply arch/x86/kvm/mmu/* changes to arch/x86/kvm
     - apply documentation changes to Documentation/virt/kvm/mmu.txt
     - adjusted context in arch/x86/kvm/paging_tmpl.h]
Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 Documentation/virt/kvm/mmu.txt |    4 ++--
 arch/x86/kvm/paging_tmpl.h     |   14 +++++++++-----
 2 files changed, 11 insertions(+), 7 deletions(-)

--- a/Documentation/virt/kvm/mmu.txt
+++ b/Documentation/virt/kvm/mmu.txt
@@ -152,8 +152,8 @@ Shadow pages contain the following infor
     shadow pages) so role.quadrant takes values in the range 0..3.  Each
     quadrant maps 1GB virtual address space.
   role.access:
-    Inherited guest access permissions in the form uwx.  Note execute
-    permission is positive, not negative.
+    Inherited guest access permissions from the parent ptes in the form uwx.
+    Note execute permission is positive, not negative.
   role.invalid:
     The page is invalid and should not be used.  It is a root page that is
     currently pinned (by a cpu hardware register pointing to it); once it is
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -90,8 +90,8 @@ struct guest_walker {
 	gpa_t pte_gpa[PT_MAX_FULL_LEVELS];
 	pt_element_t __user *ptep_user[PT_MAX_FULL_LEVELS];
 	bool pte_writable[PT_MAX_FULL_LEVELS];
-	unsigned pt_access;
-	unsigned pte_access;
+	unsigned int pt_access[PT_MAX_FULL_LEVELS];
+	unsigned int pte_access;
 	gfn_t gfn;
 	struct x86_exception fault;
 };
@@ -406,13 +406,15 @@ retry_walk:
 		}
 
 		walker->ptes[walker->level - 1] = pte;
+
+		/* Convert to ACC_*_MASK flags for struct guest_walker.  */
+		walker->pt_access[walker->level - 1] = FNAME(gpte_access)(pt_access ^ walk_nx_mask);
 	} while (!is_last_gpte(mmu, walker->level, pte));
 
 	pte_pkey = FNAME(gpte_pkeys)(vcpu, pte);
 	accessed_dirty = have_ad ? pte_access & PT_GUEST_ACCESSED_MASK : 0;
 
 	/* Convert to ACC_*_MASK flags for struct guest_walker.  */
-	walker->pt_access = FNAME(gpte_access)(pt_access ^ walk_nx_mask);
 	walker->pte_access = FNAME(gpte_access)(pte_access ^ walk_nx_mask);
 	errcode = permission_fault(vcpu, mmu, walker->pte_access, pte_pkey, access);
 	if (unlikely(errcode))
@@ -451,7 +453,8 @@ retry_walk:
 	}
 
 	pgprintk("%s: pte %llx pte_access %x pt_access %x\n",
-		 __func__, (u64)pte, walker->pte_access, walker->pt_access);
+		 __func__, (u64)pte, walker->pte_access,
+		 walker->pt_access[walker->level - 1]);
 	return 1;
 
 error:
@@ -620,7 +623,7 @@ static int FNAME(fetch)(struct kvm_vcpu
 {
 	struct kvm_mmu_page *sp = NULL;
 	struct kvm_shadow_walk_iterator it;
-	unsigned direct_access, access = gw->pt_access;
+	unsigned int direct_access, access;
 	int top_level, ret;
 	gfn_t gfn, base_gfn;
 
@@ -652,6 +655,7 @@ static int FNAME(fetch)(struct kvm_vcpu
 		sp = NULL;
 		if (!is_shadow_present_pte(*it.sptep)) {
 			table_gfn = gw->table_gfn[it.level - 2];
+			access = gw->pt_access[it.level - 2];
 			sp = kvm_mmu_get_page(vcpu, table_gfn, addr, it.level-1,
 					      false, access);
 		}



^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 5.4 13/27] USB:ehci:fix Kunpeng920 ehci hardware problem
  2021-08-13 15:06 [PATCH 5.4 00/27] 5.4.141-rc1 review Greg Kroah-Hartman
                   ` (11 preceding siblings ...)
  2021-08-13 15:07 ` [PATCH 5.4 12/27] KVM: X86: MMU: Use the correct inherited permissions to get shadow page Greg Kroah-Hartman
@ 2021-08-13 15:07 ` Greg Kroah-Hartman
  2021-08-13 15:07 ` [PATCH 5.4 14/27] ALSA: hda: Add quirk for ASUS Flow x13 Greg Kroah-Hartman
                   ` (18 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2021-08-13 15:07 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Alan Stern, Longfang Liu

From: Longfang Liu <liulongfang@huawei.com>

commit 26b75952ca0b8b4b3050adb9582c8e2f44d49687 upstream.

Kunpeng920's EHCI controller does not have SBRN register.
Reading the SBRN register when the controller driver is
initialized will get 0.

When rebooting the EHCI driver, ehci_shutdown() will be called.
if the sbrn flag is 0, ehci_shutdown() will return directly.
The sbrn flag being 0 will cause the EHCI interrupt signal to
not be turned off after reboot. this interrupt that is not closed
will cause an exception to the device sharing the interrupt.

Therefore, the EHCI controller of Kunpeng920 needs to skip
the read operation of the SBRN register.

Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Longfang Liu <liulongfang@huawei.com>
Link: https://lore.kernel.org/r/1617958081-17999-1-git-send-email-liulongfang@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/usb/host/ehci-pci.c |    3 +++
 1 file changed, 3 insertions(+)

--- a/drivers/usb/host/ehci-pci.c
+++ b/drivers/usb/host/ehci-pci.c
@@ -298,6 +298,9 @@ static int ehci_pci_setup(struct usb_hcd
 	if (pdev->vendor == PCI_VENDOR_ID_STMICRO
 	    && pdev->device == PCI_DEVICE_ID_STMICRO_USB_HOST)
 		;	/* ConneXT has no sbrn register */
+	else if (pdev->vendor == PCI_VENDOR_ID_HUAWEI
+			 && pdev->device == 0xa239)
+		;	/* HUAWEI Kunpeng920 USB EHCI has no sbrn register */
 	else
 		pci_read_config_byte(pdev, 0x60, &ehci->sbrn);
 



^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 5.4 14/27] ALSA: hda: Add quirk for ASUS Flow x13
  2021-08-13 15:06 [PATCH 5.4 00/27] 5.4.141-rc1 review Greg Kroah-Hartman
                   ` (12 preceding siblings ...)
  2021-08-13 15:07 ` [PATCH 5.4 13/27] USB:ehci:fix Kunpeng920 ehci hardware problem Greg Kroah-Hartman
@ 2021-08-13 15:07 ` Greg Kroah-Hartman
  2021-08-13 15:07 ` [PATCH 5.4 15/27] ppp: Fix generating ppp unit id when ifname is not specified Greg Kroah-Hartman
                   ` (17 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2021-08-13 15:07 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Luke D Jones, Takashi Iwai

From: Luke D Jones <luke@ljones.dev>

commit 739d0959fbed23838a96c48fbce01dd2f6fb2c5f upstream.

The ASUS GV301QH sound appears to work well with the quirk for
ALC294_FIXUP_ASUS_DUAL_SPK.

Signed-off-by: Luke D Jones <luke@ljones.dev>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20210807025805.27321-1-luke@ljones.dev
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 sound/pci/hda/patch_realtek.c |    1 +
 1 file changed, 1 insertion(+)

--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -8122,6 +8122,7 @@ static const struct snd_pci_quirk alc269
 	SND_PCI_QUIRK(0x1043, 0x16e3, "ASUS UX50", ALC269_FIXUP_STEREO_DMIC),
 	SND_PCI_QUIRK(0x1043, 0x1740, "ASUS UX430UA", ALC295_FIXUP_ASUS_DACS),
 	SND_PCI_QUIRK(0x1043, 0x17d1, "ASUS UX431FL", ALC294_FIXUP_ASUS_DUAL_SPK),
+	SND_PCI_QUIRK(0x1043, 0x1662, "ASUS GV301QH", ALC294_FIXUP_ASUS_DUAL_SPK),
 	SND_PCI_QUIRK(0x1043, 0x1881, "ASUS Zephyrus S/M", ALC294_FIXUP_ASUS_GX502_PINS),
 	SND_PCI_QUIRK(0x1043, 0x18b1, "Asus MJ401TA", ALC256_FIXUP_ASUS_HEADSET_MIC),
 	SND_PCI_QUIRK(0x1043, 0x18f1, "Asus FX505DT", ALC256_FIXUP_ASUS_HEADSET_MIC),



^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 5.4 15/27] ppp: Fix generating ppp unit id when ifname is not specified
  2021-08-13 15:06 [PATCH 5.4 00/27] 5.4.141-rc1 review Greg Kroah-Hartman
                   ` (13 preceding siblings ...)
  2021-08-13 15:07 ` [PATCH 5.4 14/27] ALSA: hda: Add quirk for ASUS Flow x13 Greg Kroah-Hartman
@ 2021-08-13 15:07 ` Greg Kroah-Hartman
  2021-08-13 15:07 ` [PATCH 5.4 16/27] ovl: prevent private clone if bind mount is not allowed Greg Kroah-Hartman
                   ` (16 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2021-08-13 15:07 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Pali Rohár, David S. Miller

From: Pali Rohár <pali@kernel.org>

commit 3125f26c514826077f2a4490b75e9b1c7a644c42 upstream.

When registering new ppp interface via PPPIOCNEWUNIT ioctl then kernel has
to choose interface name as this ioctl API does not support specifying it.

Kernel in this case register new interface with name "ppp<id>" where <id>
is the ppp unit id, which can be obtained via PPPIOCGUNIT ioctl. This
applies also in the case when registering new ppp interface via rtnl
without supplying IFLA_IFNAME.

PPPIOCNEWUNIT ioctl allows to specify own ppp unit id which will kernel
assign to ppp interface, in case this ppp id is not already used by other
ppp interface.

In case user does not specify ppp unit id then kernel choose the first free
ppp unit id. This applies also for case when creating ppp interface via
rtnl method as it does not provide a way for specifying own ppp unit id.

If some network interface (does not have to be ppp) has name "ppp<id>"
with this first free ppp id then PPPIOCNEWUNIT ioctl or rtnl call fails.

And registering new ppp interface is not possible anymore, until interface
which holds conflicting name is renamed. Or when using rtnl method with
custom interface name in IFLA_IFNAME.

As list of allocated / used ppp unit ids is not possible to retrieve from
kernel to userspace, userspace has no idea what happens nor which interface
is doing this conflict.

So change the algorithm how ppp unit id is generated. And choose the first
number which is not neither used as ppp unit id nor in some network
interface with pattern "ppp<id>".

This issue can be simply reproduced by following pppd call when there is no
ppp interface registered and also no interface with name pattern "ppp<id>":

    pppd ifname ppp1 +ipv6 noip noauth nolock local nodetach pty "pppd +ipv6 noip noauth nolock local nodetach notty"

Or by creating the one ppp interface (which gets assigned ppp unit id 0),
renaming it to "ppp1" and then trying to create a new ppp interface (which
will always fails as next free ppp unit id is 1, but network interface with
name "ppp1" exists).

This patch fixes above described issue by generating new and new ppp unit
id until some non-conflicting id with network interfaces is generated.

Signed-off-by: Pali Rohár <pali@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ppp/ppp_generic.c |   19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -283,7 +283,7 @@ static struct channel *ppp_find_channel(
 static int ppp_connect_channel(struct channel *pch, int unit);
 static int ppp_disconnect_channel(struct channel *pch);
 static void ppp_destroy_channel(struct channel *pch);
-static int unit_get(struct idr *p, void *ptr);
+static int unit_get(struct idr *p, void *ptr, int min);
 static int unit_set(struct idr *p, void *ptr, int n);
 static void unit_put(struct idr *p, int n);
 static void *unit_find(struct idr *p, int n);
@@ -959,9 +959,20 @@ static int ppp_unit_register(struct ppp
 	mutex_lock(&pn->all_ppp_mutex);
 
 	if (unit < 0) {
-		ret = unit_get(&pn->units_idr, ppp);
+		ret = unit_get(&pn->units_idr, ppp, 0);
 		if (ret < 0)
 			goto err;
+		if (!ifname_is_set) {
+			while (1) {
+				snprintf(ppp->dev->name, IFNAMSIZ, "ppp%i", ret);
+				if (!__dev_get_by_name(ppp->ppp_net, ppp->dev->name))
+					break;
+				unit_put(&pn->units_idr, ret);
+				ret = unit_get(&pn->units_idr, ppp, ret + 1);
+				if (ret < 0)
+					goto err;
+			}
+		}
 	} else {
 		/* Caller asked for a specific unit number. Fail with -EEXIST
 		 * if unavailable. For backward compatibility, return -EEXIST
@@ -3294,9 +3305,9 @@ static int unit_set(struct idr *p, void
 }
 
 /* get new free unit number and associate pointer with it */
-static int unit_get(struct idr *p, void *ptr)
+static int unit_get(struct idr *p, void *ptr, int min)
 {
-	return idr_alloc(p, ptr, 0, 0, GFP_KERNEL);
+	return idr_alloc(p, ptr, min, 0, GFP_KERNEL);
 }
 
 /* put unit number back to a pool */



^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 5.4 16/27] ovl: prevent private clone if bind mount is not allowed
  2021-08-13 15:06 [PATCH 5.4 00/27] 5.4.141-rc1 review Greg Kroah-Hartman
                   ` (14 preceding siblings ...)
  2021-08-13 15:07 ` [PATCH 5.4 15/27] ppp: Fix generating ppp unit id when ifname is not specified Greg Kroah-Hartman
@ 2021-08-13 15:07 ` Greg Kroah-Hartman
  2021-08-13 15:07 ` [PATCH 5.4 17/27] btrfs: make qgroup_free_reserved_data take btrfs_inode Greg Kroah-Hartman
                   ` (15 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2021-08-13 15:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Alois Wohlschlager, Miklos Szeredi

From: Miklos Szeredi <mszeredi@redhat.com>

commit 427215d85e8d1476da1a86b8d67aceb485eb3631 upstream.

Add the following checks from __do_loopback() to clone_private_mount() as
well:

 - verify that the mount is in the current namespace

 - verify that there are no locked children

Reported-by: Alois Wohlschlager <alois1@gmx-topmail.de>
Fixes: c771d683a62e ("vfs: introduce clone_private_mount()")
Cc: <stable@vger.kernel.org> # v3.18
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/namespace.c |   42 ++++++++++++++++++++++++++++--------------
 1 file changed, 28 insertions(+), 14 deletions(-)

--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1861,6 +1861,20 @@ void drop_collected_mounts(struct vfsmou
 	namespace_unlock();
 }
 
+static bool has_locked_children(struct mount *mnt, struct dentry *dentry)
+{
+	struct mount *child;
+
+	list_for_each_entry(child, &mnt->mnt_mounts, mnt_child) {
+		if (!is_subdir(child->mnt_mountpoint, dentry))
+			continue;
+
+		if (child->mnt.mnt_flags & MNT_LOCKED)
+			return true;
+	}
+	return false;
+}
+
 /**
  * clone_private_mount - create a private clone of a path
  *
@@ -1875,14 +1889,27 @@ struct vfsmount *clone_private_mount(con
 	struct mount *old_mnt = real_mount(path->mnt);
 	struct mount *new_mnt;
 
+	down_read(&namespace_sem);
 	if (IS_MNT_UNBINDABLE(old_mnt))
-		return ERR_PTR(-EINVAL);
+		goto invalid;
+
+	if (!check_mnt(old_mnt))
+		goto invalid;
+
+	if (has_locked_children(old_mnt, path->dentry))
+		goto invalid;
 
 	new_mnt = clone_mnt(old_mnt, path->dentry, CL_PRIVATE);
+	up_read(&namespace_sem);
+
 	if (IS_ERR(new_mnt))
 		return ERR_CAST(new_mnt);
 
 	return &new_mnt->mnt;
+
+invalid:
+	up_read(&namespace_sem);
+	return ERR_PTR(-EINVAL);
 }
 EXPORT_SYMBOL_GPL(clone_private_mount);
 
@@ -2234,19 +2261,6 @@ static int do_change_type(struct path *p
 	return err;
 }
 
-static bool has_locked_children(struct mount *mnt, struct dentry *dentry)
-{
-	struct mount *child;
-	list_for_each_entry(child, &mnt->mnt_mounts, mnt_child) {
-		if (!is_subdir(child->mnt_mountpoint, dentry))
-			continue;
-
-		if (child->mnt.mnt_flags & MNT_LOCKED)
-			return true;
-	}
-	return false;
-}
-
 static struct mount *__do_loopback(struct path *old_path, int recurse)
 {
 	struct mount *mnt = ERR_PTR(-EINVAL), *old = real_mount(old_path->mnt);



^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 5.4 17/27] btrfs: make qgroup_free_reserved_data take btrfs_inode
  2021-08-13 15:06 [PATCH 5.4 00/27] 5.4.141-rc1 review Greg Kroah-Hartman
                   ` (15 preceding siblings ...)
  2021-08-13 15:07 ` [PATCH 5.4 16/27] ovl: prevent private clone if bind mount is not allowed Greg Kroah-Hartman
@ 2021-08-13 15:07 ` Greg Kroah-Hartman
  2021-08-13 15:07 ` [PATCH 5.4 18/27] btrfs: make btrfs_qgroup_reserve_data " Greg Kroah-Hartman
                   ` (14 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2021-08-13 15:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Nikolay Borisov, David Sterba, Anand Jain

From: Nikolay Borisov <nborisov@suse.com>

commit df2cfd131fd33dbef1ce33be8b332b1f3d645f35 upstream

It only uses btrfs_inode so can just as easily take it as an argument.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/btrfs/qgroup.c |   11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -3481,10 +3481,10 @@ cleanup:
 }
 
 /* Free ranges specified by @reserved, normally in error path */
-static int qgroup_free_reserved_data(struct inode *inode,
+static int qgroup_free_reserved_data(struct btrfs_inode *inode,
 			struct extent_changeset *reserved, u64 start, u64 len)
 {
-	struct btrfs_root *root = BTRFS_I(inode)->root;
+	struct btrfs_root *root = inode->root;
 	struct ulist_node *unode;
 	struct ulist_iterator uiter;
 	struct extent_changeset changeset;
@@ -3520,8 +3520,8 @@ static int qgroup_free_reserved_data(str
 		 * EXTENT_QGROUP_RESERVED, we won't double free.
 		 * So not need to rush.
 		 */
-		ret = clear_record_extent_bits(&BTRFS_I(inode)->io_tree,
-				free_start, free_start + free_len - 1,
+		ret = clear_record_extent_bits(&inode->io_tree, free_start,
+				free_start + free_len - 1,
 				EXTENT_QGROUP_RESERVED, &changeset);
 		if (ret < 0)
 			goto out;
@@ -3550,7 +3550,8 @@ static int __btrfs_qgroup_release_data(s
 	/* In release case, we shouldn't have @reserved */
 	WARN_ON(!free && reserved);
 	if (free && reserved)
-		return qgroup_free_reserved_data(inode, reserved, start, len);
+		return qgroup_free_reserved_data(BTRFS_I(inode), reserved,
+						 start, len);
 	extent_changeset_init(&changeset);
 	ret = clear_record_extent_bits(&BTRFS_I(inode)->io_tree, start, 
 			start + len -1, EXTENT_QGROUP_RESERVED, &changeset);



^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 5.4 18/27] btrfs: make btrfs_qgroup_reserve_data take btrfs_inode
  2021-08-13 15:06 [PATCH 5.4 00/27] 5.4.141-rc1 review Greg Kroah-Hartman
                   ` (16 preceding siblings ...)
  2021-08-13 15:07 ` [PATCH 5.4 17/27] btrfs: make qgroup_free_reserved_data take btrfs_inode Greg Kroah-Hartman
@ 2021-08-13 15:07 ` Greg Kroah-Hartman
  2021-08-13 15:07 ` [PATCH 5.4 19/27] btrfs: qgroup: allow to unreserve range without releasing other ranges Greg Kroah-Hartman
                   ` (13 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2021-08-13 15:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Nikolay Borisov, David Sterba, Anand Jain

From: Nikolay Borisov <nborisov@suse.com>

commit 7661a3e033ab782366e0e1f4b6aad0df3555fcbd upstream

There's only a single use of vfs_inode in a tracepoint so let's take
btrfs_inode directly.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/btrfs/delalloc-space.c |    2 +-
 fs/btrfs/file.c           |    7 ++++---
 fs/btrfs/qgroup.c         |   10 +++++-----
 fs/btrfs/qgroup.h         |    2 +-
 4 files changed, 11 insertions(+), 10 deletions(-)

--- a/fs/btrfs/delalloc-space.c
+++ b/fs/btrfs/delalloc-space.c
@@ -151,7 +151,7 @@ int btrfs_check_data_free_space(struct i
 		return ret;
 
 	/* Use new btrfs_qgroup_reserve_data to reserve precious data space. */
-	ret = btrfs_qgroup_reserve_data(inode, reserved, start, len);
+	ret = btrfs_qgroup_reserve_data(BTRFS_I(inode), reserved, start, len);
 	if (ret < 0)
 		btrfs_free_reserved_data_space_noquota(inode, start, len);
 	else
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -3149,7 +3149,7 @@ reserve_space:
 						  &cached_state);
 		if (ret)
 			goto out;
-		ret = btrfs_qgroup_reserve_data(inode, &data_reserved,
+		ret = btrfs_qgroup_reserve_data(BTRFS_I(inode), &data_reserved,
 						alloc_start, bytes_to_reserve);
 		if (ret) {
 			unlock_extent_cached(&BTRFS_I(inode)->io_tree, lockstart,
@@ -3322,8 +3322,9 @@ static long btrfs_fallocate(struct file
 				free_extent_map(em);
 				break;
 			}
-			ret = btrfs_qgroup_reserve_data(inode, &data_reserved,
-					cur_offset, last_byte - cur_offset);
+			ret = btrfs_qgroup_reserve_data(BTRFS_I(inode),
+					&data_reserved, cur_offset,
+					last_byte - cur_offset);
 			if (ret < 0) {
 				cur_offset = last_byte;
 				free_extent_map(em);
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -3425,11 +3425,11 @@ btrfs_qgroup_rescan_resume(struct btrfs_
  *       same @reserved, caller must ensure when error happens it's OK
  *       to free *ALL* reserved space.
  */
-int btrfs_qgroup_reserve_data(struct inode *inode,
+int btrfs_qgroup_reserve_data(struct btrfs_inode *inode,
 			struct extent_changeset **reserved_ret, u64 start,
 			u64 len)
 {
-	struct btrfs_root *root = BTRFS_I(inode)->root;
+	struct btrfs_root *root = inode->root;
 	struct ulist_node *unode;
 	struct ulist_iterator uiter;
 	struct extent_changeset *reserved;
@@ -3452,12 +3452,12 @@ int btrfs_qgroup_reserve_data(struct ino
 	reserved = *reserved_ret;
 	/* Record already reserved space */
 	orig_reserved = reserved->bytes_changed;
-	ret = set_record_extent_bits(&BTRFS_I(inode)->io_tree, start,
+	ret = set_record_extent_bits(&inode->io_tree, start,
 			start + len -1, EXTENT_QGROUP_RESERVED, reserved);
 
 	/* Newly reserved space */
 	to_reserve = reserved->bytes_changed - orig_reserved;
-	trace_btrfs_qgroup_reserve_data(inode, start, len,
+	trace_btrfs_qgroup_reserve_data(&inode->vfs_inode, start, len,
 					to_reserve, QGROUP_RESERVE);
 	if (ret < 0)
 		goto cleanup;
@@ -3471,7 +3471,7 @@ cleanup:
 	/* cleanup *ALL* already reserved ranges */
 	ULIST_ITER_INIT(&uiter);
 	while ((unode = ulist_next(&reserved->range_changed, &uiter)))
-		clear_extent_bit(&BTRFS_I(inode)->io_tree, unode->val,
+		clear_extent_bit(&inode->io_tree, unode->val,
 				 unode->aux, EXTENT_QGROUP_RESERVED, 0, 0, NULL);
 	/* Also free data bytes of already reserved one */
 	btrfs_qgroup_free_refroot(root->fs_info, root->root_key.objectid,
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -344,7 +344,7 @@ int btrfs_verify_qgroup_counts(struct bt
 #endif
 
 /* New io_tree based accurate qgroup reserve API */
-int btrfs_qgroup_reserve_data(struct inode *inode,
+int btrfs_qgroup_reserve_data(struct btrfs_inode *inode,
 			struct extent_changeset **reserved, u64 start, u64 len);
 int btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len);
 int btrfs_qgroup_free_data(struct inode *inode,



^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 5.4 19/27] btrfs: qgroup: allow to unreserve range without releasing other ranges
  2021-08-13 15:06 [PATCH 5.4 00/27] 5.4.141-rc1 review Greg Kroah-Hartman
                   ` (17 preceding siblings ...)
  2021-08-13 15:07 ` [PATCH 5.4 18/27] btrfs: make btrfs_qgroup_reserve_data " Greg Kroah-Hartman
@ 2021-08-13 15:07 ` Greg Kroah-Hartman
  2021-08-13 15:07 ` [PATCH 5.4 20/27] btrfs: qgroup: try to flush qgroup space when we get -EDQUOT Greg Kroah-Hartman
                   ` (12 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2021-08-13 15:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Josef Bacik, Qu Wenruo, David Sterba,
	Anand Jain

From: Qu Wenruo <wqu@suse.com>

commit 263da812e87bac4098a4778efaa32c54275641db upstream

[PROBLEM]
Before this patch, when btrfs_qgroup_reserve_data() fails, we free all
reserved space of the changeset.

For example:
	ret = btrfs_qgroup_reserve_data(inode, changeset, 0, SZ_1M);
	ret = btrfs_qgroup_reserve_data(inode, changeset, SZ_1M, SZ_1M);
	ret = btrfs_qgroup_reserve_data(inode, changeset, SZ_2M, SZ_1M);

If the last btrfs_qgroup_reserve_data() failed, it will release the
entire [0, 3M) range.

This behavior is kind of OK for now, as when we hit -EDQUOT, we normally
go error handling and need to release all reserved ranges anyway.

But this also means the following call is not possible:

	ret = btrfs_qgroup_reserve_data();
	if (ret == -EDQUOT) {
		/* Do something to free some qgroup space */
		ret = btrfs_qgroup_reserve_data();
	}

As if the first btrfs_qgroup_reserve_data() fails, it will free all
reserved qgroup space.

[CAUSE]
This is because we release all reserved ranges when
btrfs_qgroup_reserve_data() fails.

[FIX]
This patch will implement a new function, qgroup_unreserve_range(), to
iterate through the ulist nodes, to find any nodes in the failure range,
and remove the EXTENT_QGROUP_RESERVED bits from the io_tree, and
decrease the extent_changeset::bytes_changed, so that we can revert to
previous state.

This allows later patches to retry btrfs_qgroup_reserve_data() if EDQUOT
happens.

Suggested-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/btrfs/qgroup.c |   92 +++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 77 insertions(+), 15 deletions(-)

--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -3411,6 +3411,73 @@ btrfs_qgroup_rescan_resume(struct btrfs_
 	}
 }
 
+#define rbtree_iterate_from_safe(node, next, start)				\
+       for (node = start; node && ({ next = rb_next(node); 1;}); node = next)
+
+static int qgroup_unreserve_range(struct btrfs_inode *inode,
+				  struct extent_changeset *reserved, u64 start,
+				  u64 len)
+{
+	struct rb_node *node;
+	struct rb_node *next;
+	struct ulist_node *entry = NULL;
+	int ret = 0;
+
+	node = reserved->range_changed.root.rb_node;
+	while (node) {
+		entry = rb_entry(node, struct ulist_node, rb_node);
+		if (entry->val < start)
+			node = node->rb_right;
+		else if (entry)
+			node = node->rb_left;
+		else
+			break;
+	}
+
+	/* Empty changeset */
+	if (!entry)
+		return 0;
+
+	if (entry->val > start && rb_prev(&entry->rb_node))
+		entry = rb_entry(rb_prev(&entry->rb_node), struct ulist_node,
+				 rb_node);
+
+	rbtree_iterate_from_safe(node, next, &entry->rb_node) {
+		u64 entry_start;
+		u64 entry_end;
+		u64 entry_len;
+		int clear_ret;
+
+		entry = rb_entry(node, struct ulist_node, rb_node);
+		entry_start = entry->val;
+		entry_end = entry->aux;
+		entry_len = entry_end - entry_start + 1;
+
+		if (entry_start >= start + len)
+			break;
+		if (entry_start + entry_len <= start)
+			continue;
+		/*
+		 * Now the entry is in [start, start + len), revert the
+		 * EXTENT_QGROUP_RESERVED bit.
+		 */
+		clear_ret = clear_extent_bits(&inode->io_tree, entry_start,
+					      entry_end, EXTENT_QGROUP_RESERVED);
+		if (!ret && clear_ret < 0)
+			ret = clear_ret;
+
+		ulist_del(&reserved->range_changed, entry->val, entry->aux);
+		if (likely(reserved->bytes_changed >= entry_len)) {
+			reserved->bytes_changed -= entry_len;
+		} else {
+			WARN_ON(1);
+			reserved->bytes_changed = 0;
+		}
+	}
+
+	return ret;
+}
+
 /*
  * Reserve qgroup space for range [start, start + len).
  *
@@ -3421,18 +3488,14 @@ btrfs_qgroup_rescan_resume(struct btrfs_
  * Return <0 for error (including -EQUOT)
  *
  * NOTE: this function may sleep for memory allocation.
- *       if btrfs_qgroup_reserve_data() is called multiple times with
- *       same @reserved, caller must ensure when error happens it's OK
- *       to free *ALL* reserved space.
  */
 int btrfs_qgroup_reserve_data(struct btrfs_inode *inode,
 			struct extent_changeset **reserved_ret, u64 start,
 			u64 len)
 {
 	struct btrfs_root *root = inode->root;
-	struct ulist_node *unode;
-	struct ulist_iterator uiter;
 	struct extent_changeset *reserved;
+	bool new_reserved = false;
 	u64 orig_reserved;
 	u64 to_reserve;
 	int ret;
@@ -3445,6 +3508,7 @@ int btrfs_qgroup_reserve_data(struct btr
 	if (WARN_ON(!reserved_ret))
 		return -EINVAL;
 	if (!*reserved_ret) {
+		new_reserved = true;
 		*reserved_ret = extent_changeset_alloc();
 		if (!*reserved_ret)
 			return -ENOMEM;
@@ -3460,7 +3524,7 @@ int btrfs_qgroup_reserve_data(struct btr
 	trace_btrfs_qgroup_reserve_data(&inode->vfs_inode, start, len,
 					to_reserve, QGROUP_RESERVE);
 	if (ret < 0)
-		goto cleanup;
+		goto out;
 	ret = qgroup_reserve(root, to_reserve, true, BTRFS_QGROUP_RSV_DATA);
 	if (ret < 0)
 		goto cleanup;
@@ -3468,15 +3532,13 @@ int btrfs_qgroup_reserve_data(struct btr
 	return ret;
 
 cleanup:
-	/* cleanup *ALL* already reserved ranges */
-	ULIST_ITER_INIT(&uiter);
-	while ((unode = ulist_next(&reserved->range_changed, &uiter)))
-		clear_extent_bit(&inode->io_tree, unode->val,
-				 unode->aux, EXTENT_QGROUP_RESERVED, 0, 0, NULL);
-	/* Also free data bytes of already reserved one */
-	btrfs_qgroup_free_refroot(root->fs_info, root->root_key.objectid,
-				  orig_reserved, BTRFS_QGROUP_RSV_DATA);
-	extent_changeset_release(reserved);
+	qgroup_unreserve_range(inode, reserved, start, len);
+out:
+	if (new_reserved) {
+		extent_changeset_release(reserved);
+		kfree(reserved);
+		*reserved_ret = NULL;
+	}
 	return ret;
 }
 



^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 5.4 20/27] btrfs: qgroup: try to flush qgroup space when we get -EDQUOT
  2021-08-13 15:06 [PATCH 5.4 00/27] 5.4.141-rc1 review Greg Kroah-Hartman
                   ` (18 preceding siblings ...)
  2021-08-13 15:07 ` [PATCH 5.4 19/27] btrfs: qgroup: allow to unreserve range without releasing other ranges Greg Kroah-Hartman
@ 2021-08-13 15:07 ` Greg Kroah-Hartman
  2021-08-13 15:07 ` [PATCH 5.4 21/27] btrfs: transaction: Cleanup unused TRANS_STATE_BLOCKED Greg Kroah-Hartman
                   ` (11 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2021-08-13 15:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Josef Bacik, Qu Wenruo, David Sterba,
	Anand Jain

From: Qu Wenruo <wqu@suse.com>

commit c53e9653605dbf708f5be02902de51831be4b009 upstream

[PROBLEM]
There are known problem related to how btrfs handles qgroup reserved
space.  One of the most obvious case is the the test case btrfs/153,
which do fallocate, then write into the preallocated range.

#  btrfs/153 1s ... - output mismatch (see xfstests-dev/results//btrfs/153.out.bad)
#      --- tests/btrfs/153.out     2019-10-22 15:18:14.068965341 +0800
#      +++ xfstests-dev/results//btrfs/153.out.bad      2020-07-01 20:24:40.730000089 +0800
#      @@ -1,2 +1,5 @@
#       QA output created by 153
#      +pwrite: Disk quota exceeded
#      +/mnt/scratch/testfile2: Disk quota exceeded
#      +/mnt/scratch/testfile2: Disk quota exceeded
#       Silence is golden
#      ...
#      (Run 'diff -u xfstests-dev/tests/btrfs/153.out xfstests-dev/results//btrfs/153.out.bad'  to see the entire diff)

[CAUSE]
Since commit c6887cd11149 ("Btrfs: don't do nocow check unless we have to"),
we always reserve space no matter if it's COW or not.

Such behavior change is mostly for performance, and reverting it is not
a good idea anyway.

For preallcoated extent, we reserve qgroup data space for it already,
and since we also reserve data space for qgroup at buffered write time,
it needs twice the space for us to write into preallocated space.

This leads to the -EDQUOT in buffered write routine.

And we can't follow the same solution, unlike data/meta space check,
qgroup reserved space is shared between data/metadata.
The EDQUOT can happen at the metadata reservation, so doing NODATACOW
check after qgroup reservation failure is not a solution.

[FIX]
To solve the problem, we don't return -EDQUOT directly, but every time
we got a -EDQUOT, we try to flush qgroup space:

- Flush all inodes of the root
  NODATACOW writes will free the qgroup reserved at run_dealloc_range().
  However we don't have the infrastructure to only flush NODATACOW
  inodes, here we flush all inodes anyway.

- Wait for ordered extents
  This would convert the preallocated metadata space into per-trans
  metadata, which can be freed in later transaction commit.

- Commit transaction
  This will free all per-trans metadata space.

Also we don't want to trigger flush multiple times, so here we introduce
a per-root wait list and a new root status, to ensure only one thread
starts the flushing.

Fixes: c6887cd11149 ("Btrfs: don't do nocow check unless we have to")
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/btrfs/ctree.h   |    3 +
 fs/btrfs/disk-io.c |    1 
 fs/btrfs/qgroup.c  |  100 ++++++++++++++++++++++++++++++++++++++++++++++++-----
 3 files changed, 96 insertions(+), 8 deletions(-)

--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -945,6 +945,8 @@ enum {
 	BTRFS_ROOT_DEAD_TREE,
 	/* The root has a log tree. Used only for subvolume roots. */
 	BTRFS_ROOT_HAS_LOG_TREE,
+	/* Qgroup flushing is in progress */
+	BTRFS_ROOT_QGROUP_FLUSHING,
 };
 
 /*
@@ -1097,6 +1099,7 @@ struct btrfs_root {
 	spinlock_t qgroup_meta_rsv_lock;
 	u64 qgroup_meta_rsv_pertrans;
 	u64 qgroup_meta_rsv_prealloc;
+	wait_queue_head_t qgroup_flush_wait;
 
 	/* Number of active swapfiles */
 	atomic_t nr_swapfiles;
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1154,6 +1154,7 @@ static void __setup_root(struct btrfs_ro
 	mutex_init(&root->log_mutex);
 	mutex_init(&root->ordered_extent_mutex);
 	mutex_init(&root->delalloc_mutex);
+	init_waitqueue_head(&root->qgroup_flush_wait);
 	init_waitqueue_head(&root->log_writer_wait);
 	init_waitqueue_head(&root->log_commit_wait[0]);
 	init_waitqueue_head(&root->log_commit_wait[1]);
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -3479,17 +3479,58 @@ static int qgroup_unreserve_range(struct
 }
 
 /*
- * Reserve qgroup space for range [start, start + len).
+ * Try to free some space for qgroup.
  *
- * This function will either reserve space from related qgroups or doing
- * nothing if the range is already reserved.
+ * For qgroup, there are only 3 ways to free qgroup space:
+ * - Flush nodatacow write
+ *   Any nodatacow write will free its reserved data space at run_delalloc_range().
+ *   In theory, we should only flush nodatacow inodes, but it's not yet
+ *   possible, so we need to flush the whole root.
  *
- * Return 0 for successful reserve
- * Return <0 for error (including -EQUOT)
+ * - Wait for ordered extents
+ *   When ordered extents are finished, their reserved metadata is finally
+ *   converted to per_trans status, which can be freed by later commit
+ *   transaction.
  *
- * NOTE: this function may sleep for memory allocation.
+ * - Commit transaction
+ *   This would free the meta_per_trans space.
+ *   In theory this shouldn't provide much space, but any more qgroup space
+ *   is needed.
  */
-int btrfs_qgroup_reserve_data(struct btrfs_inode *inode,
+static int try_flush_qgroup(struct btrfs_root *root)
+{
+	struct btrfs_trans_handle *trans;
+	int ret;
+
+	/*
+	 * We don't want to run flush again and again, so if there is a running
+	 * one, we won't try to start a new flush, but exit directly.
+	 */
+	if (test_and_set_bit(BTRFS_ROOT_QGROUP_FLUSHING, &root->state)) {
+		wait_event(root->qgroup_flush_wait,
+			!test_bit(BTRFS_ROOT_QGROUP_FLUSHING, &root->state));
+		return 0;
+	}
+
+	ret = btrfs_start_delalloc_snapshot(root);
+	if (ret < 0)
+		goto out;
+	btrfs_wait_ordered_extents(root, U64_MAX, 0, (u64)-1);
+
+	trans = btrfs_join_transaction(root);
+	if (IS_ERR(trans)) {
+		ret = PTR_ERR(trans);
+		goto out;
+	}
+
+	ret = btrfs_commit_transaction(trans);
+out:
+	clear_bit(BTRFS_ROOT_QGROUP_FLUSHING, &root->state);
+	wake_up(&root->qgroup_flush_wait);
+	return ret;
+}
+
+static int qgroup_reserve_data(struct btrfs_inode *inode,
 			struct extent_changeset **reserved_ret, u64 start,
 			u64 len)
 {
@@ -3542,6 +3583,34 @@ out:
 	return ret;
 }
 
+/*
+ * Reserve qgroup space for range [start, start + len).
+ *
+ * This function will either reserve space from related qgroups or do nothing
+ * if the range is already reserved.
+ *
+ * Return 0 for successful reservation
+ * Return <0 for error (including -EQUOT)
+ *
+ * NOTE: This function may sleep for memory allocation, dirty page flushing and
+ *	 commit transaction. So caller should not hold any dirty page locked.
+ */
+int btrfs_qgroup_reserve_data(struct btrfs_inode *inode,
+			struct extent_changeset **reserved_ret, u64 start,
+			u64 len)
+{
+	int ret;
+
+	ret = qgroup_reserve_data(inode, reserved_ret, start, len);
+	if (ret <= 0 && ret != -EDQUOT)
+		return ret;
+
+	ret = try_flush_qgroup(inode->root);
+	if (ret < 0)
+		return ret;
+	return qgroup_reserve_data(inode, reserved_ret, start, len);
+}
+
 /* Free ranges specified by @reserved, normally in error path */
 static int qgroup_free_reserved_data(struct btrfs_inode *inode,
 			struct extent_changeset *reserved, u64 start, u64 len)
@@ -3712,7 +3781,7 @@ static int sub_root_meta_rsv(struct btrf
 	return num_bytes;
 }
 
-int __btrfs_qgroup_reserve_meta(struct btrfs_root *root, int num_bytes,
+static int qgroup_reserve_meta(struct btrfs_root *root, int num_bytes,
 				enum btrfs_qgroup_rsv_type type, bool enforce)
 {
 	struct btrfs_fs_info *fs_info = root->fs_info;
@@ -3739,6 +3808,21 @@ int __btrfs_qgroup_reserve_meta(struct b
 	return ret;
 }
 
+int __btrfs_qgroup_reserve_meta(struct btrfs_root *root, int num_bytes,
+				enum btrfs_qgroup_rsv_type type, bool enforce)
+{
+	int ret;
+
+	ret = qgroup_reserve_meta(root, num_bytes, type, enforce);
+	if (ret <= 0 && ret != -EDQUOT)
+		return ret;
+
+	ret = try_flush_qgroup(root);
+	if (ret < 0)
+		return ret;
+	return qgroup_reserve_meta(root, num_bytes, type, enforce);
+}
+
 void btrfs_qgroup_free_meta_all_pertrans(struct btrfs_root *root)
 {
 	struct btrfs_fs_info *fs_info = root->fs_info;



^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 5.4 21/27] btrfs: transaction: Cleanup unused TRANS_STATE_BLOCKED
  2021-08-13 15:06 [PATCH 5.4 00/27] 5.4.141-rc1 review Greg Kroah-Hartman
                   ` (19 preceding siblings ...)
  2021-08-13 15:07 ` [PATCH 5.4 20/27] btrfs: qgroup: try to flush qgroup space when we get -EDQUOT Greg Kroah-Hartman
@ 2021-08-13 15:07 ` Greg Kroah-Hartman
  2021-08-13 15:07 ` [PATCH 5.4 22/27] btrfs: qgroup: remove ASYNC_COMMIT mechanism in favor of reserve retry-after-EDQUOT Greg Kroah-Hartman
                   ` (10 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2021-08-13 15:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Qu Wenruo, David Sterba, Anand Jain

From: Qu Wenruo <wqu@suse.com>

commit 3296bf562443a8ca35aaad959a76a49e9b412760 upstream

The state was introduced in commit 4a9d8bdee368 ("Btrfs: make the state
of the transaction more readable"), then in commit 302167c50b32
("btrfs: don't end the transaction for delayed refs in throttle") the
state is completely removed.

So we can just clean up the state since it's only compared but never
set.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/btrfs/disk-io.c     |    2 +-
 fs/btrfs/transaction.c |   15 +++------------
 fs/btrfs/transaction.h |    1 -
 3 files changed, 4 insertions(+), 14 deletions(-)

--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1748,7 +1748,7 @@ static int transaction_kthread(void *arg
 		}
 
 		now = ktime_get_seconds();
-		if (cur->state < TRANS_STATE_BLOCKED &&
+		if (cur->state < TRANS_STATE_COMMIT_START &&
 		    !test_bit(BTRFS_FS_NEED_ASYNC_COMMIT, &fs_info->flags) &&
 		    (now < cur->start_time ||
 		     now - cur->start_time < fs_info->commit_interval)) {
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -27,7 +27,6 @@
 
 static const unsigned int btrfs_blocked_trans_types[TRANS_STATE_MAX] = {
 	[TRANS_STATE_RUNNING]		= 0U,
-	[TRANS_STATE_BLOCKED]		=  __TRANS_START,
 	[TRANS_STATE_COMMIT_START]	= (__TRANS_START | __TRANS_ATTACH),
 	[TRANS_STATE_COMMIT_DOING]	= (__TRANS_START |
 					   __TRANS_ATTACH |
@@ -388,7 +387,7 @@ int btrfs_record_root_in_trans(struct bt
 
 static inline int is_transaction_blocked(struct btrfs_transaction *trans)
 {
-	return (trans->state >= TRANS_STATE_BLOCKED &&
+	return (trans->state >= TRANS_STATE_COMMIT_START &&
 		trans->state < TRANS_STATE_UNBLOCKED &&
 		!TRANS_ABORTED(trans));
 }
@@ -580,7 +579,7 @@ again:
 	INIT_LIST_HEAD(&h->new_bgs);
 
 	smp_mb();
-	if (cur_trans->state >= TRANS_STATE_BLOCKED &&
+	if (cur_trans->state >= TRANS_STATE_COMMIT_START &&
 	    may_wait_transaction(fs_info, type)) {
 		current->journal_info = h;
 		btrfs_commit_transaction(h);
@@ -797,7 +796,7 @@ int btrfs_should_end_transaction(struct
 	struct btrfs_transaction *cur_trans = trans->transaction;
 
 	smp_mb();
-	if (cur_trans->state >= TRANS_STATE_BLOCKED ||
+	if (cur_trans->state >= TRANS_STATE_COMMIT_START ||
 	    cur_trans->delayed_refs.flushing)
 		return 1;
 
@@ -830,7 +829,6 @@ static int __btrfs_end_transaction(struc
 {
 	struct btrfs_fs_info *info = trans->fs_info;
 	struct btrfs_transaction *cur_trans = trans->transaction;
-	int lock = (trans->type != TRANS_JOIN_NOLOCK);
 	int err = 0;
 
 	if (refcount_read(&trans->use_count) > 1) {
@@ -846,13 +844,6 @@ static int __btrfs_end_transaction(struc
 
 	btrfs_trans_release_chunk_metadata(trans);
 
-	if (lock && READ_ONCE(cur_trans->state) == TRANS_STATE_BLOCKED) {
-		if (throttle)
-			return btrfs_commit_transaction(trans);
-		else
-			wake_up_process(info->transaction_kthread);
-	}
-
 	if (trans->type & __TRANS_FREEZABLE)
 		sb_end_intwrite(info->sb);
 
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -13,7 +13,6 @@
 
 enum btrfs_trans_state {
 	TRANS_STATE_RUNNING,
-	TRANS_STATE_BLOCKED,
 	TRANS_STATE_COMMIT_START,
 	TRANS_STATE_COMMIT_DOING,
 	TRANS_STATE_UNBLOCKED,



^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 5.4 22/27] btrfs: qgroup: remove ASYNC_COMMIT mechanism in favor of reserve retry-after-EDQUOT
  2021-08-13 15:06 [PATCH 5.4 00/27] 5.4.141-rc1 review Greg Kroah-Hartman
                   ` (20 preceding siblings ...)
  2021-08-13 15:07 ` [PATCH 5.4 21/27] btrfs: transaction: Cleanup unused TRANS_STATE_BLOCKED Greg Kroah-Hartman
@ 2021-08-13 15:07 ` Greg Kroah-Hartman
  2021-08-13 15:07 ` [PATCH 5.4 23/27] btrfs: fix lockdep splat when enabling and disabling qgroups Greg Kroah-Hartman
                   ` (9 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2021-08-13 15:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Josef Bacik, Qu Wenruo, David Sterba,
	Anand Jain

From: Qu Wenruo <wqu@suse.com>

commit adca4d945c8dca28a85df45c5b117e6dac2e77f1 upstream

commit a514d63882c3 ("btrfs: qgroup: Commit transaction in advance to
reduce early EDQUOT") tries to reduce the early EDQUOT problems by
checking the qgroup free against threshold and tries to wake up commit
kthread to free some space.

The problem of that mechanism is, it can only free qgroup per-trans
metadata space, can't do anything to data, nor prealloc qgroup space.

Now since we have the ability to flush qgroup space, and implemented
retry-after-EDQUOT behavior, such mechanism can be completely replaced.

So this patch will cleanup such mechanism in favor of
retry-after-EDQUOT.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/btrfs/ctree.h       |    5 -----
 fs/btrfs/disk-io.c     |    1 -
 fs/btrfs/qgroup.c      |   43 ++-----------------------------------------
 fs/btrfs/transaction.c |    1 -
 fs/btrfs/transaction.h |   14 --------------
 5 files changed, 2 insertions(+), 62 deletions(-)

--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -505,11 +505,6 @@ enum {
 	 */
 	BTRFS_FS_EXCL_OP,
 	/*
-	 * To info transaction_kthread we need an immediate commit so it
-	 * doesn't need to wait for commit_interval
-	 */
-	BTRFS_FS_NEED_ASYNC_COMMIT,
-	/*
 	 * Indicate that balance has been set up from the ioctl and is in the
 	 * main phase. The fs_info::balance_ctl is initialized.
 	 * Set and cleared while holding fs_info::balance_mutex.
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1749,7 +1749,6 @@ static int transaction_kthread(void *arg
 
 		now = ktime_get_seconds();
 		if (cur->state < TRANS_STATE_COMMIT_START &&
-		    !test_bit(BTRFS_FS_NEED_ASYNC_COMMIT, &fs_info->flags) &&
 		    (now < cur->start_time ||
 		     now - cur->start_time < fs_info->commit_interval)) {
 			spin_unlock(&fs_info->trans_lock);
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -11,7 +11,6 @@
 #include <linux/slab.h>
 #include <linux/workqueue.h>
 #include <linux/btrfs.h>
-#include <linux/sizes.h>
 
 #include "ctree.h"
 #include "transaction.h"
@@ -2840,20 +2839,8 @@ out:
 	return ret;
 }
 
-/*
- * Two limits to commit transaction in advance.
- *
- * For RATIO, it will be 1/RATIO of the remaining limit as threshold.
- * For SIZE, it will be in byte unit as threshold.
- */
-#define QGROUP_FREE_RATIO		32
-#define QGROUP_FREE_SIZE		SZ_32M
-static bool qgroup_check_limits(struct btrfs_fs_info *fs_info,
-				const struct btrfs_qgroup *qg, u64 num_bytes)
+static bool qgroup_check_limits(const struct btrfs_qgroup *qg, u64 num_bytes)
 {
-	u64 free;
-	u64 threshold;
-
 	if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_RFER) &&
 	    qgroup_rsv_total(qg) + (s64)qg->rfer + num_bytes > qg->max_rfer)
 		return false;
@@ -2862,32 +2849,6 @@ static bool qgroup_check_limits(struct b
 	    qgroup_rsv_total(qg) + (s64)qg->excl + num_bytes > qg->max_excl)
 		return false;
 
-	/*
-	 * Even if we passed the check, it's better to check if reservation
-	 * for meta_pertrans is pushing us near limit.
-	 * If there is too much pertrans reservation or it's near the limit,
-	 * let's try commit transaction to free some, using transaction_kthread
-	 */
-	if ((qg->lim_flags & (BTRFS_QGROUP_LIMIT_MAX_RFER |
-			      BTRFS_QGROUP_LIMIT_MAX_EXCL))) {
-		if (qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_EXCL) {
-			free = qg->max_excl - qgroup_rsv_total(qg) - qg->excl;
-			threshold = min_t(u64, qg->max_excl / QGROUP_FREE_RATIO,
-					  QGROUP_FREE_SIZE);
-		} else {
-			free = qg->max_rfer - qgroup_rsv_total(qg) - qg->rfer;
-			threshold = min_t(u64, qg->max_rfer / QGROUP_FREE_RATIO,
-					  QGROUP_FREE_SIZE);
-		}
-
-		/*
-		 * Use transaction_kthread to commit transaction, so we no
-		 * longer need to bother nested transaction nor lock context.
-		 */
-		if (free < threshold)
-			btrfs_commit_transaction_locksafe(fs_info);
-	}
-
 	return true;
 }
 
@@ -2937,7 +2898,7 @@ static int qgroup_reserve(struct btrfs_r
 
 		qg = unode_aux_to_qgroup(unode);
 
-		if (enforce && !qgroup_check_limits(fs_info, qg, num_bytes)) {
+		if (enforce && !qgroup_check_limits(qg, num_bytes)) {
 			ret = -EDQUOT;
 			goto out;
 		}
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -2297,7 +2297,6 @@ int btrfs_commit_transaction(struct btrf
 	 */
 	cur_trans->state = TRANS_STATE_COMPLETED;
 	wake_up(&cur_trans->commit_wait);
-	clear_bit(BTRFS_FS_NEED_ASYNC_COMMIT, &fs_info->flags);
 
 	spin_lock(&fs_info->trans_lock);
 	list_del_init(&cur_trans->list);
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -207,20 +207,6 @@ int btrfs_clean_one_deleted_snapshot(str
 int btrfs_commit_transaction(struct btrfs_trans_handle *trans);
 int btrfs_commit_transaction_async(struct btrfs_trans_handle *trans,
 				   int wait_for_unblock);
-
-/*
- * Try to commit transaction asynchronously, so this is safe to call
- * even holding a spinlock.
- *
- * It's done by informing transaction_kthread to commit transaction without
- * waiting for commit interval.
- */
-static inline void btrfs_commit_transaction_locksafe(
-		struct btrfs_fs_info *fs_info)
-{
-	set_bit(BTRFS_FS_NEED_ASYNC_COMMIT, &fs_info->flags);
-	wake_up_process(fs_info->transaction_kthread);
-}
 int btrfs_end_transaction_throttle(struct btrfs_trans_handle *trans);
 int btrfs_should_end_transaction(struct btrfs_trans_handle *trans);
 void btrfs_throttle(struct btrfs_fs_info *fs_info);



^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 5.4 23/27] btrfs: fix lockdep splat when enabling and disabling qgroups
  2021-08-13 15:06 [PATCH 5.4 00/27] 5.4.141-rc1 review Greg Kroah-Hartman
                   ` (21 preceding siblings ...)
  2021-08-13 15:07 ` [PATCH 5.4 22/27] btrfs: qgroup: remove ASYNC_COMMIT mechanism in favor of reserve retry-after-EDQUOT Greg Kroah-Hartman
@ 2021-08-13 15:07 ` Greg Kroah-Hartman
  2021-08-13 15:07 ` [PATCH 5.4 24/27] net: xilinx_emaclite: Do not print real IOMEM pointer Greg Kroah-Hartman
                   ` (8 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2021-08-13 15:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Filipe Manana, David Sterba, Anand Jain

From: Filipe Manana <fdmanana@suse.com>

commit a855fbe69229078cd8aecd8974fb996a5ca651e6 upstream

When running test case btrfs/017 from fstests, lockdep reported the
following splat:

  [ 1297.067385] ======================================================
  [ 1297.067708] WARNING: possible circular locking dependency detected
  [ 1297.068022] 5.10.0-rc4-btrfs-next-73 #1 Not tainted
  [ 1297.068322] ------------------------------------------------------
  [ 1297.068629] btrfs/189080 is trying to acquire lock:
  [ 1297.068929] ffff9f2725731690 (sb_internal#2){.+.+}-{0:0}, at: btrfs_quota_enable+0xaf/0xa70 [btrfs]
  [ 1297.069274]
		 but task is already holding lock:
  [ 1297.069868] ffff9f2702b61a08 (&fs_info->qgroup_ioctl_lock){+.+.}-{3:3}, at: btrfs_quota_enable+0x3b/0xa70 [btrfs]
  [ 1297.070219]
		 which lock already depends on the new lock.

  [ 1297.071131]
		 the existing dependency chain (in reverse order) is:
  [ 1297.071721]
		 -> #1 (&fs_info->qgroup_ioctl_lock){+.+.}-{3:3}:
  [ 1297.072375]        lock_acquire+0xd8/0x490
  [ 1297.072710]        __mutex_lock+0xa3/0xb30
  [ 1297.073061]        btrfs_qgroup_inherit+0x59/0x6a0 [btrfs]
  [ 1297.073421]        create_subvol+0x194/0x990 [btrfs]
  [ 1297.073780]        btrfs_mksubvol+0x3fb/0x4a0 [btrfs]
  [ 1297.074133]        __btrfs_ioctl_snap_create+0x119/0x1a0 [btrfs]
  [ 1297.074498]        btrfs_ioctl_snap_create+0x58/0x80 [btrfs]
  [ 1297.074872]        btrfs_ioctl+0x1a90/0x36f0 [btrfs]
  [ 1297.075245]        __x64_sys_ioctl+0x83/0xb0
  [ 1297.075617]        do_syscall_64+0x33/0x80
  [ 1297.075993]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [ 1297.076380]
		 -> #0 (sb_internal#2){.+.+}-{0:0}:
  [ 1297.077166]        check_prev_add+0x91/0xc60
  [ 1297.077572]        __lock_acquire+0x1740/0x3110
  [ 1297.077984]        lock_acquire+0xd8/0x490
  [ 1297.078411]        start_transaction+0x3c5/0x760 [btrfs]
  [ 1297.078853]        btrfs_quota_enable+0xaf/0xa70 [btrfs]
  [ 1297.079323]        btrfs_ioctl+0x2c60/0x36f0 [btrfs]
  [ 1297.079789]        __x64_sys_ioctl+0x83/0xb0
  [ 1297.080232]        do_syscall_64+0x33/0x80
  [ 1297.080680]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [ 1297.081139]
		 other info that might help us debug this:

  [ 1297.082536]  Possible unsafe locking scenario:

  [ 1297.083510]        CPU0                    CPU1
  [ 1297.084005]        ----                    ----
  [ 1297.084500]   lock(&fs_info->qgroup_ioctl_lock);
  [ 1297.084994]                                lock(sb_internal#2);
  [ 1297.085485]                                lock(&fs_info->qgroup_ioctl_lock);
  [ 1297.085974]   lock(sb_internal#2);
  [ 1297.086454]
		  *** DEADLOCK ***
  [ 1297.087880] 3 locks held by btrfs/189080:
  [ 1297.088324]  #0: ffff9f2725731470 (sb_writers#14){.+.+}-{0:0}, at: btrfs_ioctl+0xa73/0x36f0 [btrfs]
  [ 1297.088799]  #1: ffff9f2702b60cc0 (&fs_info->subvol_sem){++++}-{3:3}, at: btrfs_ioctl+0x1f4d/0x36f0 [btrfs]
  [ 1297.089284]  #2: ffff9f2702b61a08 (&fs_info->qgroup_ioctl_lock){+.+.}-{3:3}, at: btrfs_quota_enable+0x3b/0xa70 [btrfs]
  [ 1297.089771]
		 stack backtrace:
  [ 1297.090662] CPU: 5 PID: 189080 Comm: btrfs Not tainted 5.10.0-rc4-btrfs-next-73 #1
  [ 1297.091132] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  [ 1297.092123] Call Trace:
  [ 1297.092629]  dump_stack+0x8d/0xb5
  [ 1297.093115]  check_noncircular+0xff/0x110
  [ 1297.093596]  check_prev_add+0x91/0xc60
  [ 1297.094076]  ? kvm_clock_read+0x14/0x30
  [ 1297.094553]  ? kvm_sched_clock_read+0x5/0x10
  [ 1297.095029]  __lock_acquire+0x1740/0x3110
  [ 1297.095510]  lock_acquire+0xd8/0x490
  [ 1297.095993]  ? btrfs_quota_enable+0xaf/0xa70 [btrfs]
  [ 1297.096476]  start_transaction+0x3c5/0x760 [btrfs]
  [ 1297.096962]  ? btrfs_quota_enable+0xaf/0xa70 [btrfs]
  [ 1297.097451]  btrfs_quota_enable+0xaf/0xa70 [btrfs]
  [ 1297.097941]  ? btrfs_ioctl+0x1f4d/0x36f0 [btrfs]
  [ 1297.098429]  btrfs_ioctl+0x2c60/0x36f0 [btrfs]
  [ 1297.098904]  ? do_user_addr_fault+0x20c/0x430
  [ 1297.099382]  ? kvm_clock_read+0x14/0x30
  [ 1297.099854]  ? kvm_sched_clock_read+0x5/0x10
  [ 1297.100328]  ? sched_clock+0x5/0x10
  [ 1297.100801]  ? sched_clock_cpu+0x12/0x180
  [ 1297.101272]  ? __x64_sys_ioctl+0x83/0xb0
  [ 1297.101739]  __x64_sys_ioctl+0x83/0xb0
  [ 1297.102207]  do_syscall_64+0x33/0x80
  [ 1297.102673]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [ 1297.103148] RIP: 0033:0x7f773ff65d87

This is because during the quota enable ioctl we lock first the mutex
qgroup_ioctl_lock and then start a transaction, and starting a transaction
acquires a fs freeze semaphore (at the VFS level). However, every other
code path, except for the quota disable ioctl path, we do the opposite:
we start a transaction and then lock the mutex.

So fix this by making the quota enable and disable paths to start the
transaction without having the mutex locked, and then, after starting the
transaction, lock the mutex and check if some other task already enabled
or disabled the quotas, bailing with success if that was the case.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>

 Conflicts:
	fs/btrfs/qgroup.c
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/btrfs/ctree.h  |    5 +++-
 fs/btrfs/qgroup.c |   56 ++++++++++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 52 insertions(+), 9 deletions(-)

--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -827,7 +827,10 @@ struct btrfs_fs_info {
 	 */
 	struct ulist *qgroup_ulist;
 
-	/* protect user change for quota operations */
+	/*
+	 * Protect user change for quota operations. If a transaction is needed,
+	 * it must be started before locking this lock.
+	 */
 	struct mutex qgroup_ioctl_lock;
 
 	/* list of dirty qgroups to be written at next commit */
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -886,6 +886,7 @@ int btrfs_quota_enable(struct btrfs_fs_i
 	struct btrfs_key found_key;
 	struct btrfs_qgroup *qgroup = NULL;
 	struct btrfs_trans_handle *trans = NULL;
+	struct ulist *ulist = NULL;
 	int ret = 0;
 	int slot;
 
@@ -893,13 +894,28 @@ int btrfs_quota_enable(struct btrfs_fs_i
 	if (fs_info->quota_root)
 		goto out;
 
-	fs_info->qgroup_ulist = ulist_alloc(GFP_KERNEL);
-	if (!fs_info->qgroup_ulist) {
+	ulist = ulist_alloc(GFP_KERNEL);
+	if (!ulist) {
 		ret = -ENOMEM;
 		goto out;
 	}
 
 	/*
+	 * Unlock qgroup_ioctl_lock before starting the transaction. This is to
+	 * avoid lock acquisition inversion problems (reported by lockdep) between
+	 * qgroup_ioctl_lock and the vfs freeze semaphores, acquired when we
+	 * start a transaction.
+	 * After we started the transaction lock qgroup_ioctl_lock again and
+	 * check if someone else created the quota root in the meanwhile. If so,
+	 * just return success and release the transaction handle.
+	 *
+	 * Also we don't need to worry about someone else calling
+	 * btrfs_sysfs_add_qgroups() after we unlock and getting an error because
+	 * that function returns 0 (success) when the sysfs entries already exist.
+	 */
+	mutex_unlock(&fs_info->qgroup_ioctl_lock);
+
+	/*
 	 * 1 for quota root item
 	 * 1 for BTRFS_QGROUP_STATUS item
 	 *
@@ -908,12 +924,20 @@ int btrfs_quota_enable(struct btrfs_fs_i
 	 * would be a lot of overkill.
 	 */
 	trans = btrfs_start_transaction(tree_root, 2);
+
+	mutex_lock(&fs_info->qgroup_ioctl_lock);
 	if (IS_ERR(trans)) {
 		ret = PTR_ERR(trans);
 		trans = NULL;
 		goto out;
 	}
 
+	if (fs_info->quota_root)
+		goto out;
+
+	fs_info->qgroup_ulist = ulist;
+	ulist = NULL;
+
 	/*
 	 * initially create the quota tree
 	 */
@@ -1046,10 +1070,13 @@ out:
 	if (ret) {
 		ulist_free(fs_info->qgroup_ulist);
 		fs_info->qgroup_ulist = NULL;
-		if (trans)
-			btrfs_end_transaction(trans);
 	}
 	mutex_unlock(&fs_info->qgroup_ioctl_lock);
+	if (ret && trans)
+		btrfs_end_transaction(trans);
+	else if (trans)
+		ret = btrfs_end_transaction(trans);
+	ulist_free(ulist);
 	return ret;
 }
 
@@ -1062,19 +1089,29 @@ int btrfs_quota_disable(struct btrfs_fs_
 	mutex_lock(&fs_info->qgroup_ioctl_lock);
 	if (!fs_info->quota_root)
 		goto out;
+	mutex_unlock(&fs_info->qgroup_ioctl_lock);
 
 	/*
 	 * 1 For the root item
 	 *
 	 * We should also reserve enough items for the quota tree deletion in
 	 * btrfs_clean_quota_tree but this is not done.
+	 *
+	 * Also, we must always start a transaction without holding the mutex
+	 * qgroup_ioctl_lock, see btrfs_quota_enable().
 	 */
 	trans = btrfs_start_transaction(fs_info->tree_root, 1);
+
+	mutex_lock(&fs_info->qgroup_ioctl_lock);
 	if (IS_ERR(trans)) {
 		ret = PTR_ERR(trans);
+		trans = NULL;
 		goto out;
 	}
 
+	if (!fs_info->quota_root)
+		goto out;
+
 	clear_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags);
 	btrfs_qgroup_wait_for_completion(fs_info, false);
 	spin_lock(&fs_info->qgroup_lock);
@@ -1088,13 +1125,13 @@ int btrfs_quota_disable(struct btrfs_fs_
 	ret = btrfs_clean_quota_tree(trans, quota_root);
 	if (ret) {
 		btrfs_abort_transaction(trans, ret);
-		goto end_trans;
+		goto out;
 	}
 
 	ret = btrfs_del_root(trans, &quota_root->root_key);
 	if (ret) {
 		btrfs_abort_transaction(trans, ret);
-		goto end_trans;
+		goto out;
 	}
 
 	list_del(&quota_root->dirty_list);
@@ -1108,10 +1145,13 @@ int btrfs_quota_disable(struct btrfs_fs_
 	free_extent_buffer(quota_root->commit_root);
 	kfree(quota_root);
 
-end_trans:
-	ret = btrfs_end_transaction(trans);
 out:
 	mutex_unlock(&fs_info->qgroup_ioctl_lock);
+	if (ret && trans)
+		btrfs_end_transaction(trans);
+	else if (trans)
+		ret = btrfs_end_transaction(trans);
+
 	return ret;
 }
 



^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 5.4 24/27] net: xilinx_emaclite: Do not print real IOMEM pointer
  2021-08-13 15:06 [PATCH 5.4 00/27] 5.4.141-rc1 review Greg Kroah-Hartman
                   ` (22 preceding siblings ...)
  2021-08-13 15:07 ` [PATCH 5.4 23/27] btrfs: fix lockdep splat when enabling and disabling qgroups Greg Kroah-Hartman
@ 2021-08-13 15:07 ` Greg Kroah-Hartman
  2021-08-13 15:07 ` [PATCH 5.4 25/27] btrfs: qgroup: dont commit transaction when we already hold the handle Greg Kroah-Hartman
                   ` (7 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2021-08-13 15:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, YueHaibing, David S. Miller,
	Pavel Machek (CIP)

From: YueHaibing <yuehaibing@huawei.com>

commit d0d62baa7f505bd4c59cd169692ff07ec49dde37 upstream.

Printing kernel pointers is discouraged because they might leak kernel
memory layout.  This fixes smatch warning:

drivers/net/ethernet/xilinx/xilinx_emaclite.c:1191 xemaclite_of_probe() warn:
 argument 4 to %08lX specifier is cast from pointer

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Pavel Machek (CIP) <pavel@denx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/xilinx/xilinx_emaclite.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

--- a/drivers/net/ethernet/xilinx/xilinx_emaclite.c
+++ b/drivers/net/ethernet/xilinx/xilinx_emaclite.c
@@ -1191,9 +1191,8 @@ static int xemaclite_of_probe(struct pla
 	}
 
 	dev_info(dev,
-		 "Xilinx EmacLite at 0x%08X mapped to 0x%08X, irq=%d\n",
-		 (unsigned int __force)ndev->mem_start,
-		 (unsigned int __force)lp->base_addr, ndev->irq);
+		 "Xilinx EmacLite at 0x%08X mapped to 0x%p, irq=%d\n",
+		 (unsigned int __force)ndev->mem_start, lp->base_addr, ndev->irq);
 	return 0;
 
 error:



^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 5.4 25/27] btrfs: qgroup: dont commit transaction when we already hold the handle
  2021-08-13 15:06 [PATCH 5.4 00/27] 5.4.141-rc1 review Greg Kroah-Hartman
                   ` (23 preceding siblings ...)
  2021-08-13 15:07 ` [PATCH 5.4 24/27] net: xilinx_emaclite: Do not print real IOMEM pointer Greg Kroah-Hartman
@ 2021-08-13 15:07 ` Greg Kroah-Hartman
  2021-08-13 15:07 ` [PATCH 5.4 26/27] btrfs: export and rename qgroup_reserve_meta Greg Kroah-Hartman
                   ` (6 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2021-08-13 15:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Filipe Manana, Qu Wenruo,
	David Sterba, Anand Jain

From: Qu Wenruo <wqu@suse.com>

commit 6f23277a49e68f8a9355385c846939ad0b1261e7 upstream

[BUG]
When running the following script, btrfs will trigger an ASSERT():

  #/bin/bash
  mkfs.btrfs -f $dev
  mount $dev $mnt
  xfs_io -f -c "pwrite 0 1G" $mnt/file
  sync
  btrfs quota enable $mnt
  btrfs quota rescan -w $mnt

  # Manually set the limit below current usage
  btrfs qgroup limit 512M $mnt $mnt

  # Crash happens
  touch $mnt/file

The dmesg looks like this:

  assertion failed: refcount_read(&trans->use_count) == 1, in fs/btrfs/transaction.c:2022
  ------------[ cut here ]------------
  kernel BUG at fs/btrfs/ctree.h:3230!
  invalid opcode: 0000 [#1] SMP PTI
  RIP: 0010:assertfail.constprop.0+0x18/0x1a [btrfs]
   btrfs_commit_transaction.cold+0x11/0x5d [btrfs]
   try_flush_qgroup+0x67/0x100 [btrfs]
   __btrfs_qgroup_reserve_meta+0x3a/0x60 [btrfs]
   btrfs_delayed_update_inode+0xaa/0x350 [btrfs]
   btrfs_update_inode+0x9d/0x110 [btrfs]
   btrfs_dirty_inode+0x5d/0xd0 [btrfs]
   touch_atime+0xb5/0x100
   iterate_dir+0xf1/0x1b0
   __x64_sys_getdents64+0x78/0x110
   do_syscall_64+0x33/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7fb5afe588db

[CAUSE]
In try_flush_qgroup(), we assume we don't hold a transaction handle at
all.  This is true for data reservation and mostly true for metadata.
Since data space reservation always happens before we start a
transaction, and for most metadata operation we reserve space in
start_transaction().

But there is an exception, btrfs_delayed_inode_reserve_metadata().
It holds a transaction handle, while still trying to reserve extra
metadata space.

When we hit EDQUOT inside btrfs_delayed_inode_reserve_metadata(), we
will join current transaction and commit, while we still have
transaction handle from qgroup code.

[FIX]
Let's check current->journal before we join the transaction.

If current->journal is unset or BTRFS_SEND_TRANS_STUB, it means
we are not holding a transaction, thus are able to join and then commit
transaction.

If current->journal is a valid transaction handle, we avoid committing
transaction and just end it

This is less effective than committing current transaction, as it won't
free metadata reserved space, but we may still free some data space
before new data writes.

Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1178634
Fixes: c53e9653605d ("btrfs: qgroup: try to flush qgroup space when we get -EDQUOT")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/btrfs/qgroup.c |   20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -3502,6 +3502,7 @@ static int try_flush_qgroup(struct btrfs
 {
 	struct btrfs_trans_handle *trans;
 	int ret;
+	bool can_commit = true;
 
 	/*
 	 * We don't want to run flush again and again, so if there is a running
@@ -3513,6 +3514,20 @@ static int try_flush_qgroup(struct btrfs
 		return 0;
 	}
 
+	/*
+	 * If current process holds a transaction, we shouldn't flush, as we
+	 * assume all space reservation happens before a transaction handle is
+	 * held.
+	 *
+	 * But there are cases like btrfs_delayed_item_reserve_metadata() where
+	 * we try to reserve space with one transction handle already held.
+	 * In that case we can't commit transaction, but at least try to end it
+	 * and hope the started data writes can free some space.
+	 */
+	if (current->journal_info &&
+	    current->journal_info != BTRFS_SEND_TRANS_STUB)
+		can_commit = false;
+
 	ret = btrfs_start_delalloc_snapshot(root);
 	if (ret < 0)
 		goto out;
@@ -3524,7 +3539,10 @@ static int try_flush_qgroup(struct btrfs
 		goto out;
 	}
 
-	ret = btrfs_commit_transaction(trans);
+	if (can_commit)
+		ret = btrfs_commit_transaction(trans);
+	else
+		ret = btrfs_end_transaction(trans);
 out:
 	clear_bit(BTRFS_ROOT_QGROUP_FLUSHING, &root->state);
 	wake_up(&root->qgroup_flush_wait);



^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 5.4 26/27] btrfs: export and rename qgroup_reserve_meta
  2021-08-13 15:06 [PATCH 5.4 00/27] 5.4.141-rc1 review Greg Kroah-Hartman
                   ` (24 preceding siblings ...)
  2021-08-13 15:07 ` [PATCH 5.4 25/27] btrfs: qgroup: dont commit transaction when we already hold the handle Greg Kroah-Hartman
@ 2021-08-13 15:07 ` Greg Kroah-Hartman
  2021-08-13 15:07 ` [PATCH 5.4 27/27] btrfs: dont flush from btrfs_delayed_inode_reserve_metadata Greg Kroah-Hartman
                   ` (5 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2021-08-13 15:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Nikolay Borisov, David Sterba, Anand Jain

From: Nikolay Borisov <nborisov@suse.com>

commit 80e9baed722c853056e0c5374f51524593cb1031 upstream

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/btrfs/qgroup.c |    8 ++++----
 fs/btrfs/qgroup.h |    3 ++-
 2 files changed, 6 insertions(+), 5 deletions(-)

--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -3800,8 +3800,8 @@ static int sub_root_meta_rsv(struct btrf
 	return num_bytes;
 }
 
-static int qgroup_reserve_meta(struct btrfs_root *root, int num_bytes,
-				enum btrfs_qgroup_rsv_type type, bool enforce)
+int btrfs_qgroup_reserve_meta(struct btrfs_root *root, int num_bytes,
+			      enum btrfs_qgroup_rsv_type type, bool enforce)
 {
 	struct btrfs_fs_info *fs_info = root->fs_info;
 	int ret;
@@ -3832,14 +3832,14 @@ int __btrfs_qgroup_reserve_meta(struct b
 {
 	int ret;
 
-	ret = qgroup_reserve_meta(root, num_bytes, type, enforce);
+	ret = btrfs_qgroup_reserve_meta(root, num_bytes, type, enforce);
 	if (ret <= 0 && ret != -EDQUOT)
 		return ret;
 
 	ret = try_flush_qgroup(root);
 	if (ret < 0)
 		return ret;
-	return qgroup_reserve_meta(root, num_bytes, type, enforce);
+	return btrfs_qgroup_reserve_meta(root, num_bytes, type, enforce);
 }
 
 void btrfs_qgroup_free_meta_all_pertrans(struct btrfs_root *root)
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -349,7 +349,8 @@ int btrfs_qgroup_reserve_data(struct btr
 int btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len);
 int btrfs_qgroup_free_data(struct inode *inode,
 			struct extent_changeset *reserved, u64 start, u64 len);
-
+int btrfs_qgroup_reserve_meta(struct btrfs_root *root, int num_bytes,
+			      enum btrfs_qgroup_rsv_type type, bool enforce);
 int __btrfs_qgroup_reserve_meta(struct btrfs_root *root, int num_bytes,
 				enum btrfs_qgroup_rsv_type type, bool enforce);
 /* Reserve metadata space for pertrans and prealloc type */



^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 5.4 27/27] btrfs: dont flush from btrfs_delayed_inode_reserve_metadata
  2021-08-13 15:06 [PATCH 5.4 00/27] 5.4.141-rc1 review Greg Kroah-Hartman
                   ` (25 preceding siblings ...)
  2021-08-13 15:07 ` [PATCH 5.4 26/27] btrfs: export and rename qgroup_reserve_meta Greg Kroah-Hartman
@ 2021-08-13 15:07 ` Greg Kroah-Hartman
  2021-08-13 23:24 ` [PATCH 5.4 00/27] 5.4.141-rc1 review Shuah Khan
                   ` (4 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2021-08-13 15:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Qu Wenruo, Nikolay Borisov,
	David Sterba, Anand Jain

From: Nikolay Borisov <nborisov@suse.com>

commit 4d14c5cde5c268a2bc26addecf09489cb953ef64 upstream

Calling btrfs_qgroup_reserve_meta_prealloc from
btrfs_delayed_inode_reserve_metadata can result in flushing delalloc
while holding a transaction and delayed node locks. This is deadlock
prone. In the past multiple commits:

 * ae5e070eaca9 ("btrfs: qgroup: don't try to wait flushing if we're
already holding a transaction")

 * 6f23277a49e6 ("btrfs: qgroup: don't commit transaction when we already
 hold the handle")

Tried to solve various aspects of this but this was always a
whack-a-mole game. Unfortunately those 2 fixes don't solve a deadlock
scenario involving btrfs_delayed_node::mutex. Namely, one thread
can call btrfs_dirty_inode as a result of reading a file and modifying
its atime:

  PID: 6963   TASK: ffff8c7f3f94c000  CPU: 2   COMMAND: "test"
  #0  __schedule at ffffffffa529e07d
  #1  schedule at ffffffffa529e4ff
  #2  schedule_timeout at ffffffffa52a1bdd
  #3  wait_for_completion at ffffffffa529eeea             <-- sleeps with delayed node mutex held
  #4  start_delalloc_inodes at ffffffffc0380db5
  #5  btrfs_start_delalloc_snapshot at ffffffffc0393836
  #6  try_flush_qgroup at ffffffffc03f04b2
  #7  __btrfs_qgroup_reserve_meta at ffffffffc03f5bb6     <-- tries to reserve space and starts delalloc inodes.
  #8  btrfs_delayed_update_inode at ffffffffc03e31aa      <-- acquires delayed node mutex
  #9  btrfs_update_inode at ffffffffc0385ba8
 #10  btrfs_dirty_inode at ffffffffc038627b               <-- TRANSACTIION OPENED
 #11  touch_atime at ffffffffa4cf0000
 #12  generic_file_read_iter at ffffffffa4c1f123
 #13  new_sync_read at ffffffffa4ccdc8a
 #14  vfs_read at ffffffffa4cd0849
 #15  ksys_read at ffffffffa4cd0bd1
 #16  do_syscall_64 at ffffffffa4a052eb
 #17  entry_SYSCALL_64_after_hwframe at ffffffffa540008c

This will cause an asynchronous work to flush the delalloc inodes to
happen which can try to acquire the same delayed_node mutex:

  PID: 455    TASK: ffff8c8085fa4000  CPU: 5   COMMAND: "kworker/u16:30"
  #0  __schedule at ffffffffa529e07d
  #1  schedule at ffffffffa529e4ff
  #2  schedule_preempt_disabled at ffffffffa529e80a
  #3  __mutex_lock at ffffffffa529fdcb                    <-- goes to sleep, never wakes up.
  #4  btrfs_delayed_update_inode at ffffffffc03e3143      <-- tries to acquire the mutex
  #5  btrfs_update_inode at ffffffffc0385ba8              <-- this is the same inode that pid 6963 is holding
  #6  cow_file_range_inline.constprop.78 at ffffffffc0386be7
  #7  cow_file_range at ffffffffc03879c1
  #8  btrfs_run_delalloc_range at ffffffffc038894c
  #9  writepage_delalloc at ffffffffc03a3c8f
 #10  __extent_writepage at ffffffffc03a4c01
 #11  extent_write_cache_pages at ffffffffc03a500b
 #12  extent_writepages at ffffffffc03a6de2
 #13  do_writepages at ffffffffa4c277eb
 #14  __filemap_fdatawrite_range at ffffffffa4c1e5bb
 #15  btrfs_run_delalloc_work at ffffffffc0380987         <-- starts running delayed nodes
 #16  normal_work_helper at ffffffffc03b706c
 #17  process_one_work at ffffffffa4aba4e4
 #18  worker_thread at ffffffffa4aba6fd
 #19  kthread at ffffffffa4ac0a3d
 #20  ret_from_fork at ffffffffa54001ff

To fully address those cases the complete fix is to never issue any
flushing while holding the transaction or the delayed node lock. This
patch achieves it by calling qgroup_reserve_meta directly which will
either succeed without flushing or will fail and return -EDQUOT. In the
latter case that return value is going to be propagated to
btrfs_dirty_inode which will fallback to start a new transaction. That's
fine as the majority of time we expect the inode will have
BTRFS_DELAYED_NODE_INODE_DIRTY flag set which will result in directly
copying the in-memory state.

Fixes: c53e9653605d ("btrfs: qgroup: try to flush qgroup space when we get -EDQUOT")
CC: stable@vger.kernel.org # 5.10+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/btrfs/delayed-inode.c |    3 ++-
 fs/btrfs/inode.c         |    2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -627,7 +627,8 @@ static int btrfs_delayed_inode_reserve_m
 	 */
 	if (!src_rsv || (!trans->bytes_reserved &&
 			 src_rsv->type != BTRFS_BLOCK_RSV_DELALLOC)) {
-		ret = btrfs_qgroup_reserve_meta_prealloc(root, num_bytes, true);
+		ret = btrfs_qgroup_reserve_meta(root, num_bytes,
+					  BTRFS_QGROUP_RSV_META_PREALLOC, true);
 		if (ret < 0)
 			return ret;
 		ret = btrfs_block_rsv_add(root, dst_rsv, num_bytes,
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6375,7 +6375,7 @@ static int btrfs_dirty_inode(struct inod
 		return PTR_ERR(trans);
 
 	ret = btrfs_update_inode(trans, root, inode);
-	if (ret && ret == -ENOSPC) {
+	if (ret && (ret == -ENOSPC || ret == -EDQUOT)) {
 		/* whoops, lets try again with the full transaction */
 		btrfs_end_transaction(trans);
 		trans = btrfs_start_transaction(root, 1);



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 5.4 00/27] 5.4.141-rc1 review
  2021-08-13 15:06 [PATCH 5.4 00/27] 5.4.141-rc1 review Greg Kroah-Hartman
                   ` (26 preceding siblings ...)
  2021-08-13 15:07 ` [PATCH 5.4 27/27] btrfs: dont flush from btrfs_delayed_inode_reserve_metadata Greg Kroah-Hartman
@ 2021-08-13 23:24 ` Shuah Khan
  2021-08-14 11:11 ` Sudip Mukherjee
                   ` (3 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Shuah Khan @ 2021-08-13 23:24 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel
  Cc: torvalds, akpm, linux, shuah, patches, lkft-triage, pavel,
	jonathanh, f.fainelli, stable, Shuah Khan

On 8/13/21 9:06 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.4.141 release.
> There are 27 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Sun, 15 Aug 2021 15:05:12 +0000.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
> 	https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.141-rc1.gz
> or in the git tree and branch at:
> 	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y
> and the diffstat can be found below.
> 
> thanks,
> 
> greg k-h
> 
>

Compiled and booted on my test system. No dmesg regressions.

Tested-by: Shuah Khan <skhan@linuxfoundation.org>

thanks,
-- Shuah

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 5.4 00/27] 5.4.141-rc1 review
  2021-08-13 15:06 [PATCH 5.4 00/27] 5.4.141-rc1 review Greg Kroah-Hartman
                   ` (27 preceding siblings ...)
  2021-08-13 23:24 ` [PATCH 5.4 00/27] 5.4.141-rc1 review Shuah Khan
@ 2021-08-14 11:11 ` Sudip Mukherjee
  2021-08-14 11:39 ` Naresh Kamboju
                   ` (2 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Sudip Mukherjee @ 2021-08-14 11:11 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, torvalds, akpm, linux, shuah, patches, lkft-triage,
	pavel, jonathanh, f.fainelli, stable

Hi Greg,

On Fri, Aug 13, 2021 at 05:06:58PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.4.141 release.
> There are 27 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Sun, 15 Aug 2021 15:05:12 +0000.
> Anything received after that time might be too late.

Build test:
mips (gcc version 11.1.1 20210723): 65 configs -> no failure
arm (gcc version 11.1.1 20210723): 107 configs -> no new failure
arm64 (gcc version 11.1.1 20210723): 2 configs -> no failure
x86_64 (gcc version 10.2.1 20210110): 4 configs -> no failure

Boot test:
x86_64: Booted on my test laptop. No regression.
x86_64: Booted on qemu. No regression. [1]

[1]. https://openqa.qa.codethink.co.uk/tests/28


Tested-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>

--
Regards
Sudip

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 5.4 00/27] 5.4.141-rc1 review
  2021-08-13 15:06 [PATCH 5.4 00/27] 5.4.141-rc1 review Greg Kroah-Hartman
                   ` (28 preceding siblings ...)
  2021-08-14 11:11 ` Sudip Mukherjee
@ 2021-08-14 11:39 ` Naresh Kamboju
  2021-08-14 18:15 ` Guenter Roeck
  2021-08-16  3:02 ` Samuel Zou
  31 siblings, 0 replies; 33+ messages in thread
From: Naresh Kamboju @ 2021-08-14 11:39 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: open list, Shuah Khan, Florian Fainelli, patches, lkft-triage,
	Jon Hunter, linux-stable, Pavel Machek, Andrew Morton,
	Linus Torvalds, Guenter Roeck

On Fri, 13 Aug 2021 at 20:44, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> This is the start of the stable review cycle for the 5.4.141 release.
> There are 27 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sun, 15 Aug 2021 15:05:12 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
>         https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.141-rc1.gz
> or in the git tree and branch at:
>         git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h


Results from Linaro’s test farm.
No regressions on arm64, arm, x86_64, and i386.

Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>

## Build
* kernel: 5.4.141-rc1
* git: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc
* git branch: linux-5.4.y
* git commit: df6ce9b59c705e13e7fdf19dbba5a715b993b712
* git describe: v5.4.139-114-gdf6ce9b59c70
* test details:
https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.4.y/build/v5.4.139-114-gdf6ce9b59c70

## No regressions (compared to v5.4.139-86-gff7bc8590c20)

## No fixes (compared to v5.4.139-86-gff7bc8590c20)

## Test result summary
total: 77974, pass: 64106, fail: 387, skip: 12161, xfail: 1320

## Build Summary
* arc: 10 total, 10 passed, 0 failed
* arm: 193 total, 193 passed, 0 failed
* arm64: 27 total, 27 passed, 0 failed
* i386: 15 total, 15 passed, 0 failed
* mips: 45 total, 45 passed, 0 failed
* parisc: 9 total, 9 passed, 0 failed
* powerpc: 27 total, 27 passed, 0 failed
* riscv: 21 total, 21 passed, 0 failed
* s390: 9 total, 9 passed, 0 failed
* sh: 18 total, 18 passed, 0 failed
* sparc: 9 total, 9 passed, 0 failed
* x86_64: 27 total, 27 passed, 0 failed

## Test suites summary
* fwts
* igt-gpu-tools
* kselftest-android
* kselftest-arm64
* kselftest-breakpoints
* kselftest-capabilities
* kselftest-cgroup
* kselftest-clone3
* kselftest-core
* kselftest-cpu-hotplug
* kselftest-cpufreq
* kselftest-drivers
* kselftest-efivarfs
* kselftest-filesystems
* kselftest-firmware
* kselftest-fpu
* kselftest-futex
* kselftest-gpio
* kselftest-intel_pstate
* kselftest-ipc
* kselftest-ir
* kselftest-kcmp
* kselftest-kvm
* kselftest-lib
* kselftest-livepatch
* kselftest-lkdtm
* kselftest-membarrier
* kselftest-memfd
* kselftest-memory-hotplug
* kselftest-mincore
* kselftest-mount
* kselftest-mqueue
* kselftest-net
* kselftest-netfilter
* kselftest-nsfs
* kselftest-openat2
* kselftest-pid_namespace
* kselftest-pidfd
* kselftest-proc
* kselftest-pstore
* kselftest-ptrace
* kselftest-rseq
* kselftest-rtc
* kselftest-timens
* kselftest-timers
* kselftest-tmpfs
* kselftest-tpm2
* kselftest-user
* kselftest-vm
* kselftest-x86
* kselftest-zram
* kvm-unit-tests
* libhugetlbfs
* linux-log-parser
* ltp-cap_bounds-tests
* ltp-commands-tests
* ltp-containers-tests
* ltp-controllers-tests
* ltp-cpuhotplug-tests
* ltp-crypto-tests
* ltp-cve-tests
* ltp-dio-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-mm-tests
* ltp-nptl-tests
* ltp-open-posix-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-tracing-tests
* network-basic-tests
* packetdrill
* perf
* rcutorture
* ssuite
* v4l2-compliance

--
Linaro LKFT
https://lkft.linaro.org

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 5.4 00/27] 5.4.141-rc1 review
  2021-08-13 15:06 [PATCH 5.4 00/27] 5.4.141-rc1 review Greg Kroah-Hartman
                   ` (29 preceding siblings ...)
  2021-08-14 11:39 ` Naresh Kamboju
@ 2021-08-14 18:15 ` Guenter Roeck
  2021-08-16  3:02 ` Samuel Zou
  31 siblings, 0 replies; 33+ messages in thread
From: Guenter Roeck @ 2021-08-14 18:15 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, torvalds, akpm, shuah, patches, lkft-triage, pavel,
	jonathanh, f.fainelli, stable

On Fri, Aug 13, 2021 at 05:06:58PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.4.141 release.
> There are 27 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Sun, 15 Aug 2021 15:05:12 +0000.
> Anything received after that time might be too late.
> 

Build results:
	total: 157 pass: 157 fail: 0
Qemu test results:
	total: 444 pass: 444 fail: 0

Tested-by: Guenter Roeck <linux@roeck-us.net>

Guenter

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 5.4 00/27] 5.4.141-rc1 review
  2021-08-13 15:06 [PATCH 5.4 00/27] 5.4.141-rc1 review Greg Kroah-Hartman
                   ` (30 preceding siblings ...)
  2021-08-14 18:15 ` Guenter Roeck
@ 2021-08-16  3:02 ` Samuel Zou
  31 siblings, 0 replies; 33+ messages in thread
From: Samuel Zou @ 2021-08-16  3:02 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel
  Cc: torvalds, akpm, linux, shuah, patches, lkft-triage, pavel,
	jonathanh, f.fainelli, stable



On 2021/8/13 23:06, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.4.141 release.
> There are 27 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Sun, 15 Aug 2021 15:05:12 +0000.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
> 	https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.141-rc1.gz
> or in the git tree and branch at:
> 	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y
> and the diffstat can be found below.
> 
> thanks,
> 
> greg k-h
> 

Tested on arm64 and x86 for 5.4.141-rc1,

Kernel repo:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Branch: linux-5.4.y
Version: 5.4.141-rc1
Commit: df6ce9b59c705e13e7fdf19dbba5a715b993b712
Compiler: gcc version 7.3.0 (GCC)

arm64:
--------------------------------------------------------------------
Testcase Result Summary:
total: 8906
passed: 8906
failed: 0
timeout: 0
--------------------------------------------------------------------

x86:
--------------------------------------------------------------------
Testcase Result Summary:
total: 8906
passed: 8906
failed: 0
timeout: 0
--------------------------------------------------------------------

Tested-by: Hulk Robot <hulkrobot@huawei.com>

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2021-08-16  3:02 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-13 15:06 [PATCH 5.4 00/27] 5.4.141-rc1 review Greg Kroah-Hartman
2021-08-13 15:06 ` [PATCH 5.4 01/27] KVM: SVM: Fix off-by-one indexing when nullifying last used SEV VMCB Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 02/27] tee: Correct inappropriate usage of TEE_SHM_DMA_BUF flag Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 03/27] media: v4l2-mem2mem: always consider OUTPUT queue during poll Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 04/27] tracing: Reject string operand in the histogram expression Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 05/27] usb: dwc3: Stop active transfers before halting the controller Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 06/27] usb: dwc3: gadget: Allow runtime suspend if UDC unbinded Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 07/27] usb: dwc3: gadget: Restart DWC3 gadget when enabling pullup Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 08/27] usb: dwc3: gadget: Prevent EP queuing while stopping transfers Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 09/27] usb: dwc3: gadget: Clear DEP flags after stop transfers in ep disable Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 10/27] usb: dwc3: gadget: Disable gadget IRQ during pullup disable Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 11/27] usb: dwc3: gadget: Avoid runtime resume if disabling pullup Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 12/27] KVM: X86: MMU: Use the correct inherited permissions to get shadow page Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 13/27] USB:ehci:fix Kunpeng920 ehci hardware problem Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 14/27] ALSA: hda: Add quirk for ASUS Flow x13 Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 15/27] ppp: Fix generating ppp unit id when ifname is not specified Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 16/27] ovl: prevent private clone if bind mount is not allowed Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 17/27] btrfs: make qgroup_free_reserved_data take btrfs_inode Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 18/27] btrfs: make btrfs_qgroup_reserve_data " Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 19/27] btrfs: qgroup: allow to unreserve range without releasing other ranges Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 20/27] btrfs: qgroup: try to flush qgroup space when we get -EDQUOT Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 21/27] btrfs: transaction: Cleanup unused TRANS_STATE_BLOCKED Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 22/27] btrfs: qgroup: remove ASYNC_COMMIT mechanism in favor of reserve retry-after-EDQUOT Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 23/27] btrfs: fix lockdep splat when enabling and disabling qgroups Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 24/27] net: xilinx_emaclite: Do not print real IOMEM pointer Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 25/27] btrfs: qgroup: dont commit transaction when we already hold the handle Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 26/27] btrfs: export and rename qgroup_reserve_meta Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 27/27] btrfs: dont flush from btrfs_delayed_inode_reserve_metadata Greg Kroah-Hartman
2021-08-13 23:24 ` [PATCH 5.4 00/27] 5.4.141-rc1 review Shuah Khan
2021-08-14 11:11 ` Sudip Mukherjee
2021-08-14 11:39 ` Naresh Kamboju
2021-08-14 18:15 ` Guenter Roeck
2021-08-16  3:02 ` Samuel Zou

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).