LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Luis Chamberlain <mcgrof@kernel.org>
To: fstests@vger.kernel.org
Cc: hare@suse.de, dgilbert@interlog.com, jeyu@kernel.org,
	lucas.demarchi@intel.com, linux-kernel@vger.kernel.org,
	Luis Chamberlain <mcgrof@kernel.org>
Subject: [PATCH 3/4] common/module: add a patient module rmmod
Date: Tue, 27 Jul 2021 13:10:44 -0700	[thread overview]
Message-ID: <20210727201045.2540681-4-mcgrof@kernel.org> (raw)
In-Reply-To: <20210727201045.2540681-1-mcgrof@kernel.org>

When we call rmmod it will fail if the refcnt is greater than 0.
This is expected, however, if using test modules such as scsi_debug,
userspace tests may expect that once userspace is done issuing out
whatever it needs to that immediately after it can remove the module.

This is not true, things can linger for a while when a test driver's
reference count is greater than 0, and so userspace has to wait before
it can reliably try rmmod.

Furthermore... experience on older kernels shows it may be a while
before we can actually remove the module even *after*
/sys/module/$module/refcnt is 0. This is only currently observed
on older kernels.

Provide a check for the refcnt before we try to remove the module.
And add a grace period to allow things quiesce before we really
try removing the module so that we don't get false positives.

An example of this is when using the scsi_debug module. Using
udevadm settle does ensure the devices are properly ready to
use after modprobe, however once you start using the module,
say testing xfs with generic/108, you will see refcnt linger
between 1 and 2, and once the test completes this eventually
goes down to 0. The refcnt can be > 0 after userspace thinks
it is done with a test, and so will fail to remove it. Likewise,
on older kernels even with a refcnt of 0 the module may sometimes
fail to be removed.

Yes, the sleep 10 is rather large, however if you want to prevent
false positives it is needed. Doug's patch recent test patch [0]
helps with these older kernel kernels not having to require such
large delays, however we have no way of telling currently if such
scsi_debug quiesce patch is applied.

[0] https://lore.kernel.org/linux-scsi/20210508230745.27923-1-dgilbert@interlog.com/

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 common/module | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/common/module b/common/module
index 39e4e793..064bc45e 100644
--- a/common/module
+++ b/common/module
@@ -81,3 +81,51 @@ _get_fs_module_param()
 {
 	cat /sys/module/${FSTYP}/parameters/${1} 2>/dev/null
 }
+
+# checks the refcount and returns 0 if we can safely remove the module. rmmod
+# does this check for us, but we can use this to also iterate checking for this
+# refcount before we even try to remove the module. This is useful when using
+# debug test modules which take a while to quiesce.
+_patient_rmmod_check_refcnt()
+{
+	local module=$1
+	local refcnt=0
+
+	if [[ -f /sys/module/$module/refcnt ]]; then
+		refcnt=$(cat /sys/module/$module/refcnt 2>/dev/null)
+		if [[ $? -ne 0 || $refcnt -eq 0 ]]; then
+			return 0
+		fi
+		return 1
+	fi
+	return 0
+}
+
+# patiently tries to wait to remove a module by ensuring first
+# the refcnt is 0. We wait for 10 seconds at most.
+_patient_rmmod()
+{
+	local module=$1
+	local max_tries=10
+
+	while [[ $max_tries != 0 ]]; do
+		_patient_rmmod_check_refcnt $module
+		if [[ $? -eq 0 ]]; then
+			break
+		fi
+		sleep 1
+		let max_tries=$max_tries-1
+	done
+
+	# give some grace time for when the refcnt is 0 as otherwise the
+	# removal can fail on older kernels. Refer to:
+	# https://bugzilla.kernel.org/show_bug.cgi?id=212337
+	sleep 10
+
+	if [[ -d /sys/module/$module ]]; then
+		modprobe -r $module
+		return $?
+	fi
+
+	return 0
+}
-- 
2.29.2


  parent reply	other threads:[~2021-07-27 20:11 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-27 20:10 [PATCH 0/4] scsi_debug: improve failure rates Luis Chamberlain
2021-07-27 20:10 ` [PATCH 1/4] common/config: disable udevadm settle if CONFIG_NET is disabled Luis Chamberlain
2021-07-27 20:10 ` [PATCH 2/4] common/scsi_debug: use udevadm settle instead of sleeping Luis Chamberlain
2021-07-27 20:10 ` Luis Chamberlain [this message]
2021-07-30  0:22   ` [PATCH 3/4] common/module: add a patient module rmmod Luis Chamberlain
2021-08-10 21:31   ` Luis Chamberlain
2021-07-27 20:10 ` [PATCH 4/4] common/scsi_debug: use the patient module remover Luis Chamberlain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210727201045.2540681-4-mcgrof@kernel.org \
    --to=mcgrof@kernel.org \
    --cc=dgilbert@interlog.com \
    --cc=fstests@vger.kernel.org \
    --cc=hare@suse.de \
    --cc=jeyu@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lucas.demarchi@intel.com \
    --subject='Re: [PATCH 3/4] common/module: add a patient module rmmod' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).