LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Doug Oucharek <doucharek@cray.com>
To: "Dilger, Andreas" <andreas.dilger@intel.com>
Cc: James Simmons <jsimmons@infradead.org>,
	"Drokin, Oleg" <oleg.drokin@intel.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	"Lustre Development List" <lustre-devel@lists.lustre.org>
Subject: Re: [lustre-devel] [PATCH 1/6] staging: lustre: move stack-check macros to libcfs_debug.h
Date: Tue, 17 Apr 2018 15:41:44 +0000	[thread overview]
Message-ID: <399C67D9-B90A-40D5-97DB-74EA257D650E@cray.com> (raw)
In-Reply-To: <A380CFB4-E33C-4D94-AA83-628A2A9F52C5@intel.com>

[-- Attachment #1: Type: text/plain, Size: 2846 bytes --]


On Apr 16, 2018, at 10:26 PM, Dilger, Andreas <andreas.dilger@intel.com<mailto:andreas.dilger@intel.com>> wrote:

On Apr 16, 2018, at 16:48, Doug Oucharek <doucharek@cray.com<mailto:doucharek@cray.com>> wrote:


On Apr 16, 2018, at 3:42 PM, James Simmons <jsimmons@infradead.org<mailto:jsimmons@infradead.org>> wrote:


James,

If I understand correctly, you're saying you want to be able to build without debug support...?  I'm not convinced that building a client without debug support is interesting or useful.  In fact, I think it would be harmful, and we shouldn't open up the possibility - this is switchable debug with very low overhead when not actually "on".  It would be really awful to get a problem on a running system and discover there's no debug support - that you can't even enable debug without a reinstall.

If I've understood you correctly, then I would want to see proof of a significant performance cost when debug is built but *off* before agreeing to even exposing this option.  (I know it's a choice they'd have to make, but if it's not really useful with a side order of potentially harmful, we shouldn't even give people the choice.)

I'm not saying add the option today but this is more for the long game.
While the Intel lustre developers deeply love lustre's debugging
infrastructure I see a future where something better will come along to
replace it. When that day comes we will have a period where both
debugging infrastructurs will exist and some deployers of lustre will
want to turn off the old debugging infrastructure and just use the new.
That is what I have in mind. A switch to flip between options.

Yes please!!  An option for users which says “no, you do not have the right to panic my system via LASSERT whenever you like” would be a blessing.

Note that LASSERT() itself does not panic the system, unless you configure it
with panic_on_lbug=1.  Otherwise, it just blocks that thread (though this can
also have an impact on other threads if you are holding locks at that time).

That said, the LASSERT() should not be hit unless there is bad code, data
corruption, or the LASSERT() itself is incorrect (essentially bad code also).

I don’t know about the rest of Lustre, but LNet and LNDs use LASSERT() at the top of every function call to check parameters, reference counters, state variables, etc.  These do get hit in the field and do take down the nodes.  I have yet to see a case where the customer’s data was at risk of corruption.  These should all be converted to regular “if” checks returning EINVAL.


So "whenever you like" is "whenever the system is about to corrupt your data",
and people are not very forgiving if a filesystem corrupts their data...

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation


[-- Attachment #2: Type: text/html, Size: 13924 bytes --]

  reply	other threads:[~2018-04-17 15:41 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-16  0:42 [PATCH 0/6] staging: lustre: code rearrangement NeilBrown
2018-04-16  0:42 ` [PATCH 3/6] staging: lustre: remove include/linux/libcfs/linux/linux-cpu.h NeilBrown
2018-04-16  3:52   ` James Simmons
2018-04-18  2:33     ` NeilBrown
2018-04-23 13:13     ` Greg Kroah-Hartman
2018-04-16  0:42 ` [PATCH 4/6] staging: lustre: rearrange placement of CPU partition management code NeilBrown
2018-04-16  3:53   ` James Simmons
2018-04-16  0:42 ` [PATCH 1/6] staging: lustre: move stack-check macros to libcfs_debug.h NeilBrown
2018-04-16  3:48   ` James Simmons
2018-04-16 15:27     ` [lustre-devel] " Patrick Farrell
2018-04-16 22:42       ` James Simmons
2018-04-16 22:48         ` Doug Oucharek
2018-04-17  5:26           ` Dilger, Andreas
2018-04-17 15:41             ` Doug Oucharek [this message]
2018-04-18  2:29         ` NeilBrown
2018-04-18  4:23           ` Patrick Farrell
2018-04-18  2:17     ` NeilBrown
2018-04-23 13:03       ` Greg Kroah-Hartman
2018-04-16  0:42 ` [PATCH 6/6] staging: lustre: move remaining code from linux-module.c to module.c NeilBrown
2018-04-16  0:42 ` [PATCH 5/6] staging: lustre: move misc-device registration closer to related code NeilBrown
2018-04-23 13:12   ` Greg Kroah-Hartman
2018-04-16  0:42 ` [PATCH 2/6] staging: lustre: remove libcfs/linux/libcfs.h NeilBrown
2018-04-16  3:35   ` James Simmons
2018-04-18  2:32     ` NeilBrown
2018-04-23 13:03       ` Greg Kroah-Hartman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=399C67D9-B90A-40D5-97DB-74EA257D650E@cray.com \
    --to=doucharek@cray.com \
    --cc=andreas.dilger@intel.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=jsimmons@infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lustre-devel@lists.lustre.org \
    --cc=oleg.drokin@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).