From: Fengguang Wu <fengguang.wu@intel.com>
To: Kevin Hilman <khilman@baylibre.com>
Cc: Mark Brown <broonie@kernel.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Petr Mladek <pmladek@suse.com>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: kernel CI: printk loglevels in kernel boot logs?
Date: Wed, 22 Nov 2017 11:27:02 +0800
Message-ID: <20171122032702.zd5ouuugrxyemqbh@wfg-t540p.sh.intel.com>
In-Reply-To: <20171122015610.x3kgzqgtwywlurmz@wfg-t540p.sh.intel.com>

[CC LKML for possible printk improvements]

On Wed, Nov 22, 2017 at 09:56:10AM +0800, Fengguang Wu wrote:
>Hi Kevin,
>
>On Tue, Nov 21, 2017 at 12:27:48PM -0800, Kevin Hilman wrote:
>>Hi Fengguang,
>>
>>In automated testing, it would ease parsing of kernel boot logs (especially
>>separating warnings and errors from debug, info etc.) to see the loglevels.
>>
>>Right now we can get this info from a "dmesg --raw" after bootup, but
>>it would be really nice in certain automation frameworks to have a
>>kernel command-line option to enable printing of loglevels in default
>>boot log.
>
>Agreed.
>
>>This is especially useful when ingesting kernel logs into advanced
>>search/analytics frameworks (I'm playing with an ELK stack:
>>Elasticsearch, Logstash, Kibana).
>>
>>Does that sound like a feature you'd be interested in?
>
>Yes, sure.
>
>>In kernelCI, we're considering submitting a patch to add a
>>"show_loglevel" command-line argument to enable that option on kernel
>>boot.
>
>Thanks for doing that patch! It'll obviously make it easier to catch
>various warnings, which will be useful when used with caution,
>especially when false warnings (wrt. real problems that should be
>fixed) can be effectively filtered out.

[...]

>As you may know I'm currently reporting kernel oopses in the mainline
>kernel, hoping to clear the noisy oopses there -- they obviously hurt
>catching and bisecting new oopses.
>
>I have to say the warnings are much noisier than kernel oopses
>in 2 ways:
>
>1) It's common for a *normal* kernel boot to have a dozen old
>warnings.
>
>2) Many warnings do not necessarily mean something should or could be
>fixed -- they may well be mentioning some HW problem, or an alert
>message to the user.
>
>So there is a much bigger and messier problem than catching the warnings:
>finding ways to effectively mark or filter the real warnings that automated
>testing should catch and report.
>
>For filtering, we have so far accumulated the blacklist below:
>
>https://github.com/fengguang/lkp-tests/blob/master/etc/kmsg-blacklist
>
[...]

For the marking part, I wonder if there can be a clear rule that
allows developers to distinguish two kinds of messages for users and
testers:

- "bug" message: indicating a regression that should be reported and fixed

- "fact" message: well there's a problem, but we kernel developers
  probably can do nothing about it. It's just to let the user know
  about the fact. The fix might be something like replacing a broken
  disk drive.

Those message types are orthogonal to the severity of the problem (log
levels), so the current log levels are actually not sufficient for
distinguishing these kinds of situations.
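
To make the orthogonality concrete, here is a rough Python sketch (not
existing tooling). It assumes log lines in the "dmesg --raw" style, where
each line starts with a syslog-like "<N>" prefix, and it keeps the
"bug"/"fact" kind as a separate field. The Record type, the kind values
and parse_raw_line() are hypothetical names used only for illustration.

```python
import re
from typing import NamedTuple, Optional

# Kernel log level names, indexed by numeric level 0..7.
LEVEL_NAMES = ["emerg", "alert", "crit", "err",
               "warning", "notice", "info", "debug"]

# Syslog-style "<N>" prefix at the start of a raw log line.
RAW_PREFIX = re.compile(r"^<(\d+)>(.*)$")


class Record(NamedTuple):
    level: Optional[str]  # severity, decoded from the "<N>" prefix
    kind: str             # "bug", "fact" or "unknown" (hypothetical tag)
    text: str             # the rest of the line


def parse_raw_line(line: str, kind: str = "unknown") -> Record:
    """Split a raw log line into severity and text; kind is supplied separately."""
    line = line.rstrip("\n")
    m = RAW_PREFIX.match(line)
    if not m:
        return Record(None, kind, line)
    # The low 3 bits carry the level; any higher bits carry the facility.
    level = LEVEL_NAMES[int(m.group(1)) & 7]
    return Record(level, kind, m.group(2))
```

Two records can share the level "err" while one is a "bug" and the other
a "fact"; nothing in the "<N>" prefix tells them apart, so the kind has
to come from somewhere else.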

Here is the list of warning ids waiting for bisect in the 0day test farm.
As you may see, only relatively few of them may be worth bisecting and
reporting. We've already built a pretty large blacklist, however it has
proved to be awkward and ineffective -- it obviously suffers from both
coverage and expertise problems.

    777 hub#-#:#:config_failed,can't_get_hub_status(err#)
    328 Firmware_Bug]:TSC_DEADLINE_disabled_due_to_Errata;please_update_microcode_to_version:#(or_later)
     31 ACPI_Error:Field[CPB3]at_bit_offset/length#exceeds_size_of_target_Buffer(#bits)(#/dsopcode-#)
     30 ACPI_Error:Method_parse/execution_failed~_SB.PMI0._PMC,AE_NOT_EXIST(#/psparse-#)
     29 ACPI_Error:Method_parse/execution_failed~_SB.PMI0._GHL,AE_NOT_EXIST(#/psparse-#)
     28 ACPI_Error:Region_IPMI(ID=#)has_no_handler(#/exfldio-#)
     28 ACPI_Error:No_handler_for_Region[SYSI](#)[IPMI](#/evregion-#)
     27 ACPI_Error:Method_parse/execution_failed~_SB._OSC,AE_AML_BUFFER_LIMIT(#/psparse-#)
     25 tpm_tpm#:A_TPM_error(#)occurred_attempting_get_random
     24 IP-Config:Reopening_network_devices
     19 DHCP/BOOTP:Ignoring_fragmented_reply
      8 xhci_hcd#:#:#:init#:#:#fail
      8 xhci_hcd#:#:#:can't_setup
      6 in_atomic():#,irqs_disabled():#,pid:#,name:swapper
      4 megaraid_sas#:#:#:Init_cmd_return_status_SUCCESS_for_SCSI_host
      4 in_atomic():#,irqs_disabled():#,pid:#,name:kworker
      4 drm:drm_atomic_helper_commit_cleanup_done[drm_kms_helper]]*ERROR*[CRTC:#:pipe_A]flip_done_timed_out
      2 print_req_error:I/O_error,dev_loop#,sector
      2 IP-Config:Failed_to_open_erspan0
      2 in_atomic():#,irqs_disabled():#,pid:#,name:systemd-udevd
      2 in_atomic():#,irqs_disabled():#,pid:#,name:perf
      2 acerhdf:unknown(unsupported)BIOS_version_QEMU/Standard_PC(i440FX+PIIX,#).#-#,please_report,aborting
      1 XFS(sda8):metadata_I/O_error:block#(~xfs_readlink_bmap_ilocked~)error#numblks
      1 XFS(sda8):Metadata_CRC_error_detected_at_xfs_symlink_read_verify[xfs],xfs_symlink_block
      1 XFS(sda8):Metadata_CRC_error_detected_at_xfs_dir3_data_read_verify[xfs],xfs_dir3_data_block
      1 XFS(sda8):Metadata_CRC_error_detected_at_xfs_da3_node_read_verify[xfs],xfs_da3_node_block
      1 XFS(sda8):Metadata_CRC_error_detected_at_xfs_attr3_leaf_read_verify[xfs],xfs_attr3_leaf_block
      1 Memory_failure:#:recovery_action_for_huge_page:Recovered
      1 Memory_failure:#:recovery_action_for_clean_LRU_page:Recovered
      1 Memory_failure:#:Killing_tinjpage:#due_to_hardware_memory_corruption
      1 mce:[Hardware_Error]:TSC#MISC#df87b000d9eff
      1 in_atomic():#,irqs_disabled():#,pid:#,name:xargs
      1 in_atomic():#,irqs_disabled():#,pid:#,name:triad_loop
      1 in_atomic():#,irqs_disabled():#,pid:#,name:tchain_edit
      1 in_atomic():#,irqs_disabled():#,pid:#,name:sh
      1 in_atomic():#,irqs_disabled():#,pid:#,name:sed
      1 in_atomic():#,irqs_disabled():#,pid:#,name:run.sh
      1 in_atomic():#,irqs_disabled():#,pid:#,name:perf_test
      1 in_atomic():#,irqs_disabled():#,pid:#,name:noploop.sh
      1 in_atomic():#,irqs_disabled():#,pid:#,name:jbd2/sda1
      1 in_atomic():#,irqs_disabled():#,pid:#,name:grep
      1 in_atomic():#,irqs_disabled():#,pid:#,name:fallocate
      1 in_atomic():#,irqs_disabled():#,pid:#,name:dmesg
      1 in_atomic():#,irqs_disabled():#,pid:#,name:cc1
      1 do_IRQ:#No_irq_handler_for_vector
      1 Buffer_I/O_error_on_dev_dm-#,logical_block#,async_page_read

We already have a well known answer to "marking bugs", i.e. to dump a
pile of call trace that no one can tolerate.

The downside is that for some kinds of bugs the developer may not need the
call trace at all. And the call trace might make users unnecessarily
nervous. That's why we have this email thread -- to catch regressions
indicated by some 1-liner kernel warnings. For that kind of situation,
it may be enough to add a common prefix for such messages. For example,

        [kernel bug] your warning/error printk message
        ~~~~~~~~~~~~
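
On the tester side, such a prefix would make selection trivial. A
minimal Python sketch, assuming the "[kernel bug]" marker from the
example above (a proposal here, not an existing kernel convention) and
an optional "[   12.345678]" timestamp in front of it; the helper names
are hypothetical:

```python
import re

# Hypothetical marker from the example above; not an existing kernel convention.
BUG_PREFIX = "[kernel bug]"

# Optional "[   12.345678] " timestamp at the start of a line.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def is_marked_bug(line: str) -> bool:
    """True if the message body (after any timestamp) starts with the marker."""
    body = TIMESTAMP.sub("", line, count=1)
    return body.startswith(BUG_PREFIX)


def marked_bugs(lines):
    """Keep only the lines a developer explicitly marked as bugs."""
    return [line for line in lines if is_marked_bug(line)]
```

Matching only at the start of the message body avoids false hits on
lines that merely mention the marker somewhere in their text.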

Alternatively, some printk messages already clearly state
"please report" (they seem more targeted at users than testers):

wfg /c/linux% git grep '".*please report'|head
block/elevator.c:645:                  "(nr_sorted=%u), please report this\n",
drivers/ata/pata_hpt37x.c:867:                  pr_err("Unknown HPT366 subtype, please report (%d)\n",
drivers/ata/pata_hpt37x.c:908:          pr_err("PCI table is bogus, please report (%d)\n", dev->device);
drivers/ata/pata_hpt3x2n.c:532:         pr_err("PCI table is bogus, please report (%d)\n", dev->device);
drivers/edac/mce_amd.c:725:                      " please report on LKML.\n");
drivers/gpu/drm/amd/amdgpu/amdgpu_display.c:395:                                DRM_INFO("  DDC: no ddc bus - possible BIOS bug - please report to xorg-driver-ati@lists.x.org\n");
drivers/gpu/drm/nouveau/nouveau_bios.c:1458:                                  "please report\n");
drivers/gpu/drm/radeon/radeon_display.c:803:                            DRM_INFO("  DDC: no ddc bus - possible BIOS bug - please report to xorg-driver-ati@lists.x.org\n");
drivers/hwmon/ibmaem.c:773:                             "Unknown AEM v%d; please report this to the maintainer.\n",
drivers/hwmon/w83781d.c:1367:                    "If reset=1 solved a problem you were having, please report!\n");

Thanks,
Fengguang
