LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* Re: kernel CI: printk loglevels in kernel boot logs?
       [not found] ` <20171122015610.x3kgzqgtwywlurmz@wfg-t540p.sh.intel.com>
@ 2017-11-22  3:27   ` Fengguang Wu
  2017-11-22  5:26     ` Sergey Senozhatsky
  2017-11-22 11:34     ` Petr Mladek
  0 siblings, 2 replies; 20+ messages in thread
From: Fengguang Wu @ 2017-11-22  3:27 UTC (permalink / raw)
  To: Kevin Hilman
  Cc: Mark Brown, Greg Kroah-Hartman, LKML, Petr Mladek,
	Sergey Senozhatsky, Steven Rostedt, Linus Torvalds

[CC LKML for possible printk improvements]

On Wed, Nov 22, 2017 at 09:56:10AM +0800, Fengguang Wu wrote:
>Hi Kevin,
>
>On Tue, Nov 21, 2017 at 12:27:48PM -0800, Kevin Hilman wrote:
>>Hi Fenguang,
>>
>>In automated testing, for ease of parsing kernel boot logs (especially
>>separating warnings and errors from debug, info etc.)
>>
>>Right now we can get this info from a "dmesg --raw" after bootup, but
>>it would be really nice in certain automation frameworks to have a
>>kernel command-line option to enable printing of loglevels in default
>>boot log.
>
>Agreed.
>
>>This is espeically useful when ingesting kernel logs into advanced
>>search/analytics frameworks (I'm playing with and ELK stack: Elastic
>>Search, Logstash, Kibana).
>>
>>Does that sound like a feature you'd be interested in?
>
>Yes, sure.
>
>>In kernelCI, we're considering submitting a patch to add a
>>"show_loglevel" command-line argument to enable that option on kernel
>>boot.
>
>Thanks for doing that patch! It'll obviously make it easier to catch
>various warnings, which will be useful when used with caution,
>especially when false warnings (wrt. real problems that should be
>fixed) can be effectively filtered out.

[...]

>As you may know I'm currently reporting kernel oops in mainline
>kernel, hoping to clear noisy oops there -- they obviously hurt
>catching and bisecting new oops.
>
>I have to say the warnings are much more noisy than kernel oops
>in 2 ways:
>
>1) It's common for a *normal* kernel boot to have a dozen of old
>warnings.
>
>2) Many warnings do not necessarily mean something should or could be
>fixed -- they may well be mentioning some HW problem, or an alert
>message to the user.
>
>So there is a much bigger and messy problem than catching the warnings:
>ways to effectively mark or filter real warnings that automated testing
>should catch and report.
>
>For filtering, we currently accumulated the below blacklist:
>
>https://github.com/fengguang/lkp-tests/blob/master/etc/kmsg-blacklist
>
[...]

For the marking part, I wonder if there can be a clear rule that
allows developers to distinguish 2 kind of information for users and
testers:

- "bug" message: indicating a regression that should be reported and fixed

- "fact" message: well there's a problem, but we kernel developers
  probably can do nothing about it. It's just to let the user know
  about the fact. The fix might be something like replacing a broken
  disk drive.

Those message types are orthogonal to severity of the problem (log
levels), so the current log levels are actually not sufficient for
distinguishing these kind of situations.

Here are the list of warning ids waiting for bisect in 0day test farm.
As you may see, only relatively few ones may worth bisecting and
reporting. We've already built a pretty large blacklist, however it's
proved to be awkward and ineffective -- it obviously suffers from both
coverage and expertise problems.

    777 hub#-#:#:config_failed,can't_get_hub_status(err#)
    328 Firmware_Bug]:TSC_DEADLINE_disabled_due_to_Errata;please_update_microcode_to_version:#(or_later)
     31 ACPI_Error:Field[CPB3]at_bit_offset/length#exceeds_size_of_target_Buffer(#bits)(#/dsopcode-#)
     30 ACPI_Error:Method_parse/execution_failed~_SB.PMI0._PMC,AE_NOT_EXIST(#/psparse-#)
     29 ACPI_Error:Method_parse/execution_failed~_SB.PMI0._GHL,AE_NOT_EXIST(#/psparse-#)
     28 ACPI_Error:Region_IPMI(ID=#)has_no_handler(#/exfldio-#)
     28 ACPI_Error:No_handler_for_Region[SYSI](#)[IPMI](#/evregion-#)
     27 ACPI_Error:Method_parse/execution_failed~_SB._OSC,AE_AML_BUFFER_LIMIT(#/psparse-#)
     25 tpm_tpm#:A_TPM_error(#)occurred_attempting_get_random
     24 IP-Config:Reopening_network_devices
     19 DHCP/BOOTP:Ignoring_fragmented_reply
      8 xhci_hcd#:#:#:init#:#:#fail
      8 xhci_hcd#:#:#:can't_setup
      6 in_atomic():#,irqs_disabled():#,pid:#,name:swapper
      4 megaraid_sas#:#:#:Init_cmd_return_status_SUCCESS_for_SCSI_host
      4 in_atomic():#,irqs_disabled():#,pid:#,name:kworker
      4 drm:drm_atomic_helper_commit_cleanup_done[drm_kms_helper]]*ERROR*[CRTC:#:pipe_A]flip_done_timed_out
      2 print_req_error:I/O_error,dev_loop#,sector
      2 IP-Config:Failed_to_open_erspan0
      2 in_atomic():#,irqs_disabled():#,pid:#,name:systemd-udevd
      2 in_atomic():#,irqs_disabled():#,pid:#,name:perf
      2 acerhdf:unknown(unsupported)BIOS_version_QEMU/Standard_PC(i440FX+PIIX,#).#-#,please_report,aborting
      1 XFS(sda8):metadata_I/O_error:block#(~xfs_readlink_bmap_ilocked~)error#numblks
      1 XFS(sda8):Metadata_CRC_error_detected_at_xfs_symlink_read_verify[xfs],xfs_symlink_block
      1 XFS(sda8):Metadata_CRC_error_detected_at_xfs_dir3_data_read_verify[xfs],xfs_dir3_data_block
      1 XFS(sda8):Metadata_CRC_error_detected_at_xfs_da3_node_read_verify[xfs],xfs_da3_node_block
      1 XFS(sda8):Metadata_CRC_error_detected_at_xfs_attr3_leaf_read_verify[xfs],xfs_attr3_leaf_block
      1 Memory_failure:#:recovery_action_for_huge_page:Recovered
      1 Memory_failure:#:recovery_action_for_clean_LRU_page:Recovered
      1 Memory_failure:#:Killing_tinjpage:#due_to_hardware_memory_corruption
      1 mce:[Hardware_Error]:TSC#MISC#df87b000d9eff
      1 in_atomic():#,irqs_disabled():#,pid:#,name:xargs
      1 in_atomic():#,irqs_disabled():#,pid:#,name:triad_loop
      1 in_atomic():#,irqs_disabled():#,pid:#,name:tchain_edit
      1 in_atomic():#,irqs_disabled():#,pid:#,name:sh
      1 in_atomic():#,irqs_disabled():#,pid:#,name:sed
      1 in_atomic():#,irqs_disabled():#,pid:#,name:run.sh
      1 in_atomic():#,irqs_disabled():#,pid:#,name:perf_test
      1 in_atomic():#,irqs_disabled():#,pid:#,name:noploop.sh
      1 in_atomic():#,irqs_disabled():#,pid:#,name:jbd2/sda1
      1 in_atomic():#,irqs_disabled():#,pid:#,name:grep
      1 in_atomic():#,irqs_disabled():#,pid:#,name:fallocate
      1 in_atomic():#,irqs_disabled():#,pid:#,name:dmesg
      1 in_atomic():#,irqs_disabled():#,pid:#,name:cc1
      1 do_IRQ:#No_irq_handler_for_vector
      1 Buffer_I/O_error_on_dev_dm-#,logical_block#,async_page_read

We already have a well known answer to "marking bugs", ie. to dump a
pile of call trace that no one can tolerate.

The down side is, for some kind of bugs the developer may not need the
call trace at all. And the call trace might make users unnecessarily
nervous. That's why we have this email thread -- to catch regressions
indicated by some 1-liner kernel warnings. For that kind of situation,
it may be enough to add a common prefix for such messages. For example,

        [kernel bug] your warning/error printk message
        ~~~~~~~~~~~~

Alternatively, some printk messages already clearly stated
"please report" (they seem more targeted for users than testers):

wfg /c/linux% git grep '".*please report'|head
block/elevator.c:645:                  "(nr_sorted=%u), please report this\n",
drivers/ata/pata_hpt37x.c:867:                  pr_err("Unknown HPT366 subtype, please report (%d)\n",
drivers/ata/pata_hpt37x.c:908:          pr_err("PCI table is bogus, please report (%d)\n", dev->device);
drivers/ata/pata_hpt3x2n.c:532:         pr_err("PCI table is bogus, please report (%d)\n", dev->device);
drivers/edac/mce_amd.c:725:                      " please report on LKML.\n");
drivers/gpu/drm/amd/amdgpu/amdgpu_display.c:395:                                DRM_INFO("  DDC: no ddc bus - possible BIOS bug - please report to xorg-driver-ati@lists.x.org\n");
drivers/gpu/drm/nouveau/nouveau_bios.c:1458:                                  "please report\n");
drivers/gpu/drm/radeon/radeon_display.c:803:                            DRM_INFO("  DDC: no ddc bus - possible BIOS bug - please report to xorg-driver-ati@lists.x.org\n");
drivers/hwmon/ibmaem.c:773:                             "Unknown AEM v%d; please report this to the maintainer.\n",
drivers/hwmon/w83781d.c:1367:                    "If reset=1 solved a problem you were having, please report!\n");

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: kernel CI: printk loglevels in kernel boot logs?
  2017-11-22  3:27   ` kernel CI: printk loglevels in kernel boot logs? Fengguang Wu
@ 2017-11-22  5:26     ` Sergey Senozhatsky
  2017-11-22 10:42       ` Mark Brown
  2017-11-22 11:34     ` Petr Mladek
  1 sibling, 1 reply; 20+ messages in thread
From: Sergey Senozhatsky @ 2017-11-22  5:26 UTC (permalink / raw)
  To: Fengguang Wu
  Cc: Kevin Hilman, Mark Brown, Greg Kroah-Hartman, LKML, Petr Mladek,
	Sergey Senozhatsky, Steven Rostedt, Linus Torvalds

Hello,

On (11/22/17 11:27), Fengguang Wu wrote:
> [CC LKML for possible printk improvements]

[..]

> > > In kernelCI, we're considering submitting a patch to add a
> > > "show_loglevel" command-line argument to enable that option on kernel
> > > boot.
> > 
> > Thanks for doing that patch! It'll obviously make it easier to catch
> > various warnings, which will be useful when used with caution,
> > especially when false warnings (wrt. real problems that should be
> > fixed) can be effectively filtered out.

both /dev/kmsg and /proc/kmsg show log levels. user space then can
hide those levels or show them via "--raw" mode/etc. we don't (*sort
of don't*) have a "--raw" mode for serial consoles. we do have extended
consoles, which print metadata on consoles: msg_print_ext_header()
and msg_print_ext_body()

	sprintf(... "%u,%llu,%llu,%c;",
		(msg->facility << 3) | msg->level, seq, ts_usec, ....)

basically the same format as for /dev/kmsg. but extended console drivers
come at a price - broken pr_cont(). "sad!". and, besides, may be ext
consoles can show more data than needed.

/proc/kmsg has different format. it does not do the ext_header/ext_body
thing, but instead calls msg_print_text()->print_prefix(), which simply
does

	prefix = (msg->facility << 3) | msg->level;
	sprintf(buf, "<%u>", prefix);

console_unlock(), the function that prints messages on the consoles,
calls msg_print_text() function, but does not do print_prefix()
("bool syslog" is false).

so may be we can add "syslog" mode for console drivers and do print_prefix()
in console_unlock() [unless have nr_ext_console_drivers]. if this is what
you guys are talking about. (frankly, I have no idea if there are tools
that parse output from serial consoles/net consoles and thus might get
confused by extra <%u> in every message).

	-ss

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: kernel CI: printk loglevels in kernel boot logs?
  2017-11-22  5:26     ` Sergey Senozhatsky
@ 2017-11-22 10:42       ` Mark Brown
  0 siblings, 0 replies; 20+ messages in thread
From: Mark Brown @ 2017-11-22 10:42 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Fengguang Wu, Kevin Hilman, Greg Kroah-Hartman, LKML,
	Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds

[-- Attachment #1: Type: text/plain, Size: 708 bytes --]

On Wed, Nov 22, 2017 at 02:26:30PM +0900, Sergey Senozhatsky wrote:

> so may be we can add "syslog" mode for console drivers and do print_prefix()
> in console_unlock() [unless have nr_ext_console_drivers]. if this is what
> you guys are talking about. (frankly, I have no idea if there are tools
> that parse output from serial consoles/net consoles and thus might get
> confused by extra <%u> in every message).

In this case the goal is actually to use this with tools parsing the
serial console output - it's in the context of kernelci where the boot
logs are most interesting when we fail to get to the point of being able
to look in /proc.  Kevin's initial proposal was to have a command line
option.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: kernel CI: printk loglevels in kernel boot logs?
  2017-11-22  3:27   ` kernel CI: printk loglevels in kernel boot logs? Fengguang Wu
  2017-11-22  5:26     ` Sergey Senozhatsky
@ 2017-11-22 11:34     ` Petr Mladek
  2017-11-22 12:38       ` Sergey Senozhatsky
  2017-11-22 14:10       ` Fengguang Wu
  1 sibling, 2 replies; 20+ messages in thread
From: Petr Mladek @ 2017-11-22 11:34 UTC (permalink / raw)
  To: Fengguang Wu
  Cc: Kevin Hilman, Mark Brown, Greg Kroah-Hartman, LKML,
	Sergey Senozhatsky, Steven Rostedt, Linus Torvalds

On Wed 2017-11-22 11:27:02, Fengguang Wu wrote:
> [CC LKML for possible printk improvements]
> 
> On Wed, Nov 22, 2017 at 09:56:10AM +0800, Fengguang Wu wrote:
> > Hi Kevin,
> > 
> > On Tue, Nov 21, 2017 at 12:27:48PM -0800, Kevin Hilman wrote:
> > > Hi Fenguang,
> > > 
> > > In automated testing, for ease of parsing kernel boot logs (especially
> > > separating warnings and errors from debug, info etc.)

Would be enough to use console_loglevel to filter out the non-interesting
messages?

> > > Right now we can get this info from a "dmesg --raw" after bootup, but
> > > it would be really nice in certain automation frameworks to have a
> > > kernel command-line option to enable printing of loglevels in default
> > > boot log.
> > 
> > Agreed.
> > 
> > > This is espeically useful when ingesting kernel logs into advanced
> > > search/analytics frameworks (I'm playing with and ELK stack: Elastic
> > > Search, Logstash, Kibana).

This suggests that people are already filtering the kernel log using
external tools. Therefore there is a danger that the format might
have already became the kernel ABI and we could not change it easily.

A solution would be to printk the loglevel after the timestamp.
But this would be ugly because dmesg --raw prints it at the beginning.

To make it clear. I understand that "show_loglevel" command line argument
would be useful for you. But I am afraid that it is not worth changing
the format. There would need to be wide interest into the change.
Also there would need to be evidence that the existing solutions
(dmesg --raw, console_loglevel) are not enough in many real life
scenarios.

> [...]
> 
> > As you may know I'm currently reporting kernel oops in mainline
> > kernel, hoping to clear noisy oops there -- they obviously hurt
> > catching and bisecting new oops.
> > 
> > I have to say the warnings are much more noisy than kernel oops
> > in 2 ways:
> >
> > 1) It's common for a *normal* kernel boot to have a dozen of old
> > warnings.

I guess that you solve this by a "clever" diff between old and new
log. The extra info suggested below might help to filter out
some messages. But the problems will still be there.
I am not sure that the categories are worth it, see below.


> > 2) Many warnings do not necessarily mean something should or could be
> > fixed -- they may well be mentioning some HW problem, or an alert
> > message to the user.
> > 
> > So there is a much bigger and messy problem than catching the warnings:
> > ways to effectively mark or filter real warnings that automated testing
> > should catch and report.
> >
> > For filtering, we currently accumulated the below blacklist:
> > 
> > https://github.com/fengguang/lkp-tests/blob/master/etc/kmsg-blacklist
> > 
> [...]
> 
> For the marking part, I wonder if there can be a clear rule that
> allows developers to distinguish 2 kind of information for users and
> testers:
> 
> - "bug" message: indicating a regression that should be reported and fixed
> 
> - "fact" message: well there's a problem, but we kernel developers
>  probably can do nothing about it. It's just to let the user know
>  about the fact. The fix might be something like replacing a broken
>  disk drive.
> 
> Those message types are orthogonal to severity of the problem (log
> levels), so the current log levels are actually not sufficient for
> distinguishing these kind of situations.

It is interesting idea but I am afraid that it would not work
in reality from several reasons:

  + It is yet another complexity. I am not sure that developers would
    want to think about it when creating new messages.


  + Who would set this for the existing ~100k messages?


  + If I get it correctly, the motivation is to help normal users
    to decide when a kernel bug should get reported. This might
    heavily increase the number of poor bug reports.

    The recent discussion about kernel regression suggests
    that there we are not ready for this, see
    https://lwn.net/Articles/738216/


  + The proposal it rather strigthforward. But does all messages
    clearly fit into one of the two categories? What about
    messages that are not about problems? For example, most
    of pr_info(), pr_notice() messages?


  + Are the two categories worth the effort? Especially when the
    second one might still eventually be a kernel bug?

    If we add more categories, it might get too complicated.
    For example, like setting severities and priorities
    in the classic bugzilla.


Best Regards,
Petr

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: kernel CI: printk loglevels in kernel boot logs?
  2017-11-22 11:34     ` Petr Mladek
@ 2017-11-22 12:38       ` Sergey Senozhatsky
  2017-11-22 12:52         ` Fengguang Wu
  2017-11-22 20:22         ` Kevin Hilman
  2017-11-22 14:10       ` Fengguang Wu
  1 sibling, 2 replies; 20+ messages in thread
From: Sergey Senozhatsky @ 2017-11-22 12:38 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Fengguang Wu, Kevin Hilman, Mark Brown, Greg Kroah-Hartman, LKML,
	Sergey Senozhatsky, Steven Rostedt, Linus Torvalds

On (11/22/17 12:34), Petr Mladek wrote:
[..]
> > > > This is espeically useful when ingesting kernel logs into advanced
> > > > search/analytics frameworks (I'm playing with and ELK stack: Elastic
> > > > Search, Logstash, Kibana).
[..]
> To make it clear. I understand that "show_loglevel" command line argument
> would be useful for you. But I am afraid that it is not worth changing
> the format. There would need to be wide interest into the change.
> Also there would need to be evidence that the existing solutions
> (dmesg --raw, console_loglevel) are not enough in many real life
> scenarios.

well, I think that that "consoles_format=syslog" command line parameter
will be enabled only by those who actually want to have it - Fengguang's
build robot and kernelCI (+ may be more setups).  so I'd probably assume
there are low risks here. may be I'm wrong.

I think it makes sense to have syslog's format "<%u>[timestamp] text\n"
on serial consoles (time stamp when PRINTK_TIME set; <%u> when
consoles_format=syslog set).

	-ss

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: kernel CI: printk loglevels in kernel boot logs?
  2017-11-22 12:38       ` Sergey Senozhatsky
@ 2017-11-22 12:52         ` Fengguang Wu
  2017-11-23  2:59           ` Sergey Senozhatsky
  2017-11-23 10:04           ` Petr Mladek
  2017-11-22 20:22         ` Kevin Hilman
  1 sibling, 2 replies; 20+ messages in thread
From: Fengguang Wu @ 2017-11-22 12:52 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Kevin Hilman, Mark Brown, Greg Kroah-Hartman, LKML,
	Sergey Senozhatsky, Steven Rostedt, Linus Torvalds

On Wed, Nov 22, 2017 at 09:38:21PM +0900, Sergey Senozhatsky wrote:
>On (11/22/17 12:34), Petr Mladek wrote:
>[..]
>> > > > This is espeically useful when ingesting kernel logs into advanced
>> > > > search/analytics frameworks (I'm playing with and ELK stack: Elastic
>> > > > Search, Logstash, Kibana).
>[..]
>> To make it clear. I understand that "show_loglevel" command line argument
>> would be useful for you. But I am afraid that it is not worth changing
>> the format. There would need to be wide interest into the change.
>> Also there would need to be evidence that the existing solutions
>> (dmesg --raw, console_loglevel) are not enough in many real life
>> scenarios.
>
>well, I think that that "consoles_format=syslog" command line parameter
>will be enabled only by those who actually want to have it - Fengguang's
>build robot and kernelCI (+ may be more setups).  so I'd probably assume
>there are low risks here. may be I'm wrong.

Yes, since it needs explicit enabling, local parsers should stay
working. Unless we send dmesg to other developers, but then they
might also receive $(dmesg --raw) outputs and need to handle <%u>
prefixes, too.

>I think it makes sense to have syslog's format "<%u>[timestamp] text\n"
>on serial consoles (time stamp when PRINTK_TIME set; <%u> when
>consoles_format=syslog set).

Thanks. Since the 0day scripts can already parse that format (code
is listed below), we should quickly benefit from that feature when
it's mainlined.

https://github.com/fengguang/lkp-tests/blob/master/lib/dmesg.rb

    oops = `#{grep} -a -E -e '^<[0123]>' -e '^kern  :(err   |crit  |alert |emerg ): ' #{kmsg_file} |
      sed -r 's/\\x1b\\[([0-9;]+m|[mK])//g' |
      grep -a -v -E -f #{LKP_SRC}/etc/oops-pattern |
      grep -a -v -F -f #{LKP_SRC}/etc/kmsg-blacklist`

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: kernel CI: printk loglevels in kernel boot logs?
  2017-11-22 11:34     ` Petr Mladek
  2017-11-22 12:38       ` Sergey Senozhatsky
@ 2017-11-22 14:10       ` Fengguang Wu
  2017-12-05 15:55         ` Petr Mladek
  1 sibling, 1 reply; 20+ messages in thread
From: Fengguang Wu @ 2017-11-22 14:10 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Kevin Hilman, Mark Brown, Greg Kroah-Hartman, LKML,
	Sergey Senozhatsky, Steven Rostedt, Linus Torvalds

On Wed, Nov 22, 2017 at 12:34:48PM +0100, Petr Mladek wrote:
>On Wed 2017-11-22 11:27:02, Fengguang Wu wrote:
>> [CC LKML for possible printk improvements]
>>
>> On Wed, Nov 22, 2017 at 09:56:10AM +0800, Fengguang Wu wrote:
>> > Hi Kevin,
>> >
>> > On Tue, Nov 21, 2017 at 12:27:48PM -0800, Kevin Hilman wrote:
>> > > Hi Fenguang,
>> > >
>> > > In automated testing, for ease of parsing kernel boot logs (especially
>> > > separating warnings and errors from debug, info etc.)
>
>Would be enough to use console_loglevel to filter out the non-interesting
>messages?

Nah we do prefer and actually require keeping the full dmesg. For one
thing, it's good to have the context information. The 0day test farm
will dump logs and debug info to the serial console. And we'll parse
the normal dmesg lines to detect when the kernel boots up and
shutdown (to split the continuous serial console log into dmesg files
for each kernel boot), when the initramfs is loading (to judge whether
the kernel oops is related to the initrd), when the testing begins,
etc. For embedded systems people may even control machine boot and
collect artifacts via the serial console.

>> > > Right now we can get this info from a "dmesg --raw" after bootup, but
>> > > it would be really nice in certain automation frameworks to have a
>> > > kernel command-line option to enable printing of loglevels in default
>> > > boot log.
>> >
>> > Agreed.
>> >
>> > > This is espeically useful when ingesting kernel logs into advanced
>> > > search/analytics frameworks (I'm playing with and ELK stack: Elastic
>> > > Search, Logstash, Kibana).
>
>This suggests that people are already filtering the kernel log using
>external tools. Therefore there is a danger that the format might
>have already became the kernel ABI and we could not change it easily.
>
>A solution would be to printk the loglevel after the timestamp.
>But this would be ugly because dmesg --raw prints it at the beginning.
>
>To make it clear. I understand that "show_loglevel" command line argument
>would be useful for you. But I am afraid that it is not worth changing
>the format. There would need to be wide interest into the change.
>Also there would need to be evidence that the existing solutions
>(dmesg --raw, console_loglevel) are not enough in many real life
>scenarios.

As Sergey said in the other thread, it should be safe because the
syslog format will be explicitly enabled by those who want and ready
to parse it.

>> [...]
>>
>> > As you may know I'm currently reporting kernel oops in mainline
>> > kernel, hoping to clear noisy oops there -- they obviously hurt
>> > catching and bisecting new oops.
>> >
>> > I have to say the warnings are much more noisy than kernel oops
>> > in 2 ways:
>> >
>> > 1) It's common for a *normal* kernel boot to have a dozen of old
>> > warnings.
>
>I guess that you solve this by a "clever" diff between old and new
>log.

That does help filtering out the majority old warnings. For example,
if it's an inherent HW issue, it may remain there forever and never
show up after diff.

However some noises will still go through and the number is large
enough to make "tester side blacklist" a big maintenance headache.
As you may see from the listing of warning messages waiting for bisect
in 0day now.

>The extra info suggested below might help to filter out
>some messages. But the problems will still be there.
>I am not sure that the categories are worth it, see below.
>
>
>> > 2) Many warnings do not necessarily mean something should or could be
>> > fixed -- they may well be mentioning some HW problem, or an alert
>> > message to the user.
>> >
>> > So there is a much bigger and messy problem than catching the warnings:
>> > ways to effectively mark or filter real warnings that automated testing
>> > should catch and report.
>> >
>> > For filtering, we currently accumulated the below blacklist:
>> >
>> > https://github.com/fengguang/lkp-tests/blob/master/etc/kmsg-blacklist
>> >
>> [...]
>>
>> For the marking part, I wonder if there can be a clear rule that
>> allows developers to distinguish 2 kind of information for users and
>> testers:
>>
>> - "bug" message: indicating a regression that should be reported and fixed
>>
>> - "fact" message: well there's a problem, but we kernel developers
>>  probably can do nothing about it. It's just to let the user know
>>  about the fact. The fix might be something like replacing a broken
>>  disk drive.
>>
>> Those message types are orthogonal to severity of the problem (log
>> levels), so the current log levels are actually not sufficient for
>> distinguishing these kind of situations.
>
>It is interesting idea but I am afraid that it would not work
>in reality from several reasons:
>
>  + It is yet another complexity. I am not sure that developers would
>    want to think about it when creating new messages.

[moved ahead the below 2 points]
>  + The proposal it rather strigthforward. But does all messages
>    clearly fit into one of the two categories? What about
>    messages that are not about problems? For example, most
>    of pr_info(), pr_notice() messages?
>
>
>  + Are the two categories worth the effort? Especially when the
>    second one might still eventually be a kernel bug?
>
>    If we add more categories, it might get too complicated.
>    For example, like setting severities and priorities
>    in the classic bugzilla.
>

Let me answer the 3 points together. Sorry if I was not clear -- I
merely give the 2 categories as example. However the key point is not
about creating categories. My proposal would be something like this:

        #define KERNEL_BUG "[kernel bug]"

Then use that tag wherever maintainers want to track regressions:

        pr_err(KERNEL_BUG "alg: Unexpected test result for %s: %d\n", name, err);
               ~~~~~~~~~~

If such tag is widely used, it effectively enables whitelist based
kernel warning regression tracking, comparing to the currently messy
and ineffective blacklist:

https://github.com/fengguang/lkp-tests/blob/master/etc/kmsg-blacklist

>  + Who would set this for the existing ~100k messages?

That's a good question and main challenge.

Perhaps the modules under active development and have the desire to
track regressions will be more willing to add the tag.

>  + If I get it correctly, the motivation is to help normal users
>    to decide when a kernel bug should get reported. This might

My purpose is mainly for using the kernel bug tag for auto regression
tracking in test infrastructures. Including distro's dmesg monitoring
and auto reporting tools which could pop up a window saying "I find a
bug in your dmesg, shall I auto upload it for analyze?"

But yeah, it'll be useful for end users too -- now they'll be easier
to tell when there are kernel bugs.

>    heavily increase the number of poor bug reports.
>
>    The recent discussion about kernel regression suggests
>    that there we are not ready for this, see
>    https://lwn.net/Articles/738216/

Agreed. It'll be good if we can have some user report aggregation
service setup first.

I'd imagine that service to start with a big search box where a user
can paste his dmesg. The service then parses the dmesg and help check
whether it's a new bug, or looks similar to some already reported one.

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: kernel CI: printk loglevels in kernel boot logs?
  2017-11-22 12:38       ` Sergey Senozhatsky
  2017-11-22 12:52         ` Fengguang Wu
@ 2017-11-22 20:22         ` Kevin Hilman
  1 sibling, 0 replies; 20+ messages in thread
From: Kevin Hilman @ 2017-11-22 20:22 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Fengguang Wu, Mark Brown, Greg Kroah-Hartman, LKML,
	Sergey Senozhatsky, Steven Rostedt, Linus Torvalds

On Wed, Nov 22, 2017 at 4:38 AM, Sergey Senozhatsky
<sergey.senozhatsky.work@gmail.com> wrote:
> On (11/22/17 12:34), Petr Mladek wrote:
> [..]
>> > > > This is espeically useful when ingesting kernel logs into advanced
>> > > > search/analytics frameworks (I'm playing with and ELK stack: Elastic
>> > > > Search, Logstash, Kibana).
> [..]
>> To make it clear. I understand that "show_loglevel" command line argument
>> would be useful for you. But I am afraid that it is not worth changing
>> the format. There would need to be wide interest into the change.
>> Also there would need to be evidence that the existing solutions
>> (dmesg --raw, console_loglevel) are not enough in many real life
>> scenarios.
>
> well, I think that that "consoles_format=syslog" command line parameter
> will be enabled only by those who actually want to have it - Fengguang's
> build robot and kernelCI (+ may be more setups).  so I'd probably assume
> there are low risks here. may be I'm wrong.

Agreed.  The goal of the command-line parameter is so that only
automation frameworks that would use it.

The other important reason for having this on the command line is that
for testing linux-next (and other bleeding edge developer branches),
it's common that we never make it to userspace, so can't even run
"dmesg --raw" (or equivalent.)  So we really want this on the primary
boot (serial) console.

> I think it makes sense to have syslog's format "<%u>[timestamp] text\n"
> on serial consoles (time stamp when PRINTK_TIME set; <%u> when
> consoles_format=syslog set).

Yes, I think it should match "dmesg --raw".

Kevin

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: kernel CI: printk loglevels in kernel boot logs?
  2017-11-22 12:52         ` Fengguang Wu
@ 2017-11-23  2:59           ` Sergey Senozhatsky
  2017-11-23  3:14             ` Fengguang Wu
  2017-11-29  0:13             ` Kevin Hilman
  2017-11-23 10:04           ` Petr Mladek
  1 sibling, 2 replies; 20+ messages in thread
From: Sergey Senozhatsky @ 2017-11-23  2:59 UTC (permalink / raw)
  To: Fengguang Wu
  Cc: Sergey Senozhatsky, Petr Mladek, Kevin Hilman, Mark Brown,
	Greg Kroah-Hartman, LKML, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds

On (11/22/17 20:52), Fengguang Wu wrote:
[..]
> > well, I think that that "consoles_format=syslog" command line parameter
> > will be enabled only by those who actually want to have it - Fengguang's
> > build robot and kernelCI (+ may be more setups).  so I'd probably assume
> > there are low risks here. may be I'm wrong.
> 
> Yes, since it needs explicit enabling, local parsers should stay
> working. Unless we send dmesg to other developers, but then they
> might also receive $(dmesg --raw) outputs and need to handle <%u>
> prefixes, too.
> 
> > I think it makes sense to have syslog's format "<%u>[timestamp] text\n"
> > on serial consoles (time stamp when PRINTK_TIME set; <%u> when
> > consoles_format=syslog set).
> 
> Thanks. Since the 0day scripts can already parse that format (code
> is listed below), we should quickly benefit from that feature when
> it's mainlined.

okay... who's going to send the patch? kernelCI folks?

I have some sort of a patch. added console_msg_format= with the only
available option so far - "syslog". may be people would want to have
boot time, etc. on their consoles. who knows. may be msg_format will
be extended someday in the future or may be not, and we can do
something like "console_format_syslog" (or any better name).

Not-yet-signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 Documentation/admin-guide/kernel-parameters.txt | 11 +++++++++++
 kernel/printk/printk.c                          | 21 ++++++++++++++++++++-
 2 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 62436bd5f34a..42113e5181b2 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -640,6 +640,17 @@
 			console=brl,ttyS0
 		For now, only VisioBraille is supported.
 
+	console_msg_format=
+			[KNL] Control message format
+			By default we print messages in "[time stamp] text\n"
+			format (time stamp may not be printed, depending on
+			CONFIG_PRINTK_TIME or `printk_time' param).
+		syslog
+			Switch to syslog format (similar to "dmesg --raw" or
+			reading from /proc/kmsg): "<%u>[time stamp] text\n"
+			IOW, each message is getting prepended with the
+			facility and loglevel prefix.
+
 	consoleblank=	[KNL] The console blank (screen saver) timeout in
 			seconds. A value of 0 disables the blank timer.
                        Defaults to 0.
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 5d81206a572d..034f3ad85f5b 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -277,6 +277,13 @@ EXPORT_SYMBOL(console_set_on_cmdline);
 /* Flag: console code may call schedule() */
 static int console_may_schedule;
 
+enum con_msg_format {
+	MSG_FORMAT_DEFAULT = 1,
+	MSG_FORMAT_SYSLOG = 2,
+};
+
+static int console_msg_format = MSG_FORMAT_DEFAULT;
+
 /*
  * The printk log buffer consists of a chain of concatenated variable
  * length records. Every record starts with a record header, containing
@@ -1913,6 +1920,15 @@ static int __add_preferred_console(char *name, int idx, char *options,
 	c->index = idx;
 	return 0;
 }
+
+static int __init console_msg_format_setup(char *str)
+{
+	if (!strncmp(str, "syslog", 6))
+		console_msg_format = MSG_FORMAT_SYSLOG;
+	return 1;
+}
+__setup("console_msg_format=", console_msg_format_setup);
+
 /*
  * Set up a console.  Called via do_early_param() in init/main.c
  * for each "console=" parameter in the boot command line.
@@ -2215,7 +2231,10 @@ void console_unlock(void)
 			goto skip;
 		}
 
-		len += msg_print_text(msg, false, text + len, sizeof(text) - len);
+		len += msg_print_text(msg,
+				console_msg_format & MSG_FORMAT_SYSLOG,
+				text + len,
+				sizeof(text) - len);
 		if (nr_ext_console_drivers) {
 			ext_len = msg_print_ext_header(ext_text,
 						sizeof(ext_text),
-- 
2.15.0

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: kernel CI: printk loglevels in kernel boot logs?
  2017-11-23  2:59           ` Sergey Senozhatsky
@ 2017-11-23  3:14             ` Fengguang Wu
  2017-11-23  4:31               ` Sergey Senozhatsky
  2017-11-29  0:13             ` Kevin Hilman
  1 sibling, 1 reply; 20+ messages in thread
From: Fengguang Wu @ 2017-11-23  3:14 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Kevin Hilman, Mark Brown, Greg Kroah-Hartman, LKML,
	Sergey Senozhatsky, Steven Rostedt, Linus Torvalds

On Thu, Nov 23, 2017 at 11:59:57AM +0900, Sergey Senozhatsky wrote:
>On (11/22/17 20:52), Fengguang Wu wrote:
>[..]
>> > well, I think that that "consoles_format=syslog" command line parameter
>> > will be enabled only by those who actually want to have it - Fengguang's
>> > build robot and kernelCI (+ may be more setups).  so I'd probably assume
>> > there are low risks here. may be I'm wrong.
>>
>> Yes, since it needs explicit enabling, local parsers should stay
>> working. Unless we send dmesg to other developers, but then they
>> might also receive $(dmesg --raw) outputs and need to handle <%u>
>> prefixes, too.
>>
>> > I think it makes sense to have syslog's format "<%u>[timestamp] text\n"
>> > on serial consoles (time stamp when PRINTK_TIME set; <%u> when
>> > consoles_format=syslog set).
>>
>> Thanks. Since the 0day scripts can already parse that format (code
>> is listed below), we should quickly benefit from that feature when
>> it's mainlined.
>
>okay... who's going to send the patch? kernelCI folks?
>
>I have some sort of a patch. added console_msg_format= with the only

The patch looks good to me. Perhaps you can send it out after getting
review comments from kernelci folks?

Reviewed-by: Fengguang Wu <fengguang.wu@intel.com>

Thanks,
Fengguang

>available option so far - "syslog". may be people would want to have
>boot time, etc. on their consoles. who knows. may be msg_format will
>be extended someday in the future or may be not, and we can do
>something like "console_format_syslog" (or any better name).
>
>Not-yet-signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
>---
> Documentation/admin-guide/kernel-parameters.txt | 11 +++++++++++
> kernel/printk/printk.c                          | 21 ++++++++++++++++++++-
> 2 files changed, 31 insertions(+), 1 deletion(-)
>
>diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>index 62436bd5f34a..42113e5181b2 100644
>--- a/Documentation/admin-guide/kernel-parameters.txt
>+++ b/Documentation/admin-guide/kernel-parameters.txt
>@@ -640,6 +640,17 @@
> 			console=brl,ttyS0
> 		For now, only VisioBraille is supported.
>
>+	console_msg_format=
>+			[KNL] Control message format
>+			By default we print messages in "[time stamp] text\n"
>+			format (time stamp may not be printed, depending on
>+			CONFIG_PRINTK_TIME or `printk_time' param).
>+		syslog
>+			Switch to syslog format (similar to "dmesg --raw" or
>+			reading from /proc/kmsg): "<%u>[time stamp] text\n"
>+			IOW, each message is getting prepended with the
>+			facility and loglevel prefix.
>+
> 	consoleblank=	[KNL] The console blank (screen saver) timeout in
> 			seconds. A value of 0 disables the blank timer.
>                        Defaults to 0.
>diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
>index 5d81206a572d..034f3ad85f5b 100644
>--- a/kernel/printk/printk.c
>+++ b/kernel/printk/printk.c
>@@ -277,6 +277,13 @@ EXPORT_SYMBOL(console_set_on_cmdline);
> /* Flag: console code may call schedule() */
> static int console_may_schedule;
>
>+enum con_msg_format {
>+	MSG_FORMAT_DEFAULT = 1,
>+	MSG_FORMAT_SYSLOG = 2,
>+};
>+
>+static int console_msg_format = MSG_FORMAT_DEFAULT;
>+
> /*
>  * The printk log buffer consists of a chain of concatenated variable
>  * length records. Every record starts with a record header, containing
>@@ -1913,6 +1920,15 @@ static int __add_preferred_console(char *name, int idx, char *options,
> 	c->index = idx;
> 	return 0;
> }
>+
>+static int __init console_msg_format_setup(char *str)
>+{
>+	if (!strncmp(str, "syslog", 6))
>+		console_msg_format = MSG_FORMAT_SYSLOG;
>+	return 1;
>+}
>+__setup("console_msg_format=", console_msg_format_setup);
>+
> /*
>  * Set up a console.  Called via do_early_param() in init/main.c
>  * for each "console=" parameter in the boot command line.
>@@ -2215,7 +2231,10 @@ void console_unlock(void)
> 			goto skip;
> 		}
>
>-		len += msg_print_text(msg, false, text + len, sizeof(text) - len);
>+		len += msg_print_text(msg,
>+				console_msg_format & MSG_FORMAT_SYSLOG,
>+				text + len,
>+				sizeof(text) - len);
> 		if (nr_ext_console_drivers) {
> 			ext_len = msg_print_ext_header(ext_text,
> 						sizeof(ext_text),
>-- 
>2.15.0
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: kernel CI: printk loglevels in kernel boot logs?
  2017-11-23  3:14             ` Fengguang Wu
@ 2017-11-23  4:31               ` Sergey Senozhatsky
  0 siblings, 0 replies; 20+ messages in thread
From: Sergey Senozhatsky @ 2017-11-23  4:31 UTC (permalink / raw)
  To: Fengguang Wu
  Cc: Sergey Senozhatsky, Petr Mladek, Kevin Hilman, Mark Brown,
	Greg Kroah-Hartman, LKML, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds

Hello,

On (11/23/17 11:14), Fengguang Wu wrote:
> > > Thanks. Since the 0day scripts can already parse that format (code
> > > is listed below), we should quickly benefit from that feature when
> > > it's mainlined.
> > 
> > okay... who's going to send the patch? kernelCI folks?
> > 
> > I have some sort of a patch. added console_msg_format= with the only
> 
> The patch looks good to me. Perhaps you can send it out after getting
> review comments from kernelci folks?

agree, would be lovely to hear more opinions/etc.

and I need some help with the credits (Suggested-by: ? and so on).

> Reviewed-by: Fengguang Wu <fengguang.wu@intel.com>

thanks.

	-ss

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: kernel CI: printk loglevels in kernel boot logs?
  2017-11-22 12:52         ` Fengguang Wu
  2017-11-23  2:59           ` Sergey Senozhatsky
@ 2017-11-23 10:04           ` Petr Mladek
  1 sibling, 0 replies; 20+ messages in thread
From: Petr Mladek @ 2017-11-23 10:04 UTC (permalink / raw)
  To: Fengguang Wu
  Cc: Sergey Senozhatsky, Kevin Hilman, Mark Brown, Greg Kroah-Hartman,
	LKML, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds

On Wed 2017-11-22 20:52:45, Fengguang Wu wrote:
> On Wed, Nov 22, 2017 at 09:38:21PM +0900, Sergey Senozhatsky wrote:
> > On (11/22/17 12:34), Petr Mladek wrote:
> > [..]
> > > > > > This is espeically useful when ingesting kernel logs into advanced
> > > > > > search/analytics frameworks (I'm playing with and ELK stack: Elastic
> > > > > > Search, Logstash, Kibana).
> > [..]
> > > To make it clear. I understand that "show_loglevel" command line argument
> > > would be useful for you. But I am afraid that it is not worth changing
> > > the format. There would need to be wide interest into the change.
> > > Also there would need to be evidence that the existing solutions
> > > (dmesg --raw, console_loglevel) are not enough in many real life
> > > scenarios.
> > 
> > well, I think that that "consoles_format=syslog" command line parameter
> > will be enabled only by those who actually want to have it - Fengguang's
> > build robot and kernelCI (+ may be more setups).  so I'd probably assume
> > there are low risks here. may be I'm wrong.
> 
> Yes, since it needs explicit enabling, local parsers should stay
> working. Unless we send dmesg to other developers, but then they
> might also receive $(dmesg --raw) outputs and need to handle <%u>
> prefixes, too.

I would have agreed with this few days ago. But I am not sure now.

We tried to upstream a feature that allowed to use different clocks
for the printk timestamps. The current one is local clock. Some
people were interested into the real time clock that would allow
to match messages from userspace. Some other users were interested
into boot time clock that would count also the time when the system
is suspended.

We made this configurable (config option + command line option).
I thought that it was safe because local clock stayed default
and the other possibilities would be used just for debugging.

Linus rejected this, see
https://lkml.kernel.org/r/CA+55aFwUfA__6MgMgjENnx+_RYY2ZOOLiSx2ea1AvYhSZN+78A@mail.gmail.com
https://lkml.kernel.org/r/CA+55aFzLH9crdMtUFkD-PtNGuxu_fsG5GH2ACni69ug9iM=09g@mail.gmail.com
https://lkml.kernel.org/r/CA+55aFzzvBD4_WYm-5Bm7TeRGR8nNu1oz0oWNcRNmTNJm=M4Lw@mail.gmail.com


The feature with timestamps was more complicated. The different time
sources had problems on its own. We had troubles to suggest a good
one. Therefore it was wrong to just move the complex decision to
a poor user.

But one thing is similar. Both features change to format/semantic
of the printed lines (extra field for message level vs. semantic
of the timestamp). In case of the timestamps, Linus was aware
o f a clear userspace dependency (systemd expected time since boot).
The question if there is a user space application that depends
on the message format on the consoles. I have no idea. I would
imagine that there are some system monitors that expect
the strings starting with the timestamp.

You might argue that this will get disabled by default. But
the local clock stayed default in the above feature as well.


One thing is that there is CONFIG_PRINTK_TIME and printk.time
command line option. Therefore the format already is kind
of configurable. But I doubt that this is disabled on
any sane system.


Anyway, I would like to wait a bit for the discussion
about the timestamps feature before we add this
console_format stuff. We still need to discuss
how to pass the timestamps to the user and I would
like to hear Linus' opinion there.

You might want to follow/join the discussion around
alternative approach by Thomas Gleixner, see
https://lkml.kernel.org/r/20171115181531.322572387@linutronix.de
It does not solve the userspace side at the moment.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: kernel CI: printk loglevels in kernel boot logs?
  2017-11-23  2:59           ` Sergey Senozhatsky
  2017-11-23  3:14             ` Fengguang Wu
@ 2017-11-29  0:13             ` Kevin Hilman
  2017-11-29  7:25               ` Sergey Senozhatsky
  1 sibling, 1 reply; 20+ messages in thread
From: Kevin Hilman @ 2017-11-29  0:13 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Fengguang Wu, Petr Mladek, Mark Brown, Greg Kroah-Hartman, LKML,
	Sergey Senozhatsky, Steven Rostedt, Linus Torvalds

Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> writes:

> On (11/22/17 20:52), Fengguang Wu wrote:
> [..]
>> > well, I think that that "consoles_format=syslog" command line parameter
>> > will be enabled only by those who actually want to have it - Fengguang's
>> > build robot and kernelCI (+ may be more setups).  so I'd probably assume
>> > there are low risks here. may be I'm wrong.
>> 
>> Yes, since it needs explicit enabling, local parsers should stay
>> working. Unless we send dmesg to other developers, but then they
>> might also receive $(dmesg --raw) outputs and need to handle <%u>
>> prefixes, too.
>> 
>> > I think it makes sense to have syslog's format "<%u>[timestamp] text\n"
>> > on serial consoles (time stamp when PRINTK_TIME set; <%u> when
>> > consoles_format=syslog set).
>> 
>> Thanks. Since the 0day scripts can already parse that format (code
>> is listed below), we should quickly benefit from that feature when
>> it's mainlined.
>
> okay... who's going to send the patch? kernelCI folks?
>
> I have some sort of a patch. added console_msg_format= with the only
> available option so far - "syslog". may be people would want to have
> boot time, etc. on their consoles. who knows. may be msg_format will
> be extended someday in the future or may be not, and we can do
> something like "console_format_syslog" (or any better name).
>
> Not-yet-signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>

For kernelCI (part of how this thread got started):

Acked-by: Kevin Hilman <khilman@baylibre.com>
Reviewed-by: Kevin Hilman <khilman@baylibre.com>

I also tested it on an arm64 platform with the new command-line arg:

Tested-by: Kevin Hilman <khilman@baylibre.com>

Thanks,

Kevin

> ---
>  Documentation/admin-guide/kernel-parameters.txt | 11 +++++++++++
>  kernel/printk/printk.c                          | 21 ++++++++++++++++++++-
>  2 files changed, 31 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 62436bd5f34a..42113e5181b2 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -640,6 +640,17 @@
>  			console=brl,ttyS0
>  		For now, only VisioBraille is supported.
>  
> +	console_msg_format=
> +			[KNL] Control message format
> +			By default we print messages in "[time stamp] text\n"
> +			format (time stamp may not be printed, depending on
> +			CONFIG_PRINTK_TIME or `printk_time' param).
> +		syslog
> +			Switch to syslog format (similar to "dmesg --raw" or
> +			reading from /proc/kmsg): "<%u>[time stamp] text\n"
> +			IOW, each message is getting prepended with the
> +			facility and loglevel prefix.
> +
>  	consoleblank=	[KNL] The console blank (screen saver) timeout in
>  			seconds. A value of 0 disables the blank timer.
>                         Defaults to 0.
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 5d81206a572d..034f3ad85f5b 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -277,6 +277,13 @@ EXPORT_SYMBOL(console_set_on_cmdline);
>  /* Flag: console code may call schedule() */
>  static int console_may_schedule;
>  
> +enum con_msg_format {
> +	MSG_FORMAT_DEFAULT = 1,
> +	MSG_FORMAT_SYSLOG = 2,
> +};
> +
> +static int console_msg_format = MSG_FORMAT_DEFAULT;
> +
>  /*
>   * The printk log buffer consists of a chain of concatenated variable
>   * length records. Every record starts with a record header, containing
> @@ -1913,6 +1920,15 @@ static int __add_preferred_console(char *name, int idx, char *options,
>  	c->index = idx;
>  	return 0;
>  }
> +
> +static int __init console_msg_format_setup(char *str)
> +{
> +	if (!strncmp(str, "syslog", 6))
> +		console_msg_format = MSG_FORMAT_SYSLOG;
> +	return 1;
> +}
> +__setup("console_msg_format=", console_msg_format_setup);
> +
>  /*
>   * Set up a console.  Called via do_early_param() in init/main.c
>   * for each "console=" parameter in the boot command line.
> @@ -2215,7 +2231,10 @@ void console_unlock(void)
>  			goto skip;
>  		}
>  
> -		len += msg_print_text(msg, false, text + len, sizeof(text) - len);
> +		len += msg_print_text(msg,
> +				console_msg_format & MSG_FORMAT_SYSLOG,
> +				text + len,
> +				sizeof(text) - len);
>  		if (nr_ext_console_drivers) {
>  			ext_len = msg_print_ext_header(ext_text,
>  						sizeof(ext_text),

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: kernel CI: printk loglevels in kernel boot logs?
  2017-11-29  0:13             ` Kevin Hilman
@ 2017-11-29  7:25               ` Sergey Senozhatsky
  2017-11-30 17:45                 ` Kevin Hilman
  0 siblings, 1 reply; 20+ messages in thread
From: Sergey Senozhatsky @ 2017-11-29  7:25 UTC (permalink / raw)
  To: Kevin Hilman
  Cc: Sergey Senozhatsky, Fengguang Wu, Petr Mladek, Mark Brown,
	Greg Kroah-Hartman, LKML, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds

On (11/28/17 16:13), Kevin Hilman wrote:
[..]
> > okay... who's going to send the patch? kernelCI folks?
> >
> > I have some sort of a patch. added console_msg_format= with the only
> > available option so far - "syslog". may be people would want to have
> > boot time, etc. on their consoles. who knows. may be msg_format will
> > be extended someday in the future or may be not, and we can do
> > something like "console_format_syslog" (or any better name).
> >
> > Not-yet-signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> 
> For kernelCI (part of how this thread got started):
> 
> Acked-by: Kevin Hilman <khilman@baylibre.com>
> Reviewed-by: Kevin Hilman <khilman@baylibre.com>
> 
> I also tested it on an arm64 platform with the new command-line arg:
> 
> Tested-by: Kevin Hilman <khilman@baylibre.com>

thanks.

	-ss

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: kernel CI: printk loglevels in kernel boot logs?
  2017-11-29  7:25               ` Sergey Senozhatsky
@ 2017-11-30 17:45                 ` Kevin Hilman
  2017-12-01  1:25                   ` Sergey Senozhatsky
  0 siblings, 1 reply; 20+ messages in thread
From: Kevin Hilman @ 2017-11-30 17:45 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Fengguang Wu, Petr Mladek, Mark Brown, Greg Kroah-Hartman, LKML,
	Sergey Senozhatsky, Steven Rostedt, Linus Torvalds

Hi Sergey,

On Tue, Nov 28, 2017 at 11:25 PM, Sergey Senozhatsky
<sergey.senozhatsky.work@gmail.com> wrote:
> On (11/28/17 16:13), Kevin Hilman wrote:
> [..]
>> > okay... who's going to send the patch? kernelCI folks?
>> >
>> > I have some sort of a patch. added console_msg_format= with the only
>> > available option so far - "syslog". may be people would want to have
>> > boot time, etc. on their consoles. who knows. may be msg_format will
>> > be extended someday in the future or may be not, and we can do
>> > something like "console_format_syslog" (or any better name).
>> >
>> > Not-yet-signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
>>
>> For kernelCI (part of how this thread got started):
>>
>> Acked-by: Kevin Hilman <khilman@baylibre.com>
>> Reviewed-by: Kevin Hilman <khilman@baylibre.com>
>>
>> I also tested it on an arm64 platform with the new command-line arg:
>>
>> Tested-by: Kevin Hilman <khilman@baylibre.com>
>
> thanks.

So do you plan to submit an official signed-off version of this?
We're ready to use it ASAP in kernelCI.

Kevin

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: kernel CI: printk loglevels in kernel boot logs?
  2017-11-30 17:45                 ` Kevin Hilman
@ 2017-12-01  1:25                   ` Sergey Senozhatsky
  0 siblings, 0 replies; 20+ messages in thread
From: Sergey Senozhatsky @ 2017-12-01  1:25 UTC (permalink / raw)
  To: Kevin Hilman
  Cc: Sergey Senozhatsky, Fengguang Wu, Petr Mladek, Mark Brown,
	Greg Kroah-Hartman, LKML, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds

Hello,

On (11/30/17 09:45), Kevin Hilman wrote:
[..]
> >> For kernelCI (part of how this thread got started):
> >>
> >> Acked-by: Kevin Hilman <khilman@baylibre.com>
> >> Reviewed-by: Kevin Hilman <khilman@baylibre.com>
> >>
> >> I also tested it on an arm64 platform with the new command-line arg:
> >>
> >> Tested-by: Kevin Hilman <khilman@baylibre.com>
> >
> > thanks.
> 
> So do you plan to submit an official signed-off version of this?
> We're ready to use it ASAP in kernelCI.

yes, will do.

	-ss

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: kernel CI: printk loglevels in kernel boot logs?
  2017-11-22 14:10       ` Fengguang Wu
@ 2017-12-05 15:55         ` Petr Mladek
  2017-12-05 16:13           ` Sergey Senozhatsky
  2017-12-05 20:54           ` Steven Rostedt
  0 siblings, 2 replies; 20+ messages in thread
From: Petr Mladek @ 2017-12-05 15:55 UTC (permalink / raw)
  To: Fengguang Wu
  Cc: Kevin Hilman, Mark Brown, Greg Kroah-Hartman, LKML,
	Sergey Senozhatsky, Steven Rostedt, Linus Torvalds

On Wed 2017-11-22 22:10:58, Fengguang Wu wrote:
> On Wed, Nov 22, 2017 at 12:34:48PM +0100, Petr Mladek wrote:
> > On Wed 2017-11-22 11:27:02, Fengguang Wu wrote:
> > > [CC LKML for possible printk improvements]
> > > 
> > > On Wed, Nov 22, 2017 at 09:56:10AM +0800, Fengguang Wu wrote:
> > > > Hi Kevin,
> > > >
> > > > On Tue, Nov 21, 2017 at 12:27:48PM -0800, Kevin Hilman wrote:
> > > > > Hi Fenguang,
> > > > >
> > > > > In automated testing, for ease of parsing kernel boot logs (especially
> > > > > separating warnings and errors from debug, info etc.)
> > 
> > Would be enough to use console_loglevel to filter out the non-interesting
> > messages?
> 
> Nah we do prefer and actually require keeping the full dmesg. For one
> thing, it's good to have the context information. The 0day test farm
> will dump logs and debug info to the serial console. And we'll parse
> the normal dmesg lines to detect when the kernel boots up and
> shutdown (to split the continuous serial console log into dmesg files
> for each kernel boot), when the initramfs is loading (to judge whether
> the kernel oops is related to the initrd), when the testing begins,
> etc. For embedded systems people may even control machine boot and
> collect artifacts via the serial console.

OK, I understand why you would need this feature.

> > > > > Right now we can get this info from a "dmesg --raw" after bootup, but
> > > > > it would be really nice in certain automation frameworks to have a
> > > > > kernel command-line option to enable printing of loglevels in default
> > > > > boot log.
> > > >
> > > > Agreed.
> > > >
> > > > > This is espeically useful when ingesting kernel logs into advanced
> > > > > search/analytics frameworks (I'm playing with and ELK stack: Elastic
> > > > > Search, Logstash, Kibana).
> > 
> > This suggests that people are already filtering the kernel log using
> > external tools. Therefore there is a danger that the format might
> > have already became the kernel ABI and we could not change it easily.
> > 
> > A solution would be to printk the loglevel after the timestamp.
> > But this would be ugly because dmesg --raw prints it at the beginning.
> > 
> > To make it clear. I understand that "show_loglevel" command line argument
> > would be useful for you. But I am afraid that it is not worth changing
> > the format. There would need to be wide interest into the change.
> > Also there would need to be evidence that the existing solutions
> > (dmesg --raw, console_loglevel) are not enough in many real life
> > scenarios.
> 
> As Sergey said in the other thread, it should be safe because the
> syslog format will be explicitly enabled by those who want and ready
> to parse it.

I would have agreed with this argument few weeks ago but I do not get
it any longer. If there are tools depending on the message format
and people enable the extended format, the tools might stop working.

You might argue that these people are shooting into their legs.
But other people might argue that kernel changed ABI and broke
userspace tools. Note that these tools do not have any chance
to force kernel using the old format, they are consumers.

It might be especially annoying if these tools are part of some
complex infrastructure that is used to access the boot logs
and admins get blind. Then it might be hard for them to debug
the problem, especially if they got this feature as part of
a bigger system update.

Am I too paranoid? Steven? Linus?

> > > [...]
> > > 
> > > > As you may know I'm currently reporting kernel oops in mainline
> > > > kernel, hoping to clear noisy oops there -- they obviously hurt
> > > > catching and bisecting new oops.
> > > >
> > > > I have to say the warnings are much more noisy than kernel oops
> > > > in 2 ways:
> > > >
> > > > 1) It's common for a *normal* kernel boot to have a dozen of old
> > > > warnings.
> > 
> > I guess that you solve this by a "clever" diff between old and new
> > log.
> 
> That does help filtering out the majority old warnings. For example,
> if it's an inherent HW issue, it may remain there forever and never
> show up after diff.
> 
> However some noises will still go through and the number is large
> enough to make "tester side blacklist" a big maintenance headache.
> As you may see from the listing of warning messages waiting for bisect
> in 0day now.
> 
> > The extra info suggested below might help to filter out
> > some messages. But the problems will still be there.
> > I am not sure that the categories are worth it, see below.
> > 
> > 
> > > > 2) Many warnings do not necessarily mean something should or could be
> > > > fixed -- they may well be mentioning some HW problem, or an alert
> > > > message to the user.
> > > >
> > > > So there is a much bigger and messy problem than catching the warnings:
> > > > ways to effectively mark or filter real warnings that automated testing
> > > > should catch and report.
> > > >
> > > > For filtering, we currently accumulated the below blacklist:
> > > >
> > > > https://github.com/fengguang/lkp-tests/blob/master/etc/kmsg-blacklist
> > > >
> > > [...]
> > > 
> > > For the marking part, I wonder if there can be a clear rule that
> > > allows developers to distinguish 2 kind of information for users and
> > > testers:
> > > 
> > > - "bug" message: indicating a regression that should be reported and fixed
> > > 
> > > - "fact" message: well there's a problem, but we kernel developers
> > >  probably can do nothing about it. It's just to let the user know
> > >  about the fact. The fix might be something like replacing a broken
> > >  disk drive.
> > > 
> > > Those message types are orthogonal to severity of the problem (log
> > > levels), so the current log levels are actually not sufficient for
> > > distinguishing these kind of situations.
> > 
> > It is interesting idea but I am afraid that it would not work
> > in reality from several reasons:
> > 
> >  + It is yet another complexity. I am not sure that developers would
> >    want to think about it when creating new messages.
> 
> [moved ahead the below 2 points]
> >  + The proposal it rather strigthforward. But does all messages
> >    clearly fit into one of the two categories? What about
> >    messages that are not about problems? For example, most
> >    of pr_info(), pr_notice() messages?
> > 
> > 
> >  + Are the two categories worth the effort? Especially when the
> >    second one might still eventually be a kernel bug?
> > 
> >    If we add more categories, it might get too complicated.
> >    For example, like setting severities and priorities
> >    in the classic bugzilla.
> > 
> 
> Let me answer the 3 points together. Sorry if I was not clear -- I
> merely give the 2 categories as example. However the key point is not
> about creating categories. My proposal would be something like this:
> 
>        #define KERNEL_BUG "[kernel bug]"
> 
> Then use that tag wherever maintainers want to track regressions:
> 
>        pr_err(KERNEL_BUG "alg: Unexpected test result for %s: %d\n", name, err);

We already have something like this, see include/linux/printk.h

#define FW_BUG		"[Firmware Bug]: "
#define FW_WARN		"[Firmware Warn]: "
#define FW_INFO		"[Firmware Info]: "
#define HW_ERR		"[Hardware Error]: "

IMHO, the last one is a good example. It was added 7 years ago
by the commit c6de9f08912311ddc1 ("x86, mce: Add HW_ERR printk prefix
for hardware error logging" and it is still used only in two
mce-related files:

arch/x86/kernel/cpu/mcheck/mce.c
drivers/edac/mce_amd.c

I guess that this prefix would make sense in many more messages
but people do not use it. IMHO, they are either not aware of it
or they simply do not care.

IMHO, this is a good example that this approach does not work.

It seems that Andrew shared my skepticism when HW_ERR was introduced.
See the sentence "There have been various proposals to improve
the usefulness and to rationally categorise things in way which are
more useful to operators, but nothing seems to ever get over the
line." at
https://lkml.kernel.org/r/20100610164607.b2908334.akpm@linux-foundation.org

IMHO, we would need to come up with another approach that would
be easy to understand and could be "more" enforced.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: kernel CI: printk loglevels in kernel boot logs?
  2017-12-05 15:55         ` Petr Mladek
@ 2017-12-05 16:13           ` Sergey Senozhatsky
  2017-12-05 20:54           ` Steven Rostedt
  1 sibling, 0 replies; 20+ messages in thread
From: Sergey Senozhatsky @ 2017-12-05 16:13 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Fengguang Wu, Kevin Hilman, Mark Brown, Greg Kroah-Hartman, LKML,
	Sergey Senozhatsky, Steven Rostedt, Linus Torvalds

On (12/05/17 16:55), Petr Mladek wrote:
[..]
> 
> Am I too paranoid? Steven? Linus?

well, if there is a tool that believels that there will always be [ (part
of timestamp marker) as the first symbol then that tool has a problem. there
is no "guaranteed" format of console messages. one can print

	pr_err("message\n message\n message\n);

or one can compile the kernel with no CONFIG_PRINTK_TIME.

or one can disable printk_time via printk_time module param.

	-ss

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: kernel CI: printk loglevels in kernel boot logs?
  2017-12-05 15:55         ` Petr Mladek
  2017-12-05 16:13           ` Sergey Senozhatsky
@ 2017-12-05 20:54           ` Steven Rostedt
  2017-12-06 13:54             ` Petr Mladek
  1 sibling, 1 reply; 20+ messages in thread
From: Steven Rostedt @ 2017-12-05 20:54 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Fengguang Wu, Kevin Hilman, Mark Brown, Greg Kroah-Hartman, LKML,
	Sergey Senozhatsky, Linus Torvalds

On Tue, 5 Dec 2017 16:55:37 +0100
Petr Mladek <pmladek@suse.com> wrote:

> I would have agreed with this argument few weeks ago but I do not get
> it any longer. If there are tools depending on the message format
> and people enable the extended format, the tools might stop working.
> 
> You might argue that these people are shooting into their legs.
> But other people might argue that kernel changed ABI and broke
> userspace tools. Note that these tools do not have any chance
> to force kernel using the old format, they are consumers.
> 
> It might be especially annoying if these tools are part of some
> complex infrastructure that is used to access the boot logs
> and admins get blind. Then it might be hard for them to debug
> the problem, especially if they got this feature as part of
> a bigger system update.
> 
> Am I too paranoid? Steven? Linus?

Remember, an ABI is only broken if a tool breaks. If it proves to break
userspace, then it can be reverted. But if there is no tool that
depends on it, it's fine to change.

That said, I haven't had a chance to fully read this thread. Mostly
because of the Turkey holiday which put me behind in other areas, and
also I currently traveling, and will be taking PTO for the rest of the
week.

-- Steve

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: kernel CI: printk loglevels in kernel boot logs?
  2017-12-05 20:54           ` Steven Rostedt
@ 2017-12-06 13:54             ` Petr Mladek
  0 siblings, 0 replies; 20+ messages in thread
From: Petr Mladek @ 2017-12-06 13:54 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Fengguang Wu, Kevin Hilman, Mark Brown, Greg Kroah-Hartman, LKML,
	Sergey Senozhatsky, Linus Torvalds

On Tue 2017-12-05 15:54:09, Steven Rostedt wrote:
> On Tue, 5 Dec 2017 16:55:37 +0100
> Petr Mladek <pmladek@suse.com> wrote:
> 
> > I would have agreed with this argument few weeks ago but I do not get
> > it any longer. If there are tools depending on the message format
> > and people enable the extended format, the tools might stop working.
> > 
> > You might argue that these people are shooting into their legs.
> > But other people might argue that kernel changed ABI and broke
> > userspace tools. Note that these tools do not have any chance
> > to force kernel using the old format, they are consumers.
> > 
> > It might be especially annoying if these tools are part of some
> > complex infrastructure that is used to access the boot logs
> > and admins get blind. Then it might be hard for them to debug
> > the problem, especially if they got this feature as part of
> > a bigger system update.
> > 
> > Am I too paranoid? Steven? Linus?
> 
> Remember, an ABI is only broken if a tool breaks. If it proves to break
> userspace, then it can be reverted. But if there is no tool that
> depends on it, it's fine to change.
> 
> That said, I haven't had a chance to fully read this thread. Mostly
> because of the Turkey holiday which put me behind in other areas, and
> also I currently traveling, and will be taking PTO for the rest of the
> week.

OK, I feel relieved. I am not aware of any tool that depends on the
 format. I just could imagine that there might be some.

Otherwise, I agree that showing log levels on the console might be
useful. So I am going to review the RFC implementation, see
https://lkml.kernel.org/r/20171201104404.1885-1-sergey.senozhatsky@gmail.com

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2017-12-06 13:54 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CAOi56cVORdyjTXK4QGYRfyD1Q=QBgsU3B__gZT0xj6OBKaasLQ@mail.gmail.com>
     [not found] ` <20171122015610.x3kgzqgtwywlurmz@wfg-t540p.sh.intel.com>
2017-11-22  3:27   ` kernel CI: printk loglevels in kernel boot logs? Fengguang Wu
2017-11-22  5:26     ` Sergey Senozhatsky
2017-11-22 10:42       ` Mark Brown
2017-11-22 11:34     ` Petr Mladek
2017-11-22 12:38       ` Sergey Senozhatsky
2017-11-22 12:52         ` Fengguang Wu
2017-11-23  2:59           ` Sergey Senozhatsky
2017-11-23  3:14             ` Fengguang Wu
2017-11-23  4:31               ` Sergey Senozhatsky
2017-11-29  0:13             ` Kevin Hilman
2017-11-29  7:25               ` Sergey Senozhatsky
2017-11-30 17:45                 ` Kevin Hilman
2017-12-01  1:25                   ` Sergey Senozhatsky
2017-11-23 10:04           ` Petr Mladek
2017-11-22 20:22         ` Kevin Hilman
2017-11-22 14:10       ` Fengguang Wu
2017-12-05 15:55         ` Petr Mladek
2017-12-05 16:13           ` Sergey Senozhatsky
2017-12-05 20:54           ` Steven Rostedt
2017-12-06 13:54             ` Petr Mladek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).