* [PATCH 0/2][concept RFC] x86: BIOS-save kernel log to disk upon panic
@ 2011-01-25 13:47 Ahmed S. Darwish
From: Ahmed S. Darwish @ 2011-01-25 13:47 UTC
  To: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, X86-ML
  Cc: Tony Luck, Dave Jones, Andrew Morton, Randy Dunlap,
	Willy Tarreau, Willy Tarreau, Dirk Hohndel, Dirk.Hohndel, IDE-ML,
	LKML

Hi,

I've faced some very early panics in the latest kernel. Since the machine is a
run-of-the-mill x86 laptop, it lacks debugging aids such as serial ports or
network boot.

As a possible solution, the patches below prototype the idea of persistently
storing the kernel log ring buffer to a hard disk partition using the enhanced
BIOS INT 0x13 services.

The BIOS INT 0x13 functions used here are the same ones that contemporary
bootloaders use to load the Linux kernel. Since those services already worked
well enough to load the kernel into RAM, those parts of the BIOS should be
stable enough for this job too.

The basic idea is to switch from 64-bit long mode all the way down to 16-bit
real-mode. Once in real-mode, we reset the disk controller and write the log
buffer to disk using a user-supplied absolute disk block address (LBA).
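For illustration, the 16-byte Disk Address Packet that the Extended Write call
consumes can be modeled as a C struct. This is a userspace sketch, not part of
the patches: the field layout mirrors disk_address_packet in
saveoops-rmode.S, while the names edd_dap and edd_dap_fill() are hypothetical.
A buffer at linear address A below 1 MB is described as segment A >> 4 and
offset A & 0xf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * EDD Disk Address Packet for INT 0x13 AH=0x42/0x43, laid out like
 * disk_address_packet in saveoops-rmode.S (x86 is little-endian).
 */
struct edd_dap {
	uint8_t  packet_size;	/* 0x10: size of this packet in bytes */
	uint8_t  reserved0;	/* must be zero */
	uint8_t  sectors_cnt;	/* number of 512-byte blocks, 1..127 */
	uint8_t  reserved1;	/* must be zero */
	uint16_t buffer_offset;	/* real-mode offset of the buffer */
	uint16_t buffer_seg;	/* real-mode segment of the buffer */
	uint64_t lba;		/* absolute starting sector */
} __attribute__((packed));

_Static_assert(sizeof(struct edd_dap) == 16, "DAP must be 16 bytes");

/*
 * Describe 'bytes' bytes at linear address 'linear' (below 1 MB) to be
 * written at 'lba'. Returns 0 on success, -1 if the request is invalid.
 */
static int edd_dap_fill(struct edd_dap *dap, uint32_t linear,
			uint32_t bytes, uint64_t lba)
{
	uint32_t sectors = bytes / 512;

	assert(dap != NULL);
	if (linear >= 0x100000 || bytes % 512 || sectors == 0 || sectors > 127)
		return -1;

	dap->packet_size = 0x10;
	dap->reserved0 = 0;
	dap->sectors_cnt = (uint8_t)sectors;
	dap->reserved1 = 0;
	dap->buffer_seg = (uint16_t)(linear >> 4);	/* segment = A >> 4 */
	dap->buffer_offset = (uint16_t)(linear & 0xf);	/* offset = A & 0xf */
	dap->lba = lba;
	return 0;
}
```

With the 60 KB ring used by PATCH #2, the packet would carry 120 sectors, and
a 16-byte-aligned ring buffer yields offset 0, as the assembly assumes.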

Doing so, we can capture very early panics (along with earlier log messages)
reliably since the writing mechanism has minimal dependency on any Linux code.

Unfortunately, there are problems on some machines.

On my laptop, when calling the BIOS with the "Reset Disk Controllers" command,
or even when issuing a direct "Extended Write" without a controller reset, the
BIOS hangs for around __5 minutes__. Afterwards, it returns with a 'Timeout'
error code.
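The debug path of the switch code prints the raw %ah status via print_hex. A
small decoder for a handful of classic INT 0x13 status codes, sketched below,
makes such dumps easier to read. The function names are hypothetical and the
table is deliberately incomplete; 0x80 is the traditional "timeout (not
ready)" status, which would match the symptom described above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Human-readable text for a few classic INT 0x13 status codes (%ah). */
static const char *int13_status_str(uint8_t ah)
{
	switch (ah) {
	case 0x00: return "success";
	case 0x01: return "invalid function or parameter";
	case 0x04: return "sector not found / read error";
	case 0x05: return "reset failed";
	case 0x80: return "timeout (not ready)";
	case 0xAA: return "drive not ready";
	case 0xCC: return "write fault";
	default:   return "unknown status";
	}
}

/* Print a status the way a debug dump might report it. */
static void report_int13(const char *call, uint8_t ah)
{
	printf("%s: status 0x%02x (%s)\n", call, (unsigned)ah,
	       int13_status_str(ah));
}
```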

The main problem, it seems, is that the BIOS "Reset controller" command is not
enough to restore disk hardware to a state understandable by the BIOS code.

So:

 - Is it possible to re-initialize the disk hardware to its POST state (thus
   making the BIOS services work reliably) while keeping system RAM unmodified?
 - If not, can we do it manually by reprogramming the controllers?

The first patch (#1) implements the longMode -> realMode switch and invokes
the BIOS. The second reserves the needed low-memory areas for that code and
registers a panic logger using the kmsg_dump interface.

Both patches apply on top of '-next' and include XXX marks where further help
is appreciated. Please keep in mind that these patches, while tested, are only
meant to prototype the technical feasibility of the idea.

Diffstat:

 arch/x86/kernel/saveoops-rmode.S |  483 ++++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/saveoops.h  |   15 ++
 arch/x86/kernel/saveoops.c       |  219 +++++++++++++++++
 arch/x86/kernel/setup.c          |    9 +
 arch/x86/kernel/Makefile         |    3 +
 lib/Kconfig.debug                |   15 ++
 6 files changed, 744 insertions(+), 0 deletions(-)

Related work and discussions:

 - Tony Luck, persistent store: http://article.gmane.org/gmane.linux.kernel.cross-arch/8495
 - Dirk Hohndel, hpa, Japan Symposium, 2D barcode: http://video.linux.com/video/1661
 - akpm, Dave Jones, oops pauser: http://article.gmane.org/gmane.linux.kernel/369739
 - Willy Tarreau, Randy Dunlap, kmsgdump: http://www.xenotime.net/linux/kmsgdump/

Thanks,

--
Darwish
http://darwish.07.googlepages.com



* [PATCH -next 1/2][RFC] x86: Saveoops: Switch to real-mode and call BIOS
@ 2011-01-25 13:51 ` Ahmed S. Darwish
From: Ahmed S. Darwish @ 2011-01-25 13:51 UTC
  To: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, X86-ML
  Cc: Tony Luck, Dave Jones, Andrew Morton, Randy Dunlap,
	Willy Tarreau, Willy Tarreau, Dirk Hohndel, Dirk.Hohndel, IDE-ML,
	LKML


This code is called upon panic() to save the kernel log buffer.

First, switch from 64-bit long mode to 16-bit real mode. Afterwards, save the
log buffer to disk using the extended INT 0x13 BIOS services, at the absolute
LBA disk address supplied by the user.
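The ordering of that descent matters, because the CPU derives EFER.LMA from
EFER.LME, CR4.PAE and CR0.PG. The toy model below (not kernel code;
struct cpu_state, write_cr0() and lmode_to_rmode() are hypothetical names)
replays the steps of saveoops-rmode.S against those architectural bits.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural bit positions (Intel SDM / AMD APM). */
#define CR0_PE   (1u << 0)
#define CR0_PG   (1u << 31)
#define CR4_PAE  (1u << 5)
#define EFER_LME (1u << 8)
#define EFER_LMA (1u << 10)

/* Toy model of the control registers touched by the switch. */
struct cpu_state {
	uint32_t cr0, cr4, efer;
};

/* Long mode is active iff LME, PAE and PG are all set. */
static void update_lma(struct cpu_state *s)
{
	if ((s->efer & EFER_LME) && (s->cr4 & CR4_PAE) && (s->cr0 & CR0_PG))
		s->efer |= EFER_LMA;
	else
		s->efer &= ~EFER_LMA;
}

static void write_cr0(struct cpu_state *s, uint32_t v)
{
	s->cr0 = v;
	update_lma(s);
}

/* Replays the order of steps taken in saveoops-rmode.S. */
static void lmode_to_rmode(struct cpu_state *s)
{
	write_cr0(s, s->cr0 & ~CR0_PG);	/* leave long mode (from compat mode) */
	s->efer &= ~EFER_LME;		/* clear LME with a mask */
	write_cr0(s, s->cr0 | CR0_PG);	/* 32-bit PAE paging, identity mapped */
	write_cr0(s, s->cr0 & ~CR0_PG);	/* drop paging again */
	write_cr0(s, s->cr0 & ~CR0_PE);	/* 8086 real mode */
}
```

The model clears LME with a mask; a bit-complement instruction such as btcl
only clears it when the bit is known to be set beforehand.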

Since it disables paging, this code must run from a single identity-mapped
page.
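PATCH #2 meets this requirement by aligning the copied code to
roundup_pow_of_two(code_size). The standalone sketch below checks why that
works: a block no larger than PAGE_SIZE, aligned to the next power of two of
its size, never straddles a 4 KB page boundary. The argument only holds while
the switch code itself stays no larger than one page. The helper names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Smallest power of two >= n (n > 0), like the kernel's roundup_pow_of_two(). */
static uint32_t roundup_pow2(uint32_t n)
{
	uint32_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Nonzero if [start, start + size) spans more than one page; size > 0. */
static int crosses_page(uint32_t start, uint32_t size)
{
	return (start / PAGE_SIZE) != ((start + size - 1) / PAGE_SIZE);
}
```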

- How to initialize the disk hardware to its POST state (thus making the
  BIOS code work reliably) while keeping system RAM unmodified?

- Is it guaranteed that '0x80' will always be the boot disk drive number?
  If not, we need to be passed the boot drive number from the bootloader.

Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
---

 arch/x86/kernel/saveoops-rmode.S |  483 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 483 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/saveoops-rmode.S b/arch/x86/kernel/saveoops-rmode.S
new file mode 100644
index 0000000..6e07112
--- /dev/null
+++ b/arch/x86/kernel/saveoops-rmode.S
@@ -0,0 +1,483 @@
+/* PROTOTYPE - PROTOTYPE - PROTOTYPE - PROTOTYPE - PROTOTYPE - PROTOTYPE */
+
+/*
+ * Saveoops LongMode -> RealMode switch
+ *
+ * Don't come here with any unfinished business at hand, there's no return.
+ * After writing the log buffer to disk, we just halt.
+ */
+
+#include <linux/linkage.h>
+
+#include <asm/processor-flags.h>
+#include <asm/msr-index.h>
+#include <asm/pgtable_types.h>
+#include <asm/segment.h>
+#include <asm/saveoops.h>
+
+/*
+ * Notes:
+ * - Avoid using relocatable symbols: we run from a different place than
+ *   where we're originally linked to. Use absolute addresses
+ * - Run this from an identity page since we disable paging
+ * - Dynamic values are used for all x86 table bases to let this code run
+ *   from *any* memory region below 1-Mbyte
+ */
+	.code64
+ENTRY(saveoops_start)
+	/*
+	 * Switch to 32bit-compatibility mode using a L=0 code segment
+	 */
+
+	cli
+
+	/* Permanently store passed parameters */
+	movq	%rdi, %rbp
+	movl	%esi, (ringbuf_addr - saveoops_start)(%ebp)
+	movl	%edx, (rstack_base - saveoops_start)(%ebp)
+	movq	%rcx, (disk_sector - saveoops_start)(%ebp)
+	movl	%r8d, (ringbuf_len - saveoops_start)(%ebp)
+
+	/* Dynamically set the 32bit-compat. GDTR base */
+	leaq	(lmode32_gdt - saveoops_start)(%ebp), %rax
+	movq	%rax, (lmode32_gdt + 2 - saveoops_start)(%ebp)
+
+	/* Dynamically set the 32bit farpointer base */
+	leal	(compat32 - saveoops_start)(%ebp), %eax
+	movl	%eax, (lmode32_farpointer - saveoops_start)(%ebp)
+
+	lgdt	(lmode32_gdt - saveoops_start)(%ebp)
+	ljmpl	*(lmode32_farpointer - saveoops_start)(%ebp)	# addr32
+
+	.code32
+compat32:
+	/*
+	 * 32bit-compatibility Long Mode, using a L=0 %cs
+	 */
+
+	movw	$__KERNEL_DS, %ax
+	movw	%ax, %ds
+	movw	%ax, %es
+	movw	%ax, %ss
+
+	/* 'Deactivate' long mode: disable paging */
+	movl	%cr0, %eax
+	andl    $~X86_CR0_PG, %eax
+	movl    %eax, %cr0
+
+	/*
+	 * Prepare identity maps for the first 2Mbytes. PAE is already
+	 * enabled from the original pmode -> lmode transition.
+	 *
+	 * Reuse head.S page tables instead of creating new ones. Such
+	 * early tables are in fact already reused by the newer direct
+	 * mapping tables, but since paging is now disabled (and we're
+	 * not returning back), hopefully nothing will blow up.
+	 */
+
+	/*
+	 * Pick a table for the PAE Page Directory (PD)
+	 */
+
+	.equ	level2_pae_ident_pgt, (level2_ident_pgt - __START_KERNEL_map)
+	.equ	level2_entry_count, 512
+	.equ	level2_entry_len, 8
+
+	xorl	%eax, %eax
+	movl	$level2_pae_ident_pgt, %edi
+	movl    $((level2_entry_count * level2_entry_len) / 4), %ecx
+	rep	stosl
+
+	movl	$(0 + __PAGE_KERNEL_IDENT_LARGE_EXEC), level2_pae_ident_pgt
+
+	/*
+	 * Pick a table for the PAE Page Directory Pointer (PDP)
+	 */
+
+	.equ	level3_pae_ident_pgt, (level2_spare_pgt - __START_KERNEL_map)
+	.equ	level3_entry_count, 4
+	.equ	level3_entry_len, 8
+
+	xorl	%eax, %eax
+	movl	$level3_pae_ident_pgt, %edi
+	movl    $((level3_entry_count * level3_entry_len) / 4), %ecx
+	rep	stosl
+
+	movl	$(level2_pae_ident_pgt + _PAGE_PRESENT), level3_pae_ident_pgt
+
+	movl	$level3_pae_ident_pgt, %eax
+	movl    %eax, %cr3
+
+	/* 'Disable' long mode: clear the EFER.LME bit */
+	movl	$MSR_EFER, %ecx
+	rdmsr
+	btrl	$_EFER_LME, %eax
+	wrmsr
+
+	/* Finally, move to 32-bit pmode: re-enabling paging */
+	movl	%cr0, %eax
+	orl     $X86_CR0_PG, %eax
+	movl    %eax, %cr0
+	jmp	pmode32			# flush prefetch
+
+pmode32:
+	/*
+	 * 32-bit protected mode, using a 2MB identity page.
+	 */
+
+	/* Paging was only enabled for the lmode->pmode step */
+	movl	%cr0, %eax
+	andl    $~X86_CR0_PG, %eax
+	movl    %eax, %cr0		# paging no more
+
+	xorl	%eax, %eax
+	movl	%eax, %cr3		# flush the TLB
+
+	/* Dynamically set the GDTR base value */
+	leal	(pmode16_gdt - saveoops_start)(%ebp), %eax
+	movl	%eax, (pmode16_gdt + 2 - saveoops_start)(%ebp)	# base[00:31]
+
+	/* Dynamically set %cs and %ds bases */
+	leal	(pmode16 - saveoops_start)(%ebp), %eax
+	movw	%ax, (pmode16_cs + 2 - saveoops_start)(%ebp)	# base[00:15]
+	movw	%ax, (pmode16_ds + 2 - saveoops_start)(%ebp)	# base[00:15]
+	shrl	$16, %eax
+	movb	%al, (pmode16_cs + 4 - saveoops_start)(%ebp)	# base[16:23]
+	movb	%al, (pmode16_ds + 4 - saveoops_start)(%ebp)	# base[16:23]
+
+	/* Load the 16-bit code and data segments */
+	lgdt	(pmode16_gdt - saveoops_start)(%ebp)
+
+	/* Switch to 16-bit pmode: use the 16-bit %cs set up above */
+	ljmp	$0x08, $0x0
+
+	/*
+	 * - “Segment base addresses should be 16-byte aligned” --Intel
+	 * - We also use this as the rmode code base; the 16-byte align
+	 *   will make address calculations much easier.
+	 */
+	.align 16
+	.globl pmode16
+	.code16
+pmode16:
+	/*
+	 * We're now in the 16-bit protected mode. Since PE is still = 1,
+	 * we can change a segment cache by loading a GDT selector value.
+	 */
+
+	movw	$0x10, %ax
+	movw	%ax, %ds
+	movw	%ax, %es
+	movw	%ax, %fs
+	movw	%ax, %gs
+	movw	%ax, %ss
+
+	/*
+	 * NOTE! Due to the new %cs and %ds bases, dereference addresses
+	 * using the form ‘label - pmode16’ from now on.
+	 */
+
+	/* Dynamically build an rmode segment and offset */
+	leal	(pmode16 - saveoops_start)(%ebp), %eax		# absolute value
+	shrl	$4, %eax
+	movw	%ax, rmode_farpointer - pmode16 + 2		# 8086 %cs
+	movw	$(rmode - pmode16), rmode_farpointer - pmode16	# offset
+
+	/* Restore real-mode BIOS interrupt entries */
+	lidt   (rmode_idtr - pmode16)
+
+	/* Switch to canonical real-mode: clear PE */
+	movl	%cr0, %eax
+	andl	$~X86_CR0_PE, %eax
+	movl	%eax, %cr0
+
+	/* Flush prefetch; use the 8086 code segment */
+	ljmp	*(rmode_farpointer - pmode16)
+
+#ifdef	SAVEOOPS_DEBUG
+	/*
+	 * Valid for any real-mode context where a stack exists
+	 */
+#define __print(msg)		;\
+	pushfl			;\
+	pushal			;\
+	pushw	$(1f - pmode16) ;\
+	call	print_string	;\
+	.ascii	"Saveoops: "	;\
+	.ascii	msg		;\
+	.asciz	"      \n\r"	;\
+1:	popal			;\
+	popfl
+#else
+#define __print(msg)		;
+#endif
+
+	.align 16
+rmode:
+	/*
+	 * REAL Mode, at last!
+	 *
+	 * For further details on the BIOS interrupts used, check any
+	 * version of the “Enhanced Disk Drive Specification”.
+	 */
+
+	movw	%cs, %ax
+	movw	%ax, %ds
+	movw	%ax, %es
+	movw	%ax, %fs
+	movw	%ax, %gs
+
+	/* Setup passed stack area */
+	movl	(rstack_base - pmode16), %eax
+	shrl	$4, %eax			# 16byte-aligned
+	movw	%ax, %ss
+	movw	$RMODE_STACK_LEN, %sp
+
+	__print	("Entered real mode")
+
+	/*
+	 * XXXX: We always use the boot disk drive number '0x80'. Can
+	 * this map to a wrong device?
+	 *
+	 * NOTE! Do not trust the BIOS: assume it clobbered all the
+	 * registers (relevant and not) while servicing interrupts.
+	 */
+
+	/*
+	 * Check Extensions Present (0x41) - Does the BIOS provide
+	 * EDD int 0x13 extensions?
+	 *
+	 * input  %bx     - 0x55aa
+	 * input  %dl     - drive number
+	 * output success - carry = 0 && bx = 0xaa55 && cx bit0 = 1
+	 * output failure - carry = 1 || any false condition above
+	 */
+	movb	$0x41, %ah
+	movw	$0x55aa, %bx
+	movb	$0x80, %dl
+	xorw	%cx, %cx
+	pushw	%ds
+	int	$0x13
+	popw	%ds
+	__print	("Queried BIOS for EDD services")
+	jc	no_edd1
+	cmpw	$0xaa55, %bx
+	jne	no_edd2
+	shrw	$1, %cx
+	jnc	no_edd3
+
+	/* Store 16byte-aligned ring buffer address in disk packet */
+	movl	(ringbuf_addr - pmode16), %eax
+	shrl	$4, %eax
+	movw	%ax, (buffer_seg - pmode16)
+	xorw	%ax, %ax
+	movw	%ax, (buffer_offset - pmode16)
+
+	/* Store ringbuf number of 512-byte blocks in disk packet */
+	movl	(ringbuf_len - pmode16), %eax
+	movb	%al, (sectors_cnt - pmode16)
+
+	__print	("Prepared the Disk Address Packet")
+
+	/*
+	 * Reset Hard Disks (0x00)
+	 *
+	 * input  %dl	  - drive number
+	 * output success - carry = 0 && %ah (err code) = 0
+	 * output failure - carry = 1 || %ah = error code
+	 *
+	 * The kernel has just panicked and left the disk controller
+	 * in an unknown state. Reset controllers before write.
+	 */
+	xorw	%ax, %ax
+	movb	$0x80, %dl
+	pushw	%ds
+	int	$0x13
+	popw	%ds
+	__print	("Disk controller reset")
+	jc	init_err1
+	cmpb	$0x0, %ah
+	jne	init_err2
+
+	/*
+	 * Extended Write (0x43) - Transfer data from RAM to disk
+	 *
+	 * input  %al     - 0 (write with verify off)
+	 * input  %dl     - drive number
+	 * input  %ds:si  - pointer to the Disk Address Packet
+	 * output success - carry = 0 && %ah (err code) = 0
+	 * output failure - carry = 1 || %ah = error code
+	 */
+	movb	$0x43, %ah
+	xorb	%al, %al
+	movb	$0x80, %dl
+	movw	$(disk_address_packet - pmode16), %si
+	pushw	%ds
+	int	$0x13
+	popw	%ds
+	__print	("Extended write finished")
+	jc	write_err1
+	cmpb	$0x0, %ah
+	jne	write_err2
+	jmp	success
+
+init_err1:
+	__print ("INT 0x13/0x0 init error 1")
+	jmp	print_errcode
+init_err2:
+	__print ("INT 0x13/0x0 init error 2")
+	jmp	print_errcode
+write_err1:
+	__print	("INT 0x13/0x43 write error 1")
+	jmp	print_errcode
+write_err2:
+	__print	("INT 0x13/0x43 write error 2")
+	jmp	print_errcode
+no_edd1:
+	__print	("BIOS does not support EDD service (err=1)")
+	jmp	print_errcode
+no_edd2:
+	__print	("BIOS does not support EDD service (err=2)")
+	jmp	print_errcode
+no_edd3:
+	__print	("BIOS does not support EDD service (err=3)")
+	jmp	print_errcode
+success:
+	__print	("Success!!!")
+	jmp	print_errcode
+
+halt:	hlt
+	jmp	halt
+
+#ifdef	SAVEOOPS_DEBUG
+	/*
+	 * Print the NUL-terminated string pointed to by the top of the stack
+	 */
+	.type	print_string, @function
+print_string:
+	popw	%si
+1:	xorb	%bh, %bh
+	movb	$0x0e, %ah
+	lodsb
+	cmpb	$0, %al
+	je	2f
+	int	$0x10
+	jmp	1b
+2:	ret
+
+	/*
+	 * print %dx value in hexadecimal ascii
+	 */
+	.type	print_hex, @function
+print_hex:
+	xorb   %bh, %bh
+	movw   $4, %cx			# 2-bytes = 4 hex digits
+print_digit:
+	rolw   $4, %dx			# highest-order 4 bits in front
+	movw   $0x0e0f, %ax		# bios function 0x0e
+	andb   %dl, %al
+	cmpb   $0x0a, %al		# transform to ASCII
+	jl     digit
+	addb   $0x07, %al
+digit:
+	addb   $0x30, %al
+	int    $0x10
+	loop   print_digit
+	ret
+
+	/*
+	 * Print INT13 err code, number of sectors written
+	 */
+print_errcode:
+	movzbw	%ah, %dx		# zero %dh so only the error code prints
+	call	print_hex
+	movw	(sectors_cnt - pmode16), %dx
+	call	print_hex
+	jmp	halt
+#else
+print_errcode:
+	jmp	halt
+#endif
+
+
+/*
+ * Virtual data section; ‘(dyn.)’ = A dynamically-set value
+ */
+
+	.align 16
+lmode32_gdt:
+	.word	lmode32_gdt_end - lmode32_gdt - 1
+	.quad	0x0000000000000000	# base (dyn.)
+	.word	0, 0, 0			# padding
+lmode32_cs:
+	.word	0xffff			# limit
+	.word	0x0000			# base
+	.word	0x9a00			# P=1, C=0, type=0xA (r/x)
+	.word   0x00cf			# L=0 (compat.), D=1 (32-bit), G=1
+lmode32_ds:
+	.word	0xffff			# limit
+	.word	0x0000			# base
+	.word	0x9200			# P=1, type=0x2 (r/w)
+	.word	0x00cf			# G=1, D=1 (32-bit)
+lmode32_gdt_end:
+
+lmode32_farpointer:
+	.long	0x00000000		# offset (dyn.)
+	.word	lmode32_cs -lmode32_gdt # %cs selector
+
+	.align 16
+pmode16_gdt:
+	.word	pmode16_gdt_end - pmode16_gdt - 1
+	.long	0x00000000		# base (dyn.)
+	.word	0x0000			# padding
+pmode16_cs:
+	.word	0xffff			# limit
+	.word	0x0000			# base (dyn.)
+	.word	0x9a00			# P=1, DPL=00, type=0xA (execute/read)
+	.word	0x0000			# G=0 (byte), D=0 (16-bit)
+pmode16_ds:
+	.word	0xffff			# limit
+	.word	0x0000			# base (dyn.)
+	.word	0x9200			# P=1, DPL=00, type=0x2 (read/write)
+	.word	0x0000			# G=0 (byte), D=0 (16-bit)
+pmode16_gdt_end:
+
+rmode_farpointer:
+	.word	0x0000			# offset (dyn.)
+	.word	0x0000			# %cs (dyn.)
+
+rmode_idtr:
+	.equ	RIDT_BASE, 0x0		# PC architecture defined
+	.equ	RIDT_ENTRY_SIZE, 0x4	# 8086 defined
+	.equ	RIDT_ENTRIES, 0x100	# 8086, 286, 386+ defined
+	.word	RIDT_ENTRIES * RIDT_ENTRY_SIZE - 1
+	.long	RIDT_BASE
+
+	/* Values passed by long-mode C code */
+ringbuf_addr:
+	.long	0x00000000		# 16-byte aligned, < 1-MB (dyn.)
+ringbuf_len:
+	.long	0x00000000		# 512-byte aligned (dyn.)
+rstack_base:
+	.long	0x00000000		# 16-byte aligned, < 1-MB (dyn.)
+
+	.align 16
+disk_address_packet:			# for extended INT 0x13 services (dyn.)
+packet_size:
+	.byte	0x10			# in bytes
+reserved0:
+	.byte	0x00			# must be zero
+sectors_cnt:
+	.byte	0x00			# number of blocks to transfer [1 - 127]
+reserved1:
+	.byte	0x00			# must be zero
+buffer_offset:
+	.word	0x0000			# read/write buffer offset
+buffer_seg:
+	.word	0x0000			# read/write buffer segment
+disk_sector:
+	.quad	0x0000000000000000	# logical sector number (LBA)
+
+ENTRY(saveoops_end)
+
+/* PROTOTYPE - PROTOTYPE - PROTOTYPE - PROTOTYPE - PROTOTYPE - PROTOTYPE */



* [PATCH -next 2/2][RFC] x86: Saveoops: Reserve low memory and register code
@ 2011-01-25 13:53 ` Ahmed S. Darwish
From: Ahmed S. Darwish @ 2011-01-25 13:53 UTC
  To: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, X86-ML
  Cc: Tony Luck, Dave Jones, Andrew Morton, Randy Dunlap,
	Willy Tarreau, Willy Tarreau, Dirk Hohndel, Dirk.Hohndel,
	Simon Kagstrom, IDE-ML, LKML


Using the x86 memblock interface, reserve low-memory areas below 1 MB for
the Saveoops LongMode -> RealMode switch code, ring buffer, and stack. All
the low-memory areas are dynamically allocated and reserved, giving memblock
the flexibility to choose the best available areas.
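The constraints on each reservation (ending below 1 MB, and 16-byte alignment
so that segment = address >> 4 with offset 0) can be illustrated with a
simplified first-fit search over free ranges. This is a hypothetical sketch,
not memblock's actual algorithm or API; find_low(), struct region and
FIND_ERROR are illustrative names, and align is assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>

#define LOWMEM_LIMIT 0x100000ULL	/* 1 MB: real-mode addressable limit */
#define FIND_ERROR   (~0ULL)		/* hypothetical "not found" marker */

struct region {
	uint64_t base, size;		/* a free physical range */
};

/*
 * Simplified first-fit search for a 'size'-byte block, aligned to 'align'
 * (a power of two), that lies inside a free range and ends below 1 MB.
 */
static uint64_t find_low(const struct region *free, int n,
			 uint64_t size, uint64_t align)
{
	for (int i = 0; i < n; i++) {
		uint64_t start = (free[i].base + align - 1) & ~(align - 1);
		uint64_t end = start + size;

		if (end <= free[i].base + free[i].size && end <= LOWMEM_LIMIT)
			return start;
	}
	return FIND_ERROR;
}
```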

To trigger Saveoops on panic(), register it using the kmsg_dump hooks. That
interface is quite racy for our purposes, but it is used for now to quickly
prototype the code (see the XXX mark for details).

Once triggered, Saveoops identity-maps the first 2 MB (the switch code
disables paging), copies the log buffer to its reserved 8086-accessible
area, and jumps to the switch code (PATCH #1).
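The copy step keeps the newest part of the log when it does not all fit. The
userspace sketch below reproduces the min() arithmetic of saveoops_do_dump():
the tail of s2 (newer) is taken first, then as much of the tail of s1 as
still fits after the header. fill_ring() and min_sz() are hypothetical
names; the sketch zero-pads the microseconds with %06ld so that 5 s + 42 us
reads as 5.000042.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define HDR "*SAVEOOPS-WRITTEN KERNEL LOG*"

static size_t min_sz(size_t a, size_t b)
{
	return a < b ? a : b;
}

/*
 * Fill 'out' (ring_size bytes) with a header followed by the newest log
 * bytes. The log arrives as two chunks, s1 (older) then s2 (newer), as
 * kmsg_dump passes them. Returns the number of bytes used.
 */
static size_t fill_ring(char *out, size_t ring_size, long sec, long usec,
			const char *s1, size_t l1, const char *s2, size_t l2)
{
	size_t hdr, l1_cpy, l2_cpy;
	int n;

	memset(out, '\0', ring_size);
	n = snprintf(out, ring_size, "%s\n%ld.%06ld\n", HDR, sec, usec);
	if (n < 0 || (size_t)n >= ring_size)
		return 0;
	hdr = (size_t)n;

	/* Prefer the newest bytes: the tail of s2 first, then of s1. */
	l2_cpy = min_sz(l2, ring_size - hdr);
	l1_cpy = min_sz(l1, ring_size - hdr - l2_cpy);
	memcpy(out + hdr, s1 + (l1 - l1_cpy), l1_cpy);
	memcpy(out + hdr + l1_cpy, s2 + (l2 - l2_cpy), l2_cpy);
	return hdr + l1_cpy + l2_cpy;
}
```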

Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
---

 arch/x86/kernel/saveoops.c      |  219 +++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/setup.c         |    9 ++
 arch/x86/include/asm/saveoops.h |   15 +++
 arch/x86/kernel/Makefile        |    3 +
 lib/Kconfig.debug               |   15 +++
 5 files changed, 261 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/saveoops.c b/arch/x86/kernel/saveoops.c
new file mode 100644
index 0000000..f48fc0a
--- /dev/null
+++ b/arch/x86/kernel/saveoops.c
@@ -0,0 +1,219 @@
+/* PROTOTYPE - PROTOTYPE - PROTOTYPE - PROTOTYPE - PROTOTYPE - PROTOTYPE */
+
+/*
+ * SAVEOOPS -- Save kernel log buffer to disk upon panic()
+ *
+ * To safely access disk in situations like very early boot or where the
+ * disk access code itself is buggy, we use BIOS INT13h extended services.
+ * To access such services, switch to 8086 real-mode first.
+ */
+
+#include <linux/kernel.h>
+#include <linux/compiler.h>
+#include <linux/log2.h>
+#include <linux/time.h>
+#include <linux/kmsg_dump.h>
+#include <linux/memblock.h>
+#include <linux/sched.h>
+
+#include <asm/page.h>
+#include <asm/pgtable.h>
+#include <asm/tlbflush.h>
+#include <asm/saveoops.h>
+
+/*
+ * We can only access the first MByte in real mode, thus allocate
+ * low-memory areas for the ring buffer, and rmode code and stack.
+ */
+static phys_addr_t ring_buf;
+static phys_addr_t code_buf;
+static phys_addr_t rmode_stack;
+
+/*
+ * Below 1-Mbyte pointer to lmode->rmode switch code.
+ */
+static void (* __noreturn rmode_switch)(phys_addr_t code_buf,
+					phys_addr_t ring_buf,
+					phys_addr_t rmode_stack,
+					uint64_t disk_lba,
+					uint64_t ring_buf_len);
+
+/*
+ * Absolute LBA address where the log will be saved on disk.
+ */
+static uint64_t disk_lba = CONFIG_SAVEOOPS_DISK_LBA;
+
+/*
+ * Extended BIOS services write to disk in units of 512-byte sectors.
+ * Thus, always align the ring buffer size on a 512-byte boundary.
+ */
+#define RMODE_SEGMENT_LIMIT	0x10000UL
+#define RING_SIZE		(60UL * 1024)
+#define SAVEOOPS_HEADER		"*SAVEOOPS-WRITTEN KERNEL LOG*"
+
+/*
+ * Page tables to identity map the first 2 Mbytes.
+ */
+static __aligned(PAGE_SIZE) pud_t ident_level3[PTRS_PER_PUD];
+static __aligned(PAGE_SIZE) pmd_t ident_level2[PTRS_PER_PMD];
+
+/*
+ * The lmode->rmode switching code needs to run from an identity page
+ * since it disables paging.
+ */
+static void build_identity_mappings(void)
+{
+	pgd_t *pgde;
+	pud_t *pude;
+	pmd_t *pmde;
+
+	pmde = ident_level2;
+	set_pmd(pmde, __pmd(0 + __PAGE_KERNEL_IDENT_LARGE_EXEC));
+
+	pude = ident_level3;
+	set_pud(pude, __pud(__pa(ident_level2) + _KERNPG_TABLE));
+
+	pgde = init_level4_pgt;
+	set_pgd(pgde, __pgd(__pa(ident_level3) + _KERNPG_TABLE));
+
+	__flush_tlb_all();
+}
+
+/*
+ * XXX: Our use of kmsg_dump interface is invalid. We completely halt the
+ *	machine when getting called; this means:
+ *	- other registered loggers won't have a chance to read the ring
+ *	- other CPU cores might also be accessing the disk, racing with
+ *	  BIOS code that will do the same.
+ *
+ *	Such interface is now used to get things going. A new interface
+ *	satisfying our special requirements needs to be created. A
+ *	solution is to do an rmode->lmode switch after writing to disk.
+ */
+static void saveoops_do_dump(struct kmsg_dumper *dumper,
+			     enum kmsg_dump_reason reason,
+			     const char *s1, unsigned long l1,
+			     const char *s2, unsigned long l2)
+{
+	unsigned long l1_cpy, l2_cpy, s1_start, s2_start;
+	struct timeval timestamp;
+	char *buf, *buf_orig;
+	int hdr_size;
+
+	if (reason != KMSG_DUMP_PANIC)
+		return;
+
+	do_gettimeofday(&timestamp);
+
+	buf = __va(ring_buf);
+	buf_orig = buf;
+	memset(buf, '\0', RING_SIZE);
+	buf += sprintf(buf, "%s\n", SAVEOOPS_HEADER);
+	buf += sprintf(buf, "%ld.%06ld\n", timestamp.tv_sec, timestamp.tv_usec);
+
+	hdr_size = buf - buf_orig;
+	l2_cpy = min(l2, RING_SIZE - hdr_size);
+	l1_cpy = min(l1, RING_SIZE - hdr_size - l2_cpy);
+
+	s2_start = l2 - l2_cpy;
+	s1_start = l1 - l1_cpy;
+	memcpy(buf, s1 + s1_start, l1_cpy);
+	memcpy(buf + l1_cpy, s2 + s2_start, l2_cpy);
+
+	printk(KERN_EMERG "Saveoops: Saving kernel log to boot disk LBA "
+	       "address %llu\n", disk_lba);
+
+	local_irq_disable();
+	build_identity_mappings();
+	rmode_switch(code_buf, ring_buf, rmode_stack, disk_lba, RING_SIZE >> 9);
+}
+
+static struct kmsg_dumper saveoops_dumper = {
+	.dump = saveoops_do_dump,
+};
+
+/*
+ * Real-mode switch code start and end markers.
+ * @pmode16: 16-bit protected mode entry point; 8086-segments base.
+ */
+extern const char saveoops_start[];
+extern const char saveoops_end[];
+extern const char pmode16[];
+
+/*
+ * Simplify real mode segmented-addressing calculations
+ */
+#define RMODE_DATA_ALIGN	16
+
+void __init saveoops_init(void)
+{
+	unsigned int code_size, code_align;
+	int res;
+
+	if (disk_lba == -1) {
+		printk(KERN_INFO "Saveoops: No disk LBA given; will not save "
+		       "kernel log to disk upon panic.\n");
+		return;
+	}
+
+	BUILD_BUG_ON(!IS_ALIGNED(RING_SIZE, 512));
+	BUILD_BUG_ON(RING_SIZE > RMODE_SEGMENT_LIMIT);
+	BUILD_BUG_ON(RMODE_STACK_LEN > RMODE_SEGMENT_LIMIT);
+	BUG_ON((saveoops_end - pmode16) > RMODE_SEGMENT_LIMIT);
+
+	ring_buf = memblock_find_in_range(0, 1<<20, RING_SIZE, RMODE_DATA_ALIGN);
+	if (ring_buf == MEMBLOCK_ERROR) {
+		printk(KERN_ERR "Saveoops: requesting a low-memory region "
+		       "for ring buffer failed\n");
+		return;
+	}
+	memblock_x86_reserve_range(ring_buf, ring_buf + RING_SIZE,
+				   "SAVEOOPS ringbuf");
+	printk(KERN_INFO "Saveoops: Acquired [0x%llx-0x%llx] for the ring "
+	       "buffer\n", ring_buf, ring_buf + RING_SIZE);
+
+	/* The lmode->rmode switch code “MUST” be in a single page */
+	code_size = saveoops_end - saveoops_start;
+	code_align = roundup_pow_of_two(code_size);
+	code_buf = memblock_find_in_range(0, 1<<20, code_size, code_align);
+	if (code_buf == MEMBLOCK_ERROR) {
+		printk(KERN_ERR "Saveoops: requesting a low-memory region "
+		       "for mode-switching code failed\n");
+		goto fail3;
+	}
+	memblock_x86_reserve_range(code_buf, code_buf + code_size,
+				   "SAVEOOPS codebuf");
+	printk(KERN_INFO "Saveoops: Acquired [0x%llx-0x%llx] for rmode-switch "
+	       "code\n", code_buf, code_buf + code_size);
+
+	rmode_stack = memblock_find_in_range(0, 1<<20, RMODE_STACK_LEN,
+					     RMODE_DATA_ALIGN);
+	if (rmode_stack == MEMBLOCK_ERROR) {
+		printk(KERN_ERR "Saveoops: requesting a low-memory region "
+		       "for real-mode stack failed\n");
+		goto fail2;
+	}
+	memblock_x86_reserve_range(rmode_stack, rmode_stack + RMODE_STACK_LEN,
+				   "SAVEOOPS r-stack");
+	printk(KERN_INFO "Saveoops: Acquired [0x%llx-0x%llx] for rmode stack\n",
+	       rmode_stack, rmode_stack + RMODE_STACK_LEN);
+
+	res = kmsg_dump_register(&saveoops_dumper);
+	if (res) {
+		printk(KERN_ERR "Saveoops: registering kmsg dumper failed\n");
+		goto fail1;
+	}
+
+	memcpy(__va(code_buf), saveoops_start, code_size);
+	rmode_switch = (void *)code_buf;
+	return;
+
+fail1:
+	memblock_x86_free_range(rmode_stack, rmode_stack + RMODE_STACK_LEN);
+fail2:
+	memblock_x86_free_range(code_buf, code_buf + code_size);
+fail3:
+	memblock_x86_free_range(ring_buf, ring_buf + RING_SIZE);
+}
+
+/* PROTOTYPE - PROTOTYPE - PROTOTYPE - PROTOTYPE - PROTOTYPE - PROTOTYPE */
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index d3cfe26..3686df8 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -50,6 +50,9 @@
 #include <asm/pci-direct.h>
 #include <linux/init_ohci1394_dma.h>
 #include <linux/kvm_para.h>
+#ifdef CONFIG_SAVEOOPS
+#include <asm/saveoops.h>
+#endif
 
 #include <linux/errno.h>
 #include <linux/kernel.h>
@@ -925,6 +928,12 @@ void __init setup_arch(char **cmdline_p)
 	memblock.current_limit = get_max_mapped();
 	memblock_x86_fill();
 
+#ifdef CONFIG_SAVEOOPS
+	/* Initialize Saveoops at the earliest point possible: memblock
+	 * find_in_range is used here to reserve low-memory areas */
+	saveoops_init();
+#endif
+
 	/* preallocate 4k for mptable mpc */
 	early_reserve_e820_mpc_new();

diff --git a/arch/x86/include/asm/saveoops.h b/arch/x86/include/asm/saveoops.h
new file mode 100644
index 0000000..d81e840
--- /dev/null
+++ b/arch/x86/include/asm/saveoops.h
@@ -0,0 +1,15 @@
+#ifndef _SAVEOOPS_H
+#define _SAVEOOPS_H
+
+/*
+ * Definitions shared between Saveoops C and assembly code.
+ */
+
+#define RMODE_STACK_LEN		0x1000	/* Arbitrary */
+
+#ifndef __ASSEMBLY__
+
+void __init saveoops_init(void);
+
+#endif /* !__ASSEMBLY__ */
+#endif /* _SAVEOOPS_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 34244b2..9a097f2 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -121,4 +121,7 @@ ifeq ($(CONFIG_X86_64),y)
 
 	obj-$(CONFIG_PCI_MMCONFIG)	+= mmconf-fam10h_64.o
 	obj-y				+= vsmp_64.o
+
+	obj-$(CONFIG_SAVEOOPS)		+= saveoops.o
+	obj-$(CONFIG_SAVEOOPS)		+= saveoops-rmode.o
 endif
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 4a78f8c..b994791 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -231,6 +231,21 @@ config BOOTPARAM_HUNG_TASK_PANIC
 
 	  Say N if unsure.
 
+config SAVEOOPS
+	bool "Save kernel panics to disk using BIOS"
+	depends on X86_64
+	---help---
+	  <TO-BE-ADDED>
+
+config SAVEOOPS_DISK_LBA
+	int "Boot disk LBA offset to save panic to"
+	default -1
+	depends on SAVEOOPS
+	---help---
+	  Use this boot disk LBA address to save the kernel log.
+	  To find a partition's starting LBA address, use: fdisk -ul
+	  [VERY DANGEROUS] <FURTHER-INFO-TO-BE-ADDED>
+
 config BOOTPARAM_HUNG_TASK_PANIC_VALUE
 	int
 	depends on DETECT_HUNG_TASK



* Re: [PATCH 0/2][concept RFC] x86: BIOS-save kernel log to disk upon panic
@ 2011-01-25 14:09 ` Ingo Molnar
From: Ingo Molnar @ 2011-01-25 14:09 UTC
  To: Ahmed S. Darwish
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, X86-ML, Tony Luck,
	Dave Jones, Andrew Morton, Randy Dunlap, Willy Tarreau,
	Willy Tarreau, Dirk Hohndel, Dirk.Hohndel, IDE-ML, LKML,
	Linus Torvalds, Peter Zijlstra, Frédéric Weisbecker,
	Borislav Petkov, Arjan van de Ven


* Ahmed S. Darwish <darwish.07@gmail.com> wrote:

> Hi,
> 
> I've faced some very early panics in latest kernel. Being a run of the mill
> x86 laptop, the machine is void of debugging aids like serial ports or
> network boot.
> 
> [...]

Ok, i have to admit that while i'm a rabid BIOS-hater i find this debug feature very 
very interesting, for the plain reason that if it's implemented in a robust and 
clever way then this has a chance to improve debuggability of pretty much any Linux 
laptop quite enormously!

While we generally thoroughly hate BIOSes from beginning to end, one thing can be 
said, a BIOS bootstraps very early during bootup, and it's relatively simple to 
trigger as well.

Also, since latest kernels do not stomp on BIOS data structures anymore (low RAM), 
there's some good chance it's still functional at the point of crash - be that an 
early crash or a later crash.

I think the biggest areas of practical concern would be:

 - Can this mechanism ever, under any circumstance corrupt any real data, destroy 
   the MBR or do other nasties. Can you think of any additional fail-safe measures 
   where you could _further robustify the BIOS calls_ to make sure it can never go 
   to the wrong sector(s)? I really do not want to think of trusting a BIOS to 
   _write to my disk_.

 - Is there some hidden disk area somewhere on PCs, or somewhere on a real partition
   on typical Linux distributions, which we could use without having to reinstall
   the box? This would increase utility and availability enormously. I'm thinking of 
   partition _ends_ which sometimes get rounded in an awkward way and which are 
   potentially skipped by most Linux filesystems. Even a very small, 512 bytes of 
   area would be extremely useful for debugging weird suspend hangs ...

 - Could we automate the recovery of the dump, and just put it into the regular 
   kernel log on the next (successful) bootup (on a feature-enabled kernel)? That 
   would make the log of the 'previous crash' very conveniently available in dmesg 
   and the syslog. Tools like kerneloops could make use of it immediately.
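To make the third point concrete, here is a minimal C sketch of how a dump could be made self-describing, so the next boot can decide whether there is anything to import. The `saved_log_hdr` layout, the `"OOPSLOG1"` magic and the use of FNV-1a as the checksum are all hypothetical illustrations, not part of the posted patches. An in-kernel version could just as well use an existing checksum such as crc32.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header written in front of the saved log text. */
struct saved_log_hdr {
	char     magic[8]; /* "OOPSLOG1" when a dump is present (hypothetical) */
	uint32_t len;      /* number of log bytes that follow */
	uint32_t sum;      /* FNV-1a hash of those bytes */
};

/* 32-bit FNV-1a over n bytes. */
static uint32_t fnv1a(const unsigned char *p, uint32_t n)
{
	uint32_t h = 2166136261u;

	while (n--) {
		h ^= *p++;
		h *= 16777619u;
	}
	return h;
}

/*
 * Run on the next boot: return the log length if 'text' holds a complete,
 * intact dump of at most 'avail' bytes, or 0 if there is nothing to import.
 */
static uint32_t saved_log_valid(const struct saved_log_hdr *h,
				const unsigned char *text, uint32_t avail)
{
	assert(h != NULL && text != NULL);
	if (memcmp(h->magic, "OOPSLOG1", sizeof(h->magic)) != 0)
		return 0;
	if (h->len == 0 || h->len > avail)
		return 0;
	if (fnv1a(text, h->len) != h->sum)
		return 0;
	return h->len;
}
```

Once a dump has been imported into the kernel log, the header would also need to be invalidated. Otherwise the same crash gets reported on every boot.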

All in one, a very intriguing idea IMO, and the hardest bits (lowlevel x86 
transition) is all implemented already.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 0/2][concept RFC] x86: BIOS-save kernel log to disk upon panic
  2011-01-25 13:47 [PATCH 0/2][concept RFC] x86: BIOS-save kernel log to disk upon panic Ahmed S. Darwish
                   ` (2 preceding siblings ...)
  2011-01-25 14:09 ` [PATCH 0/2][concept RFC] x86: BIOS-save kernel log to disk upon panic Ingo Molnar
@ 2011-01-25 14:49 ` Tejun Heo
  2011-01-28  7:59   ` Jan Ceuleers
  2011-01-25 20:25 ` Linus Torvalds
  4 siblings, 1 reply; 35+ messages in thread
From: Tejun Heo @ 2011-01-25 14:49 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, X86-ML, Tony Luck,
	Dave Jones, Andrew Morton, Randy Dunlap, Willy Tarreau,
	Willy Tarreau, Dirk Hohndel, Dirk.Hohndel, IDE-ML, LKML

Hello,

On Tue, Jan 25, 2011 at 03:47:48PM +0200, Ahmed S. Darwish wrote:
> The main problem, it seems, is that the BIOS "Reset controller" command is not
> enough to restore disk hardware to a state understandable by the BIOS code.

I doubt many BIOSen would implement this properly.  It's something no
one ever uses and modern controllers have a lot more states and are
more complex to reset.

>  - Is it possible to re-initialize the disk hardware to its POST
>    state (thus make the BIOS services work reliably) while keeping
>    system RAM unmodified?

I'm afraid this can't be made reliable.  Nobody uses it and the stuff
we do during PCI initialization is enough to leave some BIOSen
clueless.

>  - If not, can we do it manually by reprogramming the controllers?

It would only be theoretically possible.  We'd basically have to
write deinitialization routines for different controllers, which of
course would be a super-cold path that not many people would test.

I'm afraid this is gonna be something which works sometimes (or even
more often than not) but can't ever be made reliable.  I think it
would be better to head toward USB or other kinds of early console.

Thanks.

-- 
tejun

* Re: [PATCH 0/2][concept RFC] x86: BIOS-save kernel log to disk upon panic
  2011-01-25 14:09 ` [PATCH 0/2][concept RFC] x86: BIOS-save kernel log to disk upon panic Ingo Molnar
@ 2011-01-25 15:08   ` Tejun Heo
  2011-01-25 17:33     ` H. Peter Anvin
  2011-02-03 14:36     ` Pavel Machek
  2011-01-25 15:36   ` Ahmed S. Darwish
  1 sibling, 2 replies; 35+ messages in thread
From: Tejun Heo @ 2011-01-25 15:08 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ahmed S. Darwish, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	X86-ML, Tony Luck, Dave Jones, Andrew Morton, Randy Dunlap,
	Willy Tarreau, Willy Tarreau, Dirk Hohndel, Dirk.Hohndel, IDE-ML,
	LKML, Linus Torvalds, Peter Zijlstra,
	Frédéric Weisbecker, Borislav Petkov, Arjan van de Ven

Hello, Ingo.

On Tue, Jan 25, 2011 at 03:09:48PM +0100, Ingo Molnar wrote:
> I think the biggest areas of practical concern would be:
> 
>  - Can this mechanism ever, under any circumstance corrupt any real
>    data, destroy the MBR or do other nasties. Can you think of any
>    additional fail-safe measures where you could _further robustify
>    the BIOS calls_ to make sure it can never go to the wrong
>    sector(s)? I really do not want to think of trusting a BIOS to
>    _write to my disk_.

It's quite unlikely but I wouldn't say it's completely impossible.
It's common for ATA controllers to have dual modes of operation - the
old IDE-compatible interface by emulation, which is used by the BIOS and
older operating systems, and a newer interface (AHCI) to be used by
modern OSes.  Some need to be explicitly switched and some just need to
be accessed carefully.  If the controller is accessed by the BIOS after
being switched to AHCI, or while commands are in progress via AHCI,
anything can happen.

There's also the HPA (Host Protected Area), which changes the size of
the device and which we often unlock.  As it's always at the end, it
usually shouldn't cause problems, but there are BIOSen which expect
certain things near the end of the device (BIOS RAID ones).  I'm fairly
sure nobody is testing BIOSen for cases where the size of the underlying
device changes without going through POST.

To summarize, it's unlikely, but at the same time there are some
_truly_ scary beasts out in the wild.  There's a certain level of
danger.
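As one partial fail-safe for the "never go to the wrong sector(s)" concern, the dumper could refuse any write that falls outside an explicitly reserved window, or past the sector count the BIOS itself reports (which, given the HPA remark above, may differ from what the kernel sees). This is a minimal sketch. `struct dump_window` and `write_in_bounds` are hypothetical names. It only catches bad arithmetic on our own side and cannot protect against a BIOS or controller that misbehaves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of the only LBAs the dumper may ever touch. */
struct dump_window {
	uint64_t first_lba;  /* first sector of the reserved area */
	uint64_t nr_sectors; /* size of the reserved area, in sectors */
};

/*
 * Return 1 if [lba, lba + count) lies entirely inside the window and below
 * the sector count the BIOS reports for the drive, 0 otherwise.  The checks
 * are written as subtractions so that no addition can overflow.
 */
static int write_in_bounds(const struct dump_window *w, uint64_t bios_sectors,
			   uint64_t lba, uint64_t count)
{
	assert(w != NULL);
	if (count == 0 || lba < w->first_lba)
		return 0;
	if (lba - w->first_lba > w->nr_sectors ||
	    count > w->nr_sectors - (lba - w->first_lba))
		return 0;
	if (lba >= bios_sectors || count > bios_sectors - lba)
		return 0;
	return 1;
}
```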

>  - Is there some hidden disk area somewhere on PCs, or somewhere on
>    a real partition on typical Linux distributions, which we could
>    use without having to reinstall the box? This would increase
>    utility and availability enormously. I'm thinking of partition
>    _ends_ which sometimes get rounded in an awkward way and which
>    are potentially skipped by most Linux filesystems. Even a very
>    small, 512 bytes of area would be extremely useful for debugging
>    weird suspend hangs ...

There are holes but writing to them without full knowledge of the
configuration can be quite dangerous.  I don't think it would be
possible to mass deploy it without manual configuration unless we
specifically reserve (and maybe mark) it in the filesystem.

>  - Could we automate the recovery of the dump, and just put it into
>    the regular kernel log on the next (successful) bootup (on a
>    feature-enabled kernel)? That would make the log of the 'previous
>    crash' very conveniently available in dmesg and the syslog. Tools
>    like kerneloops could make use of it immediately.

Yeah, that would be actually quite nice.

> All in one, a very intriguing idea IMO, and the hardest bits
> (lowlevel x86 transition) is all implemented already.

I like the prospect too but am a bit skeptical whether this can be
made reliable enough to be a convenient tool.

Thanks.

-- 
tejun

* Re: [PATCH 0/2][concept RFC] x86: BIOS-save kernel log to disk upon panic
  2011-01-25 14:09 ` [PATCH 0/2][concept RFC] x86: BIOS-save kernel log to disk upon panic Ingo Molnar
  2011-01-25 15:08   ` Tejun Heo
@ 2011-01-25 15:36   ` Ahmed S. Darwish
  2011-01-25 16:02     ` James Bottomley
  2011-01-25 17:32     ` Tony Luck
  1 sibling, 2 replies; 35+ messages in thread
From: Ahmed S. Darwish @ 2011-01-25 15:36 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, X86-ML, Tony Luck,
	Dave Jones, Andrew Morton, Randy Dunlap, Willy Tarreau,
	Willy Tarreau, Dirk Hohndel, Dirk.Hohndel, IDE-ML, LKML,
	Linus Torvalds, Peter Zijlstra, Frédéric Weisbecker,
	Borislav Petkov, Arjan van de Ven

On Tue, Jan 25, 2011 at 03:09:48PM +0100, Ingo Molnar wrote:
> 
> * Ahmed S. Darwish <darwish.07@gmail.com> wrote:
> 
> > Hi,
> > 
> > I've faced some very early panics in latest kernel. Being a run of the mill
> > x86 laptop, the machine is void of debugging aids like serial ports or
> > network boot.
> > 
> > As a possible solution, below patches prototypes the idea of persistently
> > storing the kernel log ring to a hard disk partition using the enhanced BIOS
> > 0x13 services.

[...]

> > 
> > Diffstat:
> > 
> >  arch/x86/kernel/saveoops-rmode.S |  483 ++++++++++++++++++++++++++++++++++++++
> >  arch/x86/include/asm/saveoops.h  |   15 ++
> >  arch/x86/kernel/saveoops.c       |  219 +++++++++++++++++
> >  arch/x86/kernel/setup.c          |    9 +
> >  arch/x86/kernel/Makefile         |    3 +
> >  lib/Kconfig.debug                |   15 ++
> >  6 files changed, 744 insertions(+), 0 deletions(-)
> 
> Ok, i have to admit that while i'm a rabid BIOS-hater i find this debug feature very 
> very interesting, for the plain reason that if it's implemented in a robust and 
> clever way then this has a chance to improve debuggability of pretty much any Linux 
> laptop quite enormously!
> 
> While we generally thoroughly hate BIOSes from beginning to end, one thing can be 
> said, a BIOS bootstraps very early during bootup, and it's relatively simple to 
> trigger as well.
> 

Yes, the BIOS subset used is the same one used by SYSLINUX, grub, etc to load
the kernel. If the kernel is on, these functions are hopefully quite stable.
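For reference, the extended services in question (INT 0x13, AH=0x42 for read and AH=0x43 for write, from the Phoenix EDD specification) take their arguments through a 16-byte Disk Address Packet pointed to by DS:SI, with the drive number in DL. Below is a minimal sketch of filling one from C. `dap_fill` is a hypothetical helper, not code from the patch. The 127-sector cap is a conservative assumption, since some BIOSes are reported to reject larger single transfers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * EDD Disk Address Packet, passed in DS:SI to INT 0x13 AH=0x42 (extended
 * read) and AH=0x43 (extended write); the drive number goes in DL.
 */
struct disk_address_packet {
	uint8_t  size;        /* size of this packet: 0x10 */
	uint8_t  reserved;    /* must be zero */
	uint16_t sectors;     /* number of blocks to transfer */
	uint16_t buf_offset;  /* transfer buffer, real-mode offset ... */
	uint16_t buf_segment; /* ... and segment */
	uint64_t lba;         /* starting absolute block number */
} __attribute__((packed));

_Static_assert(sizeof(struct disk_address_packet) == 16,
	       "the basic DAP is 16 bytes");

/* Assumption: conservative cap, as some BIOSes reject larger transfers. */
#define DAP_MAX_SECTORS 127

/* Fill 'dap' for a buffer at physical address 'phys'. The buffer must be
 * addressable from real mode, i.e. below 1 MiB. Returns 0 on success. */
static int dap_fill(struct disk_address_packet *dap, uint32_t phys,
		    uint16_t sectors, uint64_t lba)
{
	assert(dap != NULL);
	if (phys >= 0x100000u || sectors == 0 || sectors > DAP_MAX_SECTORS)
		return -1;
	memset(dap, 0, sizeof(*dap));
	dap->size = sizeof(*dap);
	dap->sectors = sectors;
	dap->buf_segment = (uint16_t)(phys >> 4);  /* seg * 16 + off == phys */
	dap->buf_offset = (uint16_t)(phys & 0xf);
	dap->lba = lba;
	return 0;
}
```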

The complete __roadblock__ I'm currently facing though is restoring the disk
controllers to the state originally setup by the BIOS Power-on self-test (POST).
I hope such re-initialization is even technically feasible.

Without such re-initialization, we'll just be risking the BIOS code exploding.
That was the case in the 5-minute hang described in the cover sheet (PATCH #0).

> Also, since latest kernels do not stomp on BIOS data structures anymore (low RAM), 
> there's some good chance it's still functional at the point of crash - be that an 
> early crash or a later crash.
> 

Yes, I've noticed that thankfully the kernel reserves the BIOS EBDA area. I'm
not sure though if it reserves the real-mode vector table area 0x0 -> 0x400?

> I think the biggest areas of practical concern would be:
> 
>  - Can this mechanism ever, under any circumstance corrupt any real data, destroy 
>    the MBR or do other nasties. Can you think of any additional fail-safe measures 
>    where you could _further robustify the BIOS calls_ to make sure it can never go 
>    to the wrong sector(s)? I really do not want to think of trusting a BIOS to 
>    _write to my disk_.
> 

Admittedly, I was quite worried while testing this on disk. As a possible
safety trigger, I'm thinking of coding a small script under "tools/" that will
write a cookie to the desired area. If the real-mode code does not find that
cookie in the expected place, it abandons the write operation.
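The cookie check described above could look roughly like this. It assumes the real-mode code first reads back the target sector with an extended read, and only proceeds with the write if the check passes. The cookie string and `area_is_ours` are hypothetical. In practice the cookie sector itself would be left untouched and the log written after it, so the check keeps working for the next crash.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical cookie stamped by a tools/ script at the reserved area. */
static const char saveoops_cookie[] = "SAVEOOPS-AREA-v1";

/*
 * 'sector' holds the first sector read back from the target LBA just
 * before writing.  Return 1 only if the cookie is present; the caller
 * abandons the write otherwise.
 */
static int area_is_ours(const unsigned char *sector, size_t sector_size)
{
	size_t len = sizeof(saveoops_cookie) - 1; /* compare without the NUL */

	assert(sector != NULL);
	if (sector_size < len)
		return 0;
	return memcmp(sector, saveoops_cookie, len) == 0;
}
```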

>  - Is there some hidden disk area somewhere on PCs, or somewhere on a real partition
>    on typical Linux distributions, which we could use without having to reinstall
>    the box? This would increase utility and availability enormously. I'm thinking of 
>    partition _ends_ which sometimes get rounded in an awkward way and which are 
>    potentially skipped by most Linux filesystems. Even a very small, 512 bytes of 
>    area would be extremely useful for debugging weird suspend hangs ...
> 

I did not have to re-partition the box here. A kindof a hacky solution was
disabling the swap partition and using it for storing the log. That would make
the feature available without re-installing the box, at the cost of temporarily
disabling swap.

Another testing method was booting the kernel from a USB stick and writing the
log there. If there are other free areas on the boot disk, they can be used, but
I'm afraid that would make saveoops much more dangerous than it already is
(by virtue of being much closer to file-system structures and such).

>  - Could we automate the recovery of the dump, and just put it into the regular 
>    kernel log on the next (successful) bootup (on a feature-enabled kernel)? That 
>    would make the log of the 'previous crash' very conveniently available in dmesg 
>    and the syslog. Tools like kerneloops could make use of it immediately.
> 

If I can make the code reliably work, it can be very very useful in a huge
number of situations such as these and more. The catch is the "If" part :)

> All in one, a very intriguing idea IMO, and the hardest bits (lowlevel x86 
> transition) is all implemented already.
> 

Is there a place under arch/x86 that already does the lmode->rmode transition?

That was (so far) the hardest part; I searched the tree quite thoroughly but did
not find any, so I rolled my own lmode->rmode switch code in the file
"saveoops-rmode.S".

> 
> 	Ingo

Thanks,

-- 
Darwish
http://darwish.07.googlepages.com

* Re: [PATCH 0/2][concept RFC] x86: BIOS-save kernel log to disk upon panic
  2011-01-25 15:36   ` Ahmed S. Darwish
@ 2011-01-25 16:02     ` James Bottomley
  2011-01-25 17:05       ` Ahmed S. Darwish
  2011-01-25 17:32     ` Tony Luck
  1 sibling, 1 reply; 35+ messages in thread
From: James Bottomley @ 2011-01-25 16:02 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	X86-ML, Tony Luck, Dave Jones, Andrew Morton, Randy Dunlap,
	Willy Tarreau, Willy Tarreau, Dirk Hohndel, Dirk.Hohndel, IDE-ML,
	LKML, Linus Torvalds, Peter Zijlstra,
	Frédéric Weisbecker, Borislav Petkov, Arjan van de Ven

On Tue, 2011-01-25 at 17:36 +0200, Ahmed S. Darwish wrote:
> The complete __roadblock__ I'm currently facing though is restoring the disk
> controllers to the state originally setup by the BIOS Power-on self-test (POST).
> I hope such re-initialization is even technically feasible.
> 
> Without such re-initialization, we'll just be risking the BIOS code exploding.
> That was the case in the 5-minute hang described in the cover sheet (PATCH #0).

So this is the bit that's not really technically feasible.  BIOS tends
to run storage devices in a very primitive way (so it takes basic
settings, for example and sets the device up for one particular channel
of access).  When preparing the device for an operating system, we have
to blow away all the bios stuff and put it into a more generally
performant mode (this isn't just the storage per se, it's also the
interrupt and routing).  Unfortunately, currently, we don't bother to
save the settings the BIOS was using, so there's no way to reinitialise
the device back to bios without an effective reboot.  Most BIOS doesn't
seem to contain storage re-POST code that's usable (it's all embedded in
the boot sequence).

James



* Re: [PATCH 0/2][concept RFC] x86: BIOS-save kernel log to disk upon panic
  2011-01-25 16:02     ` James Bottomley
@ 2011-01-25 17:05       ` Ahmed S. Darwish
  2011-01-25 17:20         ` James Bottomley
  2011-01-25 22:10         ` Mark Lord
  0 siblings, 2 replies; 35+ messages in thread
From: Ahmed S. Darwish @ 2011-01-25 17:05 UTC (permalink / raw)
  To: James Bottomley
  Cc: Tejun Heo, Ingo Molnar, H. Peter Anvin, Thomas Gleixner,
	Ingo Molnar, X86-ML, Tony Luck, Dave Jones, Andrew Morton,
	Randy Dunlap, Willy Tarreau, Willy Tarreau, Dirk Hohndel,
	Dirk.Hohndel, IDE-ML, LKML, Linus Torvalds, Peter Zijlstra,
	Frédéric Weisbecker, Borislav Petkov, Arjan van de Ven,
	Jeff Garzik

Hi!

On Tue, Jan 25, 2011 at 11:02:35AM -0500, James Bottomley wrote:
> On Tue, 2011-01-25 at 17:36 +0200, Ahmed S. Darwish wrote:
> > The complete __roadblock__ I'm currently facing though is restoring the disk
> > controllers to the state originally setup by the BIOS Power-on self-test (POST).
> > I hope such re-initialization is even technically feasible.
> > 
> > Without such re-initialization, we'll just be risking the BIOS code exploding.
> > That was the case in the 5-minute hang described in the cover sheet (PATCH #0).
> 
> So this is the bit that's not really technically feasible.  BIOS tends
> to run storage devices in a very primitive way (so it takes basic
> settings, for example and sets the device up for one particular channel
> of access).  When preparing the device for an operating system, we have
> to blow away all the bios stuff and put it into a more generally
> performant mode (this isn't just the storage per se, it's also the
> interrupt and routing).  Unfortunately, currently, we don't bother to
> save the settings the BIOS was using, so there's no way to reinitialise
> the device back to bios without an effective reboot.
>

My current x86 laptop includes the very common ATA PIIX controller. If I
dumped the PIC, IOAPIC, and disk controller state in a register file in a
safe area, and re-initialized these before giving control to the BIOS, can
this move the solution space from being not really technically feasible to
"quite possible"?

I have to admit that my knowledge of disk controllers is nil. The final
result though can really be worth the effort (as stated by Ingo's mail),
so I'm _willing_ to explore and prototype different paths.

>
> Most BIOS doesn't seem to contain storage re-POST code that's usable
> (it's all embedded in the boot sequence).
>

There's this arcane `Warm boot with partial re-initialization' BIOS feature
using CMOS register 0xf, but I don't want to depend on it either.

> James
> 

Thanks,

-- 
Darwish
http://darwish.07.googlepages.com

* Re: [PATCH 0/2][concept RFC] x86: BIOS-save kernel log to disk upon panic
  2011-01-25 17:05       ` Ahmed S. Darwish
@ 2011-01-25 17:20         ` James Bottomley
  2011-01-25 22:10         ` Mark Lord
  1 sibling, 0 replies; 35+ messages in thread
From: James Bottomley @ 2011-01-25 17:20 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Tejun Heo, Ingo Molnar, H. Peter Anvin, Thomas Gleixner,
	Ingo Molnar, X86-ML, Tony Luck, Dave Jones, Andrew Morton,
	Randy Dunlap, Willy Tarreau, Willy Tarreau, Dirk Hohndel,
	Dirk.Hohndel, IDE-ML, LKML, Linus Torvalds, Peter Zijlstra,
	Frédéric Weisbecker, Borislav Petkov, Arjan van de Ven,
	Jeff Garzik

On Tue, 2011-01-25 at 19:05 +0200, Ahmed S. Darwish wrote:
> Hi!
> 
> On Tue, Jan 25, 2011 at 11:02:35AM -0500, James Bottomley wrote:
> > On Tue, 2011-01-25 at 17:36 +0200, Ahmed S. Darwish wrote:
> > > The complete __roadblock__ I'm currently facing though is restoring the disk
> > > controllers to the state originally setup by the BIOS Power-on self-test (POST).
> > > I hope such re-initialization is even technically feasible.
> > > 
> > > Without such re-initialization, we'll just be risking the BIOS code exploding.
> > > That was the case in the 5-minute hang described in the cover sheet (PATCH #0).
> > 
> > So this is the bit that's not really technically feasible.  BIOS tends
> > to run storage devices in a very primitive way (so it takes basic
> > settings, for example and sets the device up for one particular channel
> > of access).  When preparing the device for an operating system, we have
> > to blow away all the bios stuff and put it into a more generally
> > performant mode (this isn't just the storage per se, it's also the
> > interrupt and routing).  Unfortunately, currently, we don't bother to
> > save the settings the BIOS was using, so there's no way to reinitialise
> > the device back to bios without an effective reboot.
> >
> 
> My current x86 laptop includes the very common ATA PIIX controller. If I
> dumped the PIC, IOAPIC, and disk controller state in a register file in a
> safe area, and re-initialized these before giving control to the BIOS, can
> this move the solution space from being not really technically feasible to
> "quite possible"?

So I'm sure it would be possible to reverse engineer and make a solution
work for a given motherboard, ATA controller and BIOS ... but because you
have to go all the way up the root of the PCI tree and into the interrupt
controllers to do this, I don't really think it would ever be a usable
generic solution.

I'd really look at the problem another way: rather than trying to figure
out how to put a storage subsystem back into BIOS mode, if we had a driver
that never took a given device out of BIOS configuration (i.e. ran it
through the suboptimal BIOS interface in-kernel), it would be ready to
use the BIOS interface to dump with.  The problem is that no-one wants
that for their main disk ... however, it would be less of a problem if
we dedicated, say, a USB controller and stick to this.

James



* Re: [PATCH -next 1/2][RFC] x86: Saveoops: Switch to real-mode and call BIOS
  2011-01-25 13:51 ` [PATCH -next 1/2][RFC] x86: Saveoops: Switch to real-mode and call BIOS Ahmed S. Darwish
@ 2011-01-25 17:26   ` H. Peter Anvin
  0 siblings, 0 replies; 35+ messages in thread
From: H. Peter Anvin @ 2011-01-25 17:26 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Thomas Gleixner, Ingo Molnar, X86-ML, Tony Luck, Dave Jones,
	Andrew Morton, Randy Dunlap, Willy Tarreau, Willy Tarreau,
	Dirk Hohndel, Dirk.Hohndel, IDE-ML, LKML

On 01/25/2011 05:51 AM, Ahmed S. Darwish wrote:
> 
> We get called here upon panic()s to save the kernel log buffer.
> 
> First, switch from 64-bit long mode to 16-bit real mode. Afterwards, save the
> log buffer to disk using extended INT 0x13 BIOS services. The user has given
> us an absolute LBA disk address to save the log buffer to.
> 
> By x86 design, this code is mandated to run on a single identity-mapped page.
> 
> - How to initialize the disk hardware to its POST state (thus making the
>   BIOS code work reliably) while keeping system RAM unmodified?

You can't safely do so, really.

> - Is it guaranteed that '0x80' will always be the boot disk drive number?
>   If not, we need to be passed the boot drive number from the bootloader.

It's not, and we may not even be booting from disk.

This code seems extremely dangerous, in the "may eat your data" kind of
way.  Using the BIOS once the kernel has run is cantankerous, using it
to *write* is potentially lethal.

	-hpa



* Re: [PATCH -next 2/2][RFC] x86: Saveoops: Reserve low memory and register code
  2011-01-25 13:53 ` [PATCH -next 2/2][RFC] x86: Saveoops: Reserve low memory and register code Ahmed S. Darwish
@ 2011-01-25 17:29   ` H. Peter Anvin
  2011-01-26  9:04     ` Ahmed S. Darwish
  0 siblings, 1 reply; 35+ messages in thread
From: H. Peter Anvin @ 2011-01-25 17:29 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Thomas Gleixner, Ingo Molnar, X86-ML, Tony Luck, Dave Jones,
	Andrew Morton, Randy Dunlap, Willy Tarreau, Willy Tarreau,
	Dirk Hohndel, Dirk.Hohndel, Simon Kagstrom, IDE-ML, LKML

On 01/25/2011 05:53 AM, Ahmed S. Darwish wrote:
> +
> +/*
> + * Extended BIOS services write to disk in units of 512-byte sectors.
> + * Thus, always align the ring buffer size on a 512-byte boundary.
> + */

Units of sectors, not always 512 bytes.  This needs to be done
correctly, or you will destroy real data.
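Here is a minimal sketch of rounding that does not hard-code 512 bytes. It assumes the sector size has already been obtained for the target drive, for example from the bytes-per-sector field returned by the EDD "Get Drive Parameters" call (INT 0x13, AH=0x48). Both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Number of whole sectors needed to hold 'len' bytes; 0 if the sector
 * size is unknown, so the caller refuses to write instead of guessing. */
static size_t sectors_needed(size_t len, size_t sector_size)
{
	if (sector_size == 0)
		return 0;
	return (len + sector_size - 1) / sector_size;
}

/* Round a log-buffer length up to a whole number of the drive's sectors. */
static size_t round_up_to_sectors(size_t len, size_t sector_size)
{
	return sectors_needed(len, sector_size) * sector_size;
}
```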

> +/*
> + * Page tables to identity map the first 2 Mbytes.
> + */
> +static __aligned(PAGE_SIZE) pud_t ident_level3[PTRS_PER_PUD];
> +static __aligned(PAGE_SIZE) pmd_t ident_level2[PTRS_PER_PMD];
> +
> +/*
> + * The lmode->rmode switching code needs to run from an identity page
> + * since it disables paging.
> + */
> +static void build_identity_mappings(void)
> +{
> +	pgd_t *pgde;
> +	pud_t *pude;
> +	pmd_t *pmde;
> +
> +	pmde = ident_level2;
> +	set_pmd(pmde, __pmd(0 + __PAGE_KERNEL_IDENT_LARGE_EXEC));
> +
> +	pude = ident_level3;
> +	set_pud(pude, __pud(__pa(ident_level2) + _KERNPG_TABLE));
> +
> +	pgde = init_level4_pgt;
> +	set_pgd(pgde, __pgd(__pa(ident_level3) + _KERNPG_TABLE));
> +
> +	__flush_tlb_all();
> +}

We now have a permanent identity map so there is no point in building a
new one.

However, I'm quite nervous about this -- this patch has *plenty* of real
possibility of wrecking data.

	-hpa
-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


* Re: [PATCH 0/2][concept RFC] x86: BIOS-save kernel log to disk upon panic
  2011-01-25 15:36   ` Ahmed S. Darwish
  2011-01-25 16:02     ` James Bottomley
@ 2011-01-25 17:32     ` Tony Luck
  2011-01-25 17:36       ` H. Peter Anvin
  2011-01-25 19:04       ` Jeff Garzik
  1 sibling, 2 replies; 35+ messages in thread
From: Tony Luck @ 2011-01-25 17:32 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	X86-ML, Dave Jones, Andrew Morton, Randy Dunlap, Willy Tarreau,
	Willy Tarreau, Dirk Hohndel, Dirk.Hohndel, IDE-ML, LKML,
	Linus Torvalds, Peter Zijlstra, Frédéric Weisbecker,
	Borislav Petkov, Arjan van de Ven

On Tue, Jan 25, 2011 at 7:36 AM, Ahmed S. Darwish <darwish.07@gmail.com> wrote:
> I did not have to re-partition the box here. A kindof a hacky solution was
> disabling the swap partition and using it for storing the log. That would make
> the feature available without re-installing the box, at the cost of temporarily
> disabling swap.

Using swap space as a dump area has a long and established tradition
going back to the early roots of Unix - so I don't think that it is all that
hacky. I think that modern systems even write some magic at the start
of the swap partition that you could use to verify that you were writing to
the correct spot ... and it should be easy to retrieve your dumped data
before the swap gets re-enabled by the new kernel after the reboot.
[Perhaps the new kernel could do this automatically if it finds some
signature that your code leaves in the swap area so it could stuff the data
into my /dev/pstore filesystem?]
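The swap magic mentioned above lives in the last 10 bytes of the first page of a Linux swap area ("SWAPSPACE2" for the current format). A dumper targeting swap could verify it before writing. Below is a minimal sketch, assuming a 4 KiB page size when the swap area was created. `looks_like_swap` is a hypothetical helper. The dump itself would then go after that first page, so the signature survives for the next check.

```c
#include <assert.h>
#include <string.h>

/* Assumption: the swap area was created with 4 KiB pages. */
#define SWAP_PAGE_SIZE 4096

/* Return 1 if the first page of the area ends with the swap magic. */
static int looks_like_swap(const unsigned char *first_page, size_t len)
{
	static const char magic[] = "SWAPSPACE2";
	const size_t mlen = sizeof(magic) - 1; /* 10 bytes, no NUL on disk */

	assert(first_page != NULL);
	if (len < SWAP_PAGE_SIZE)
		return 0;
	return memcmp(first_page + SWAP_PAGE_SIZE - mlen, magic, mlen) == 0;
}
```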

One more "is this bit of the BIOS code safe" concern that I have is that
you'll be using the "write" path of the INT 0x13 code ... which isn't the
path that is tested by booting ... it *ought* to be OK - but untested paths
in BIOS seem to be broken paths all too often.

-Tony

* Re: [PATCH 0/2][concept RFC] x86: BIOS-save kernel log to disk upon panic
  2011-01-25 15:08   ` Tejun Heo
@ 2011-01-25 17:33     ` H. Peter Anvin
  2011-01-26 11:44       ` Ahmed S. Darwish
  2011-02-03 14:36     ` Pavel Machek
  1 sibling, 1 reply; 35+ messages in thread
From: H. Peter Anvin @ 2011-01-25 17:33 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Ingo Molnar, Ahmed S. Darwish, Thomas Gleixner, Ingo Molnar,
	X86-ML, Tony Luck, Dave Jones, Andrew Morton, Randy Dunlap,
	Willy Tarreau, Willy Tarreau, Dirk Hohndel, Dirk.Hohndel, IDE-ML,
	LKML, Linus Torvalds, Peter Zijlstra,
	Frédéric Weisbecker, Borislav Petkov, Arjan van de Ven

On 01/25/2011 07:08 AM, Tejun Heo wrote:
> 
> There are holes but writing to them without full knowledge of the
> configuration can be quite dangerous.  I don't think it would be
> possible to mass deploy it without manual configuration unless we
> specifically reserve (and maybe mark) it in the filesystem.
> 

Reserve and mark in the filesystem is relatively straightforward, except
for btrfs, which doesn't have support for reserved extents.  This is a
bit of a shortcoming in btrfs.

>> All in one, a very intriguing idea IMO, and the hardest bits
>> (lowlevel x86 transition) is all implemented already.

Lowlevel x86 transition is not at all the hardest part.  It's
detail-oriented but well defined (and, I might add, incompletely
implemented -- 64 bits only, not using facilities we already have); the
*hard* part, and I mean harder by orders of magnitude, is to get the
BIOS to behave sanely without an intervening reboot, *AND* not trash
your data, ever.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


* Re: [PATCH 0/2][concept RFC] x86: BIOS-save kernel log to disk upon panic
  2011-01-25 17:32     ` Tony Luck
@ 2011-01-25 17:36       ` H. Peter Anvin
  2011-01-25 19:04       ` Jeff Garzik
  1 sibling, 0 replies; 35+ messages in thread
From: H. Peter Anvin @ 2011-01-25 17:36 UTC (permalink / raw)
  To: Tony Luck
  Cc: Ahmed S. Darwish, Ingo Molnar, Thomas Gleixner, Ingo Molnar,
	X86-ML, Dave Jones, Andrew Morton, Randy Dunlap, Willy Tarreau,
	Willy Tarreau, Dirk Hohndel, Dirk.Hohndel, IDE-ML, LKML,
	Linus Torvalds, Peter Zijlstra, Frédéric Weisbecker,
	Borislav Petkov, Arjan van de Ven

On 01/25/2011 09:32 AM, Tony Luck wrote:
> 
> Using swap space as a dump area has a long and established tradition
> going back to the early roots of Unix - so I don't think that it is all that
> hacky. I think that modern systems even write some magic at the start
> of the swap partition that you could use to verify that you were writing to
> the correct spot ... and it should be easy to retrieve your dumped data
> before the swap gets re-enabled by the new kernel after the reboot.
> [Perhaps the new kernel could do this automatically if it finds some
> signature that your code leaves in the swap area so it could stuff the data
> into my /dev/pstore filesystem?]
> 
> One more "is this bit of the BIOS code safe" concern that I have is that
> you'll be using the "write" path of the INT 0x13 code ... which isn't the
> path that is tested by booting ... it *ought* to be OK - but untested paths
> in BIOS seem to be broken paths all too often.
> 

It's not just that, it's that you're going through it *after the kernel
has run*.  The kernel has wrecked the state of the system -- the disk
hardware, the PCI hierarchy, the interrupt system -- as far as the BIOS
is concerned.  It is no longer safe to trust the BIOS to use your
hardware, especially not for writing.

So what is the solution?  We have to carry our own disk driver for the
emergency.  In other words, we're very quickly starting to make
something that looks a whole lot like kexec/kdump.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


* Re: [PATCH 0/2][concept RFC] x86: BIOS-save kernel log to disk upon panic
  2011-01-25 17:32     ` Tony Luck
  2011-01-25 17:36       ` H. Peter Anvin
@ 2011-01-25 19:04       ` Jeff Garzik
  1 sibling, 0 replies; 35+ messages in thread
From: Jeff Garzik @ 2011-01-25 19:04 UTC (permalink / raw)
  To: Tony Luck
  Cc: Ahmed S. Darwish, Ingo Molnar, H. Peter Anvin, Thomas Gleixner,
	Ingo Molnar, X86-ML, Dave Jones, Andrew Morton, Randy Dunlap,
	Willy Tarreau, Willy Tarreau, Dirk Hohndel, Dirk.Hohndel, IDE-ML,
	LKML, Linus Torvalds, Peter Zijlstra,
	Frédéric Weisbecker, Borislav Petkov, Arjan van de Ven

On 01/25/2011 12:32 PM, Tony Luck wrote:
> One more "is this bit of the BIOS code safe" concern that I have is that
> you'll be using the "write" path of the INT 0x13 code ... which isn't the
> path that is tested by booting ... it *ought* to be OK - but untested paths
> in BIOS seem to be broken paths all too often.


Not really...  as others have noted, we reprogram the controller in a 
different mode, once the kernel starts.

	Jeff



* Re: [PATCH 0/2][concept RFC] x86: BIOS-save kernel log to disk upon panic
  2011-01-25 13:47 [PATCH 0/2][concept RFC] x86: BIOS-save kernel log to disk upon panic Ahmed S. Darwish
                   ` (3 preceding siblings ...)
  2011-01-25 14:49 ` Tejun Heo
@ 2011-01-25 20:25 ` Linus Torvalds
       [not found]   ` <20110126124954.GC24527@laptop>
  4 siblings, 1 reply; 35+ messages in thread
From: Linus Torvalds @ 2011-01-25 20:25 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, X86-ML, Tony Luck,
	Dave Jones, Andrew Morton, Randy Dunlap, Willy Tarreau,
	Willy Tarreau, Dirk Hohndel, Dirk.Hohndel, IDE-ML, LKML

On Tue, Jan 25, 2011 at 11:47 PM, Ahmed S. Darwish <darwish.07@gmail.com> wrote:
>
> I've faced some very early panics in latest kernel. Being a run of the mill
> x86 laptop, the machine is void of debugging aids like serial ports or
> network boot.
>
> As a possible solution, below patches prototypes the idea of persistently
> storing the kernel log ring to a hard disk partition using the enhanced BIOS
> 0x13 services.

Quite frankly, I'm not likely to _ever_ merge anything like this.

Over the years, many people have tried to write things to disk on
oops. I refuse to take it. No way in hell do I want the situation of
"the system is screwed, so let's overwrite the disk" to be something
the kernel I release might do. It's crazy. That disk is a lot more
important than the kernel, and overwriting it when we might have
serious memory corruption issues or something is not a thing I feel is
appropriate.

I also don't think that it's safe to use the BIOS routines. That's not
the environment they have been tested in; the boot environment is
very different from the "running kernel" setup. Devices will have been
possibly remapped, virtual mappings are different, things are just
very random.

Some vendors have taken things like this in the past, but I just
wanted to say that I think it's too damn scary. I _really_ don't see
the point.

If you want to do the BIOS services thing, do it for video: copy the
oops to low RAM, return to real mode, re-run the graphics card POST
routines to initialize text-mode, and use the BIOS to print out the
oops.  That is WAY less scary than writing to disk.

                              Linus
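The first step of Linus's alternative, copying the oops to low RAM, is mostly a matter of pulling the newest part of the log ring into a flat buffer that real-mode code can address. A minimal sketch, assuming a plain character ring whose size is a power of two and whose write index only ever increases; `copy_log_tail` and its parameters are hypothetical names, not kernel symbols, and the switch to real mode and the BIOS video calls are not shown:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy the newest bytes of a circular log buffer into a flat buffer,
 * e.g. one reserved below 1 MiB so real-mode code can reach it.
 * ring_len must be a power of two; log_end counts every byte ever
 * written, so (log_end & (ring_len - 1)) is the next write slot.
 * Copies min(dst_len, bytes still held in the ring); returns the count.
 */
size_t copy_log_tail(const char *ring, size_t ring_len,
                     unsigned long log_end, char *dst, size_t dst_len)
{
    size_t held, n, i;
    unsigned long start;

    assert(ring_len != 0 && (ring_len & (ring_len - 1)) == 0);
    held = log_end < ring_len ? (size_t)log_end : ring_len;
    n = held < dst_len ? held : dst_len;
    start = log_end - n;                 /* oldest byte that is copied */

    for (i = 0; i < n; i++)
        dst[i] = ring[(start + i) & (ring_len - 1)];
    return n;
}
```

Copying byte by byte with a mask keeps the wrap-around case identical to the simple case, which matters in code that runs once, at panic time, with no chance to debug it.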

* Re: [PATCH 0/2][concept RFC] x86: BIOS-save kernel log to disk upon panic
  2011-01-25 17:05       ` Ahmed S. Darwish
  2011-01-25 17:20         ` James Bottomley
@ 2011-01-25 22:10         ` Mark Lord
  2011-01-25 22:16           ` Randy Dunlap
  1 sibling, 1 reply; 35+ messages in thread
From: Mark Lord @ 2011-01-25 22:10 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: James Bottomley, Tejun Heo, Ingo Molnar, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, X86-ML, Tony Luck, Dave Jones,
	Andrew Morton, Randy Dunlap, Willy Tarreau, Willy Tarreau,
	Dirk Hohndel, Dirk.Hohndel, IDE-ML, LKML, Linus Torvalds,
	Peter Zijlstra, Frédéric Weisbecker, Borislav Petkov,
	Arjan van de Ven, Jeff Garzik

On 11-01-25 12:05 PM, Ahmed S. Darwish wrote:
>
> My current x86 laptop includes the very common ATA PIIX controller. 


ata_piix is just about ideal for this sort of thing.

Except, don't use the BIOS to write the logs,
but rather code/use a very simple set of polling-PIO
functions to talk directly through the PIIX to the drive.

Really really simple code to do that, and it would likely
work with anything ata-piix, and most other non-AHCI chipsets too.

Not perfect, but probably good enough for a lot of scenarios.
The old hd.c driver shows how to read/write a sector at a time,
and that kind of code is easily converted to simply poll for completion.

Cheers
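For one sector, Mark's polled-PIO suggestion comes down to three steps. Program the task-file registers, issue WRITE SECTORS (0x30), then spin on the status register instead of taking an interrupt. The sketch below is an illustration and is not hd.c's code. It makes several assumptions:

- the controller is in legacy IDE mode at the standard primary-channel ports;
- the target is the master device, addressed with 28-bit LBA;
- port I/O goes through hypothetical callbacks, so that a small simulated drive can exercise the logic.

The fixed poll count stands in for a real timeout. Nothing here addresses hpa's concern about a controller the running kernel has already reprogrammed:

```c
#include <assert.h>
#include <stdint.h>

/* Legacy ATA task-file register offsets from the command block base. */
enum { ATA_DATA = 0, ATA_SECCOUNT = 2, ATA_LBA_LO = 3, ATA_LBA_MID = 4,
       ATA_LBA_HI = 5, ATA_DEVICE = 6, ATA_STATUS = 7, ATA_COMMAND = 7 };

#define ATA_SR_BSY 0x80              /* status: busy */
#define ATA_SR_DF  0x20              /* status: device fault */
#define ATA_SR_DRQ 0x08              /* status: data request */
#define ATA_SR_ERR 0x01              /* status: error */
#define ATA_CMD_WRITE_SECTORS 0x30
#define ATA_CMD_CACHE_FLUSH   0xE7
#define POLL_LIMIT 1000000L          /* arbitrary bound, not a real timeout */

/* Hypothetical port-I/O hooks; real code would use in/out instructions. */
struct ata_io {
    uint8_t (*inb)(void *ctx, uint16_t port);
    void (*outb)(void *ctx, uint16_t port, uint8_t val);
    void (*outw)(void *ctx, uint16_t port, uint16_t val);
    void *ctx;
    uint16_t base;                   /* 0x1F0 for the legacy primary channel */
};

/* Spin until BSY clears (and DRQ sets, if wanted); -1 on error or give-up. */
static int ata_poll(const struct ata_io *io, int want_drq)
{
    long n;

    for (n = 0; n < POLL_LIMIT; n++) {
        uint8_t st = io->inb(io->ctx, io->base + ATA_STATUS);

        if (st & ATA_SR_BSY)
            continue;
        if (st & (ATA_SR_ERR | ATA_SR_DF))
            return -1;
        if (!want_drq || (st & ATA_SR_DRQ))
            return 0;
    }
    return -1;
}

/* Write one 512-byte sector to the master device at a 28-bit LBA. */
int ata_pio_write_sector(const struct ata_io *io, uint32_t lba,
                         const uint16_t buf[256])
{
    int i;

    assert(lba < (1u << 28));
    if (ata_poll(io, 0) < 0)
        return -1;
    io->outb(io->ctx, io->base + ATA_DEVICE, 0xE0 | ((lba >> 24) & 0x0F));
    io->outb(io->ctx, io->base + ATA_SECCOUNT, 1);
    io->outb(io->ctx, io->base + ATA_LBA_LO, lba & 0xFF);
    io->outb(io->ctx, io->base + ATA_LBA_MID, (lba >> 8) & 0xFF);
    io->outb(io->ctx, io->base + ATA_LBA_HI, (lba >> 16) & 0xFF);
    io->outb(io->ctx, io->base + ATA_COMMAND, ATA_CMD_WRITE_SECTORS);

    if (ata_poll(io, 1) < 0)         /* drive ready to take the data */
        return -1;
    for (i = 0; i < 256; i++)
        io->outw(io->ctx, io->base + ATA_DATA, buf[i]);
    if (ata_poll(io, 0) < 0)         /* write accepted */
        return -1;

    io->outb(io->ctx, io->base + ATA_COMMAND, ATA_CMD_CACHE_FLUSH);
    return ata_poll(io, 0);          /* wait for the flush to finish */
}

/* Host-side stand-in for a drive that never stalls, for testing only. */
struct sim_drive { uint8_t regs[8], status, last_cmd; int words; };

static uint8_t sim_inb(void *ctx, uint16_t port)
{
    (void)port;
    return ((struct sim_drive *)ctx)->status;
}

static void sim_outb(void *ctx, uint16_t port, uint8_t val)
{
    struct sim_drive *d = ctx;

    d->regs[port & 7] = val;
    if ((port & 7) == ATA_COMMAND) {
        d->last_cmd = val;
        d->status = val == ATA_CMD_WRITE_SECTORS ? ATA_SR_DRQ : 0;
    }
}

static void sim_outw(void *ctx, uint16_t port, uint16_t val)
{
    struct sim_drive *d = ctx;

    (void)port;
    (void)val;
    if (++d->words == 256)
        d->status = 0;               /* whole sector received */
}
```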

* Re: [PATCH 0/2][concept RFC] x86: BIOS-save kernel log to disk upon panic
  2011-01-25 22:10         ` Mark Lord
@ 2011-01-25 22:16           ` Randy Dunlap
  2011-01-25 22:45             ` Jeff Garzik
  0 siblings, 1 reply; 35+ messages in thread
From: Randy Dunlap @ 2011-01-25 22:16 UTC (permalink / raw)
  To: Mark Lord
  Cc: Ahmed S. Darwish, James Bottomley, Tejun Heo, Ingo Molnar,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, X86-ML, Tony Luck,
	Dave Jones, Andrew Morton, Willy Tarreau, Willy Tarreau,
	Dirk Hohndel, Dirk.Hohndel, IDE-ML, LKML, Linus Torvalds,
	Peter Zijlstra, Frédéric Weisbecker, Borislav Petkov,
	Arjan van de Ven, Jeff Garzik

On Tue, 25 Jan 2011 17:10:49 -0500 Mark Lord wrote:

> On 11-01-25 12:05 PM, Ahmed S. Darwish wrote:
> >
> > My current x86 laptop includes the very common ATA PIIX controller. 
> 
> 
> ata_piix is just about ideal for this sort of thing.
> 
> Except, don't use the BIOS to write the logs,
> but rather code/use a very simple set of polling-PIO
> functions to talk directly through the PIIX to the drive.
> 
> Really really simple code to do that, and it would likely
> work with anything ata-piix, and most other non-AHCI chipsets too.
> 
> Not perfect, but probably good enough for a lot of scenarios.
> The old hd.c driver shows how to read/write a sector at a time,
> and that kind of code is easily converted to simply poll for completion.

I don't know how/where to find it, but Rusty Russell had a version of this
many, many years ago.

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

* Re: [PATCH 0/2][concept RFC] x86: BIOS-save kernel log to disk upon panic
  2011-01-25 22:16           ` Randy Dunlap
@ 2011-01-25 22:45             ` Jeff Garzik
  2011-01-25 22:58               ` H. Peter Anvin
  0 siblings, 1 reply; 35+ messages in thread
From: Jeff Garzik @ 2011-01-25 22:45 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Mark Lord, Ahmed S. Darwish, James Bottomley, Tejun Heo,
	Ingo Molnar, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	X86-ML, Tony Luck, Dave Jones, Andrew Morton, Willy Tarreau,
	Willy Tarreau, Dirk Hohndel, Dirk.Hohndel, IDE-ML, LKML,
	Linus Torvalds, Peter Zijlstra, Frédéric Weisbecker,
	Borislav Petkov, Arjan van de Ven, Rusty Russell

On 01/25/2011 05:16 PM, Randy Dunlap wrote:
> On Tue, 25 Jan 2011 17:10:49 -0500 Mark Lord wrote:
>
>> On 11-01-25 12:05 PM, Ahmed S. Darwish wrote:
>>>
>>> My current x86 laptop includes the very common ATA PIIX controller.
>>
>>
>> ata_piix is just about ideal for this sort of thing.
>>
>> Except, don't use the BIOS to write the logs,
>> but rather code/use a very simple set of polling-PIO
>> functions to talk directly through the PIIX to the drive.
>>
>> Really really simple code to do that, and it would likely
>> work with anything ata-piix, and most other non-AHCI chipsets too.
>>
>> Not perfect, but probably good enough for a lot of scenarios.
>> The old hd.c driver shows how to read/write a sector at a time,
>> and that kind of code is easily converted to simply poll for completion.
>
> I don't know how/where to find it, but Rusty Russell had a version of this
> many, many years ago.

You beat me to the reply :)

http://lwn.net/Articles/9905/

but IIRC there were updates and improvements.


* Re: [PATCH 0/2][concept RFC] x86: BIOS-save kernel log to disk upon panic
  2011-01-25 22:45             ` Jeff Garzik
@ 2011-01-25 22:58               ` H. Peter Anvin
  2011-01-26  0:26                 ` Jeff Garzik
  2011-01-31  2:59                 ` Rusty Russell
  0 siblings, 2 replies; 35+ messages in thread
From: H. Peter Anvin @ 2011-01-25 22:58 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Randy Dunlap, Mark Lord, Ahmed S. Darwish, James Bottomley,
	Tejun Heo, Ingo Molnar, Thomas Gleixner, Ingo Molnar, X86-ML,
	Tony Luck, Dave Jones, Andrew Morton, Willy Tarreau,
	Willy Tarreau, Dirk Hohndel, Dirk.Hohndel, IDE-ML, LKML,
	Linus Torvalds, Peter Zijlstra, Frédéric Weisbecker,
	Borislav Petkov, Arjan van de Ven, Rusty Russell

On 01/25/2011 02:45 PM, Jeff Garzik wrote:
> On 01/25/2011 05:16 PM, Randy Dunlap wrote:
>> On Tue, 25 Jan 2011 17:10:49 -0500 Mark Lord wrote:
>>
>>> On 11-01-25 12:05 PM, Ahmed S. Darwish wrote:
>>>>
>>>> My current x86 laptop includes the very common ATA PIIX controller.
>>>
>>>
>>> ata_piix is just about ideal for this sort of thing.
>>>
>>> Except, don't use the BIOS to write the logs,
>>> but rather code/use a very simple set of polling-PIO
>>> functions to talk directly through the PIIX to the drive.
>>>
>>> Really really simple code to do that, and it would likely
>>> work with anything ata-piix, and most other non-AHCI chipsets too.
>>>
>>> Not perfect, but probably good enough for a lot of scenarios.
>>> The old hd.c driver shows how to read/write a sector at a time,
>>> and that kind of code is easily converted to simply poll for completion.
>>
>> I don't know how/where to find it, but Rusty Russell had a version of this
>> many, many years ago.
> 
> You beat me to the reply :)
> 
> http://lwn.net/Articles/9905/
> 
> but IIRC there were updates and improvements.
> 

"Not perfect" is really not good enough.  You're writing to the disk, so
it *has* to be perfect.  That means dealing with having shared IDE/AHCI
device and doing the right thing, and the possibility that the kernel
has interrupts enabled, yadda yadda.

In the end you end up with something that looks, again, like kexec/kdump.

	-hpa
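One small, checkable part of "doing the right thing" with a shared IDE/AHCI controller is to refuse the legacy-port path outright unless the controller advertises legacy IDE mode. A minimal sketch, assuming the caller has already read the class-code dword from PCI configuration space. The function name is hypothetical. Passing this check does not prove that the running kernel has no commands in flight, which is exactly hpa's objection:

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_CLASS_STORAGE     0x01
#define PCI_SUBCLASS_IDE      0x01
#define IDE_PROGIF_PRI_NATIVE 0x01   /* primary channel in native-PCI mode */

/*
 * class_reg is the dword at PCI config offset 0x08:
 * class (31:24), subclass (23:16), programming interface (15:8),
 * revision (7:0). Returns true only for an IDE-class controller whose
 * primary channel is in compatibility mode, i.e. answers at the legacy
 * ports 0x1F0-0x1F7 / 0x3F6. SATA-class (including AHCI, subclass 0x06)
 * and native-mode controllers are rejected.
 */
bool legacy_primary_channel(uint32_t class_reg)
{
    uint8_t base_class = (class_reg >> 24) & 0xFF;
    uint8_t subclass   = (class_reg >> 16) & 0xFF;
    uint8_t prog_if    = (class_reg >> 8) & 0xFF;

    if (base_class != PCI_CLASS_STORAGE || subclass != PCI_SUBCLASS_IDE)
        return false;
    return (prog_if & IDE_PROGIF_PRI_NATIVE) == 0;
}
```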


* Re: [PATCH 0/2][concept RFC] x86: BIOS-save kernel log to disk upon panic
  2011-01-25 22:58               ` H. Peter Anvin
@ 2011-01-26  0:26                 ` Jeff Garzik
  2011-01-31  2:59                 ` Rusty Russell
  1 sibling, 0 replies; 35+ messages in thread
From: Jeff Garzik @ 2011-01-26  0:26 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Randy Dunlap, Mark Lord, Ahmed S. Darwish, James Bottomley,
	Tejun Heo, Ingo Molnar, Thomas Gleixner, Ingo Molnar, X86-ML,
	Tony Luck, Dave Jones, Andrew Morton, Willy Tarreau,
	Willy Tarreau, Dirk Hohndel, Dirk.Hohndel, IDE-ML, LKML,
	Linus Torvalds, Peter Zijlstra, Frédéric Weisbecker,
	Borislav Petkov, Arjan van de Ven, Rusty Russell

On 01/25/2011 05:58 PM, H. Peter Anvin wrote:
> "Not perfect" is really not good enough.  You're writing to the disk, so
> it *has* to be perfect.  That means dealing with having shared IDE/AHCI
> device and doing the right thing, and the possibility that the kernel
> has interrupts enabled, yadda yadda.
>
> In the end you end up with something that looks, again, like kexec/kdump.

Oh, quite agreed.  It wasn't an endorsement, just a reference :)

	Jeff




* Re: [PATCH -next 2/2][RFC] x86: Saveoops: Reserve low memory and register code
  2011-01-25 17:29   ` H. Peter Anvin
@ 2011-01-26  9:04     ` Ahmed S. Darwish
  0 siblings, 0 replies; 35+ messages in thread
From: Ahmed S. Darwish @ 2011-01-26  9:04 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ingo Molnar, Thomas Gleixner, Ingo Molnar, X86-ML, Tony Luck,
	Dave Jones, Andrew Morton, Randy Dunlap, Willy Tarreau,
	Willy Tarreau, Dirk Hohndel, Dirk.Hohndel, IDE-ML, LKML,
	Linus Torvalds, Peter Zijlstra, Frédéric Weisbecker,
	Borislav Petkov, Arjan van de Ven, Tejun Heo, James Bottomley,
	Mark Lord, Jeff Garzik

On Tue, Jan 25, 2011 at 09:29:58AM -0800, H. Peter Anvin wrote:
> 
> However, I'm quite nervous about this -- this patch has *plenty* of real
> possibility of wrecking data.
> 

Yes, it does.

Keep in mind, though, that I'm just prototyping different solutions to
a problem I'm facing right now. I fully understand that this patch is in
no way going to be merged in its current state.

I'll send another email summarizing the criticism and proposing different
paths in a moment.

thanks,

-- 
Darwish
http://darwish.07.googlepages.com

* Re: [PATCH 0/2][concept RFC] x86: BIOS-save kernel log to disk upon panic
  2011-01-25 17:33     ` H. Peter Anvin
@ 2011-01-26 11:44       ` Ahmed S. Darwish
  0 siblings, 0 replies; 35+ messages in thread
From: Ahmed S. Darwish @ 2011-01-26 11:44 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ingo Molnar, Thomas Gleixner, Ingo Molnar, X86-ML, Tony Luck,
	Dave Jones, Andrew Morton, Randy Dunlap, Willy Tarreau,
	Willy Tarreau, Dirk Hohndel, Dirk.Hohndel, IDE-ML, LKML,
	Linus Torvalds, Peter Zijlstra, Frédéric Weisbecker,
	Borislav Petkov, Arjan van de Ven, Tejun Heo, James Bottomley,
	Mark Lord, Jeff Garzik

On Tue, Jan 25, 2011 at 09:33:18AM -0800, H. Peter Anvin wrote:
> On 01/25/2011 07:08 AM, Tejun Heo wrote:
> 
> >> All in one, a very intriguing idea IMO, and the hardest bits
> >> (lowlevel x86 transition) is all implemented already.
> 
> Lowlevel x86 transition is not at all the hardest part.  It's
> detail-oriented but well defined (and, I might add, incompletely
> implemented -- 64 bits only, not using facilities we already have);
>

Making the transition code applicable to 32-bit CPUs will hopefully be
quite simple: it's mostly a subset of the posted parts.

Since I might now use that transition code for a different purpose: are
there any facilities I'm not properly using, besides the identity mappings?

thanks,

-- 
Darwish
http://darwish.07.googlepages.com

* RE: [PATCH 0/2][concept RFC] x86: BIOS-save kernel log to disk upon panic
       [not found]   ` <20110126124954.GC24527@laptop>
@ 2011-01-26 23:07     ` Luck, Tony
       [not found]       ` <20110126231620.GA14807@redhat.com>
       [not found]     ` <20110127021338.GA20334@redhat.com>
  1 sibling, 1 reply; 35+ messages in thread
From: Luck, Tony @ 2011-01-26 23:07 UTC (permalink / raw)
  To: Ahmed S. Darwish, Linus Torvalds
  Cc: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	X86-ML, Dave Jones, Andrew Morton, Randy Dunlap, Willy Tarreau,
	Willy Tarreau, Dirk Hohndel, Hohndel, Dirk, IDE-ML, LKML,
	Peter Zijlstra, Frédéric Weisbecker, Borislav Petkov,
	Arjan van de Ven, Tejun Heo, James Bottomley, Mark Lord,
	Jeff Garzik, Eric Biederman, Vivek Goyal, Haren Myneni, KEXEC-ML,
	FBDEV-ML

>- The latest approach (proposed by Linus) is to forget the disk: jump to
>  real-mode, but display the kernel log in a fancy format (with scroll
>  ups and downs) instead.

A while ago (first Plumbers conference?) someone was talking about
using a 2-d barcode to display the tail of the kernel log & oops
register data - with the plan that you could capture the image with
a cell phone camera, and then get all the oops data without worrying
about transcription errors as you wrote down & re-typed all the hex.

Anyone know what happened to that plan?

-Tony



* RE: [PATCH 0/2][concept RFC] x86: BIOS-save kernel log to disk upon panic
       [not found]                 ` <20110127120039.GD20279@elte.hu>
@ 2011-01-27 18:35                   ` Luck, Tony
       [not found]                   ` <4D4197CB.9070201@zytor.com>
  1 sibling, 0 replies; 35+ messages in thread
From: Luck, Tony @ 2011-01-27 18:35 UTC (permalink / raw)
  To: Ingo Molnar, H. Peter Anvin
  Cc: Dave Jones, Ahmed S. Darwish, Linus Torvalds, Thomas Gleixner,
	Ingo Molnar, X86-ML, Andrew Morton, Randy Dunlap, Willy Tarreau,
	Willy Tarreau, Dirk Hohndel, Hohndel, Dirk, IDE-ML, LKML,
	Peter Zijlstra, Frédéric Weisbecker, Borislav Petkov,
	Arjan van de Ven, Tejun Heo, James Bottomley, Mark Lord,
	Jeff Garzik, Eric Biederman, Vivek Goyal, Haren Myneni, FBDEV-ML

> - A bar code is encoded and i generally want to know what info i'm sending to
>   potentially untrusted parties ... also, i want to be sure i'm sending the right
>   oops, at a glance, before hitting 'send' on the email.

Maybe someone will write an iPhone/Android/MeeGo app that takes the picture, does
the decode and lets you look, together with options on whether to just send to
your e-mail address, or to Cc: kerneloops?

> - A screenshot of a well-compressed oops output tells a lot of context as well:
>   surrounding kernel messages, general state of the system when it crashed.
>   Sometimes it tells me the type of the laptop as well, via the logo visible on the
>   border of the picture ;-) Context strengthens whether i can *trust* the oops - 
>   and often i'm really in doubt about oopses and want more context.

Perhaps some of this context can be encoded in the barcode too?

-Tony

* RE: [PATCH 0/2][concept RFC] x86: BIOS-save kernel log to disk upon panic
       [not found]                     ` <20110127162429.GB26437@elte.hu>
@ 2011-01-27 18:56                       ` Luck, Tony
  0 siblings, 0 replies; 35+ messages in thread
From: Luck, Tony @ 2011-01-27 18:56 UTC (permalink / raw)
  To: Ingo Molnar, H. Peter Anvin
  Cc: Dave Jones, Ahmed S. Darwish, Linus Torvalds, Thomas Gleixner,
	Ingo Molnar, X86-ML, Andrew Morton, Randy Dunlap, Willy Tarreau,
	Willy Tarreau, Dirk Hohndel, Hohndel, Dirk, IDE-ML, LKML,
	Peter Zijlstra, Frédéric Weisbecker, Borislav Petkov,
	Arjan van de Ven, Tejun Heo, James Bottomley, Mark Lord,
	Jeff Garzik, Eric Biederman, Vivek Goyal, Haren Myneni, FBDEV-ML

> Yeah, i've done it countless times and it's very painful indeed - but the 
> alternative "losing context" would be even worse to me. Can we do both - i.e. have 
> the barcode on the screen alongside the oops itself? (although at that point they'll 
> be taking screen real estate from each other - degrading the information in *both* 
> venues.)
>
> Keyboard 'oops scrolling' driver with 'flip to barcode output' key? :-)

If we can make the screen scroll, or guarantee that a hot key will switch to
the barcode view, then those look like good solutions. Sharing space on a
non-scrollable view sounds like a poor choice. Perhaps better to have a
command line option to say which to provide (default = text).

-Tony
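Once the log has been wrapped into screen rows, the "scroll ups and downs" view that Ahmed and Tony discuss needs only a little state. A minimal sketch, assuming a text-mode screen and key input via BIOS INT 16h, whose AH=00h call returns the scan code in AH. The struct and function names are hypothetical. Line wrapping and the barcode view are left out:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* PC keyboard scan codes as returned in AH by BIOS INT 16h, AH=00h. */
#define SC_UP    0x48
#define SC_PGUP  0x49
#define SC_DOWN  0x50
#define SC_PGDN  0x51

struct log_view {
    size_t total;   /* rows of log text after wrapping */
    size_t screen;  /* rows on the display, e.g. 25 */
    size_t back;    /* rows scrolled up from the bottom */
};

/* Furthest the view may scroll back before hitting the first row. */
static size_t max_back(const struct log_view *v)
{
    return v->total > v->screen ? v->total - v->screen : 0;
}

/* Apply one key press, clamping at both ends; other keys are ignored. */
void log_view_key(struct log_view *v, uint8_t scancode)
{
    size_t step;
    int up = 0;

    switch (scancode) {
    case SC_UP:   step = 1;         up = 1; break;
    case SC_PGUP: step = v->screen; up = 1; break;
    case SC_DOWN: step = 1;                 break;
    case SC_PGDN: step = v->screen;         break;
    default:      return;
    }
    if (up) {
        size_t limit = max_back(v);
        v->back = v->back + step > limit ? limit : v->back + step;
    } else {
        v->back = step > v->back ? 0 : v->back - step;
    }
}

/* Index of the log row drawn at the top of the screen. */
size_t log_view_first_row(const struct log_view *v)
{
    assert(v->back <= max_back(v));
    return max_back(v) - v->back;
}
```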

* Re: [PATCH 0/2][concept RFC] x86: BIOS-save kernel log to disk upon panic
  2011-01-25 14:49 ` Tejun Heo
@ 2011-01-28  7:59   ` Jan Ceuleers
  0 siblings, 0 replies; 35+ messages in thread
From: Jan Ceuleers @ 2011-01-28  7:59 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Ahmed S. Darwish, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	X86-ML, Tony Luck, Dave Jones, Andrew Morton, Randy Dunlap,
	Willy Tarreau, Willy Tarreau, Dirk Hohndel, Dirk.Hohndel, IDE-ML,
	LKML

On 25/01/11 15:49, Tejun Heo wrote:
> I'm afraid this is gonna be something which works sometimes (or even
> more times than not) but can't ever be made reliable.  I think it
> would be better to head toward usb or other kind of early console.

Apologies in advance if this is a stupid idea, but would it be possible 
and safer to dedicate a whole device (such as a USB thumb drive or a 
memory card) to this? You could initialise the media in a certain way to 
let the kernel know that it is OK to trample on the device.

You'd have a mkpoops command (make persistent oops :-). Perhaps this 
could be a new partition table type. The kernel could either auto-detect 
such a device or be told which one to use. It could then, upon crash and
prior to writing to it, re-verify that the device bears the hallmarks of 
being a poops device.

Thanks, Jan
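Jan's crash-time re-verification could be as simple as re-checking a header that the formatting tool left in the first sector. A minimal sketch. The header layout, the magic string and the additive checksum are all assumptions made for illustration, since no such on-disk format exists. `mkpoops` is Jan's hypothetical tool. At crash time, the expected device size would have to come from the device itself:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POOPS_MAGIC     "LINUX-POOPS-DEV1"   /* 16 bytes, stored without NUL */
#define POOPS_MAGIC_LEN 16

/*
 * Hypothetical header in sector 0 (little-endian):
 *   bytes  0-15  magic
 *   bytes 16-23  device size in sectors, recorded when formatted
 *   bytes 24-27  additive checksum of bytes 0-23
 */
static uint32_t poops_sum(const uint8_t *p, size_t n)
{
    uint32_t s = 0;

    while (n--)
        s += *p++;
    return s;
}

/* Read an n-byte little-endian integer. */
static uint64_t get_le(const uint8_t *p, int n)
{
    uint64_t v = 0;

    while (n--)
        v = (v << 8) | p[n];
    return v;
}

/* What a "mkpoops" tool would write when dedicating the device. */
void poops_format(uint8_t sector0[512], uint64_t dev_sectors)
{
    uint32_t sum;
    int i;

    memset(sector0, 0, 512);
    memcpy(sector0, POOPS_MAGIC, POOPS_MAGIC_LEN);
    for (i = 0; i < 8; i++)
        sector0[16 + i] = (uint8_t)(dev_sectors >> (8 * i));
    sum = poops_sum(sector0, 24);
    for (i = 0; i < 4; i++)
        sector0[24 + i] = (uint8_t)(sum >> (8 * i));
}

/* Crash-time re-check: nonzero only if this still looks like our device. */
int poops_header_valid(const uint8_t sector0[512], uint64_t dev_sectors)
{
    if (memcmp(sector0, POOPS_MAGIC, POOPS_MAGIC_LEN) != 0)
        return 0;
    if (get_le(sector0 + 16, 8) != dev_sectors)
        return 0;
    return get_le(sector0 + 24, 4) == poops_sum(sector0, 24);
}
```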

* Re: [PATCH 0/2][concept RFC] x86: BIOS-save kernel log to disk upon panic
  2011-01-25 22:58               ` H. Peter Anvin
  2011-01-26  0:26                 ` Jeff Garzik
@ 2011-01-31  2:59                 ` Rusty Russell
  2011-01-31 10:45                   ` Ingo Molnar
  1 sibling, 1 reply; 35+ messages in thread
From: Rusty Russell @ 2011-01-31  2:59 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jeff Garzik, Randy Dunlap, Mark Lord, Ahmed S. Darwish,
	James Bottomley, Tejun Heo, Ingo Molnar, Thomas Gleixner,
	Ingo Molnar, X86-ML, Tony Luck, Dave Jones, Andrew Morton,
	Willy Tarreau, Willy Tarreau, Dirk Hohndel, Dirk.Hohndel, IDE-ML,
	LKML, Linus Torvalds, Peter Zijlstra,
	Frédéric Weisbecker, Borislav Petkov, Arjan van de Ven

On Wed, 26 Jan 2011 09:28:36 am H. Peter Anvin wrote:
> On 01/25/2011 02:45 PM, Jeff Garzik wrote:
> > On 01/25/2011 05:16 PM, Randy Dunlap wrote:
> >> On Tue, 25 Jan 2011 17:10:49 -0500 Mark Lord wrote:
> >>
> >>> On 11-01-25 12:05 PM, Ahmed S. Darwish wrote:
> >>>>
> >>>> My current x86 laptop includes the very common ATA PIIX controller.
> >>>
> >>>
> >>> ata_piix is just about ideal for this sort of thing.
> >>>
> >>> Except, don't use the BIOS to write the logs,
> >>> but rather code/use a very simple set of polling-PIO
> >>> functions to talk directly through the PIIX to the drive.
> >>>
> >>> Really really simple code to do that, and it would likely
> >>> work with anything ata-piix, and most other non-AHCI chipsets too.
> >>>
> >>> Not perfect, but probably good enough for a lot of scenarios.
> >>> The old hd.c driver shows how to read/write a sector at a time,
> >>> and that kind of code is easily converted to simply poll for completion.
> >>
> >> I don't know how/where to find it, but Rusty Russell had a version of this
> >> many, many years ago.
> > 
> > You beat me to the reply :)
> > 
> > http://lwn.net/Articles/9905/
> > 
> > but IIRC there were updates and improvements.
> > 
> 
> "Not perfect" is really not good enough.  You're writing to the disk, so
> it *has* to be perfect.  That means dealing with having shared IDE/AHCI
> device and doing the right thing, and the possibility that the kernel
> has interrupts enabled, yadda yadda.

Not for the limited problem of trying to get some logs out.

My driver read the sectors, checked it was full of the magic signature, then
overwrote them.  It's possible for that to screw up, but unlikely.  The boot
script mailed it off to a central site, reset the sector and rearmed the
oopsdumper.

> In the end you end up with something that looks, again, like kexec/kdump.

That was the proposed solution back then, too.  If I'd realized how long
that would take to arrive, I would have kept pushing...

BTW, anyone know whether distros turn it on?    Ubuntu seems to enable it
in the config, but I've never seen it in action.  My laptop has frozen
a couple of times, but maybe it really froze...

Cheers,
Rusty.
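Rusty's safety check refuses to write unless the target sectors still hold the signature that the boot script left there. Once the region has been read back (for instance with a polled PIO read), it reduces to a byte comparison. A minimal sketch under that assumption. The pattern and the function names are hypothetical. As hpa points out later in the thread, a check like this guards against writing to the wrong place, but not against corruption caused by the handover between drivers:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE 512

/* Fill one sector with the repeating signature ("arm" it). */
void arm_sector(unsigned char *sec, const char *magic)
{
    size_t mlen = strlen(magic), i;

    assert(mlen > 0);
    for (i = 0; i < SECTOR_SIZE; i++)
        sec[i] = (unsigned char)magic[i % mlen];
}

/* Nonzero only if every byte still matches the armed pattern. */
int sector_is_armed(const unsigned char *sec, const char *magic)
{
    size_t mlen = strlen(magic), i;

    assert(mlen > 0);
    for (i = 0; i < SECTOR_SIZE; i++)
        if (sec[i] != (unsigned char)magic[i % mlen])
            return 0;
    return 1;
}

/*
 * Gate for the dump: the nsec sectors read back from the reserved area
 * must all still be armed, otherwise nothing is written.
 */
int dump_area_writable(const unsigned char *area, size_t nsec,
                       const char *magic)
{
    size_t s;

    for (s = 0; s < nsec; s++)
        if (!sector_is_armed(area + s * SECTOR_SIZE, magic))
            return 0;
    return 1;
}
```

After the reboot, the boot script would collect the dump and re-run something like `arm_sector` over the area to rearm it, matching the cycle Rusty describes.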

* Re: [PATCH 0/2][concept RFC] x86: BIOS-save kernel log to disk upon panic
  2011-01-31  2:59                 ` Rusty Russell
@ 2011-01-31 10:45                   ` Ingo Molnar
  0 siblings, 0 replies; 35+ messages in thread
From: Ingo Molnar @ 2011-01-31 10:45 UTC (permalink / raw)
  To: Rusty Russell
  Cc: H. Peter Anvin, Jeff Garzik, Randy Dunlap, Mark Lord,
	Ahmed S. Darwish, James Bottomley, Tejun Heo, Thomas Gleixner,
	Ingo Molnar, X86-ML, Tony Luck, Dave Jones, Andrew Morton,
	Willy Tarreau, Willy Tarreau, Dirk Hohndel, Dirk.Hohndel, IDE-ML,
	LKML, Linus Torvalds, Peter Zijlstra,
	Frédéric Weisbecker, Borislav Petkov, Arjan van de Ven


* Rusty Russell <rusty@rustcorp.com.au> wrote:

> > In the end you end up with something that looks, again, like kexec/kdump.
> 
> That was the proposed solution back then, too.  If I'd realized how long that 
> would take to arrive, I would have kept pushing...

That's what happens when a feature crosses 2 or 3 project boundaries: from the 
reasonable 2-3 months upstream arrival time it lengthens to a few years, up to a 
decade.

The solution is IMO obvious: move the kexec tools into tools/ and integrate it much 
more tightly and make it all more usable to the average kernel tester.

Thanks,

	Ingo

* Re: [PATCH 0/2][concept RFC] x86: BIOS-save kernel log to disk upon panic
       [not found]           ` <m1sjweyeax.fsf@fess.ebiederm.org>
@ 2011-02-02 11:13             ` Ahmed S. Darwish
  0 siblings, 0 replies; 35+ messages in thread
From: Ahmed S. Darwish @ 2011-02-02 11:13 UTC (permalink / raw)
  To: Ingo Molnar, Eric W. Biederman
  Cc: H. Peter Anvin, Vivek Goyal, Linus Torvalds, Thomas Gleixner,
	Ingo Molnar, X86-ML, Tony Luck, Dave Jones, Andrew Morton,
	Randy Dunlap, Willy Tarreau, Willy Tarreau, Dirk Hohndel,
	Dirk.Hohndel, IDE-ML, LKML, Peter Zijlstra,
	Frédéric Weisbecker, Borislav Petkov, Arjan van de Ven,
	Tejun Heo, James Bottomley, Mark Lord, Jeff Garzik, Haren Myneni,
	KEXEC-ML, FBDEV-ML

Hi,

Quick note: the Internet has just come back here after a full
5-day shutdown by the “authorities”. I will hopefully return
home on Saturday to continue working on this.

thanks,

* Re: [PATCH 0/2][concept RFC] x86: BIOS-save kernel log to disk upon panic
  2011-01-25 15:08   ` Tejun Heo
  2011-01-25 17:33     ` H. Peter Anvin
@ 2011-02-03 14:36     ` Pavel Machek
  2011-02-03 15:28       ` H. Peter Anvin
  1 sibling, 1 reply; 35+ messages in thread
From: Pavel Machek @ 2011-02-03 14:36 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Ingo Molnar, Ahmed S. Darwish, H. Peter Anvin, Thomas Gleixner,
	Ingo Molnar, X86-ML, Tony Luck, Dave Jones, Andrew Morton,
	Randy Dunlap, Willy Tarreau, Willy Tarreau, Dirk Hohndel,
	Dirk.Hohndel, IDE-ML, LKML, Linus Torvalds, Peter Zijlstra,
	Frédéric Weisbecker, Borislav Petkov, Arjan van de Ven

Hi!

> > I think the biggest areas of practical concern would be:
> > 
> >  - Can this mechanism ever, under any circumstance corrupt any real
> >    data, destroy the MBR or do other nasties. Can you think of any
> >    additional fail-safe measures where you could _further robustify
> >    the BIOS calls_ to make sure it can never go to the wrong
> >    sector(s)? I really do not want to think of trusting a BIOS to
> >    _write to my disk_.
> 
> It's quite unlikely but I wouldn't say it's completely impossible.
> It's common for ATA controllers to have dual modes of operation - the
> old IDE compatible interface by emulation which is used by BIOS and
> older operating systems and newer interface (ahci) to be used by
> modern OS.  Some need to be explicitly switched and some just need to
> be accessed carefully.  If the controller is accessed by BIOS after
> switched to ahci or commands are in progress via ahci, anything can
> happen.

Could we read the log area, first, verify it contains signature, write
it back?
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: [PATCH 0/2][concept RFC] x86: BIOS-save kernel log to disk upon panic
  2011-02-03 14:36     ` Pavel Machek
@ 2011-02-03 15:28       ` H. Peter Anvin
  2011-02-03 17:57         ` Ingo Molnar
  0 siblings, 1 reply; 35+ messages in thread
From: H. Peter Anvin @ 2011-02-03 15:28 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Tejun Heo, Ingo Molnar, Ahmed S. Darwish, Thomas Gleixner,
	Ingo Molnar, X86-ML, Tony Luck, Dave Jones, Andrew Morton,
	Randy Dunlap, Willy Tarreau, Willy Tarreau, Dirk Hohndel,
	Dirk.Hohndel, IDE-ML, LKML, Linus Torvalds, Peter Zijlstra,
	Frédéric Weisbecker, Borislav Petkov, Arjan van de Ven

On 02/03/2011 06:36 AM, Pavel Machek wrote:
> 
> Could we read the log area, first, verify it contains signature, write
> it back?
> 								Pavel

Yes, but that doesn't guarantee no data corruption caused by handing
over from one driver to another.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


* Re: [PATCH 0/2][concept RFC] x86: BIOS-save kernel log to disk upon panic
  2011-02-03 15:28       ` H. Peter Anvin
@ 2011-02-03 17:57         ` Ingo Molnar
  2011-02-03 21:07           ` H. Peter Anvin
  0 siblings, 1 reply; 35+ messages in thread
From: Ingo Molnar @ 2011-02-03 17:57 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Pavel Machek, Tejun Heo, Ahmed S. Darwish, Thomas Gleixner,
	Ingo Molnar, X86-ML, Tony Luck, Dave Jones, Andrew Morton,
	Randy Dunlap, Willy Tarreau, Willy Tarreau, Dirk Hohndel,
	Dirk.Hohndel, IDE-ML, LKML, Linus Torvalds, Peter Zijlstra,
	Frédéric Weisbecker, Borislav Petkov, Arjan van de Ven


* H. Peter Anvin <hpa@zytor.com> wrote:

> On 02/03/2011 06:36 AM, Pavel Machek wrote:
> > 
> > Could we read the log area, first, verify it contains signature, write
> > it back?
> > 								Pavel
> 
> Yes, but that doesn't guarantee no data corruption caused by handing
> over from one driver to another.

Waiting a few seconds? Is there any sufficiently high number of X where waiting X 
seconds would make it safe to touch the hardware? (i.e. it would guarantee that 
pending commands are flushed, etc.)

Thanks,

	Ingo

* Re: [PATCH 0/2][concept RFC] x86: BIOS-save kernel log to disk upon panic
  2011-02-03 17:57         ` Ingo Molnar
@ 2011-02-03 21:07           ` H. Peter Anvin
  0 siblings, 0 replies; 35+ messages in thread
From: H. Peter Anvin @ 2011-02-03 21:07 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pavel Machek, Tejun Heo, Ahmed S. Darwish, Thomas Gleixner,
	Ingo Molnar, X86-ML, Tony Luck, Dave Jones, Andrew Morton,
	Randy Dunlap, Willy Tarreau, Willy Tarreau, Dirk Hohndel,
	Dirk.Hohndel, IDE-ML, LKML, Linus Torvalds, Peter Zijlstra,
	Frédéric Weisbecker, Borislav Petkov, Arjan van de Ven

On 02/03/2011 09:57 AM, Ingo Molnar wrote:
> 
> * H. Peter Anvin <hpa@zytor.com> wrote:
> 
>> On 02/03/2011 06:36 AM, Pavel Machek wrote:
>>>
>>> Could we read the log area, first, verify it contains signature, write
>>> it back?
>>> 								Pavel
>>
>> Yes, but that doesn't guarantee no data corruption caused by handing
>> over from one driver to another.
> 
> Waiting a few seconds? Is there any sufficiently high number of X where waiting X 
> seconds would make it safe to touch the hardware? (i.e. it would guarantee that 
> pending commands are flushed, etc.)
> 

Probably not... if you have enough control over the hardware to force a
device reset you should be okay, though.  This kind of comes down to
wanting a complete set of system drivers, i.e. kexec/kdump again.

	-hpa
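For the legacy interface, hpa's "force a device reset" alternative is the ATA software reset through the Device Control register. It at least puts a bound on the wait Ingo asks about. A minimal sketch, with these assumptions:

- the channel is in legacy mode, with its control block at 0x3F6;
- the SRST and nIEN bit positions and the 5 µs / 2 ms timings are as commonly quoted from the ATA/ATAPI specifications;
- I/O and delays go through hypothetical hooks, with a small simulated drive for host testing.

This resets only the drive's protocol state. It does nothing about an AHCI-mode controller or a concurrently running in-kernel driver:

```c
#include <stdint.h>

#define ATA_CTL_NIEN 0x02   /* Device Control: disable INTRQ */
#define ATA_CTL_SRST 0x04   /* Device Control: software reset */
#define ATA_SR_BSY   0x80   /* (Alternate) Status: busy */

/* Hypothetical hooks: port I/O and a microsecond busy-wait. */
struct ata_ctl_io {
    uint8_t (*inb)(void *ctx, uint16_t port);
    void (*outb)(void *ctx, uint16_t port, uint8_t val);
    void (*udelay)(void *ctx, unsigned long usecs);
    void *ctx;
    uint16_t ctl;            /* e.g. 0x3F6: Device Control / Alt Status */
};

/*
 * Software-reset the channel with interrupts masked, then poll the
 * Alternate Status register until BSY clears or timeout_ms elapses.
 * Returns 0 on success, -1 on timeout.
 */
int ata_soft_reset(const struct ata_ctl_io *io, unsigned long timeout_ms)
{
    unsigned long ms;

    io->outb(io->ctx, io->ctl, ATA_CTL_NIEN | ATA_CTL_SRST);
    io->udelay(io->ctx, 5);          /* hold SRST for at least 5 us */
    io->outb(io->ctx, io->ctl, ATA_CTL_NIEN);
    io->udelay(io->ctx, 2000);       /* at least 2 ms before polling */

    for (ms = 0; ms < timeout_ms; ms++) {
        if (!(io->inb(io->ctx, io->ctl) & ATA_SR_BSY))
            return 0;
        io->udelay(io->ctx, 1000);
    }
    return -1;
}

/* Host-side stand-in: a drive that stays busy for busy_us after reset. */
struct sim_ctl {
    unsigned long now_us, busy_until_us, busy_us;
    uint8_t last_ctl;
};

static uint8_t sim_ctl_inb(void *ctx, uint16_t port)
{
    struct sim_ctl *s = ctx;

    (void)port;
    return s->now_us < s->busy_until_us ? ATA_SR_BSY : 0;
}

static void sim_ctl_outb(void *ctx, uint16_t port, uint8_t val)
{
    struct sim_ctl *s = ctx;

    (void)port;
    if ((s->last_ctl & ATA_CTL_SRST) && !(val & ATA_CTL_SRST))
        s->busy_until_us = s->now_us + s->busy_us;   /* SRST released */
    s->last_ctl = val;
}

static void sim_ctl_udelay(void *ctx, unsigned long usecs)
{
    ((struct sim_ctl *)ctx)->now_us += usecs;
}
```

With a 31-second limit, a drive that never comes out of BSY makes the function give up instead of hanging. That is the behaviour Ahmed's laptop BIOS lacked.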

end of thread

Thread overview: 35+ messages
2011-01-25 13:47 [PATCH 0/2][concept RFC] x86: BIOS-save kernel log to disk upon panic Ahmed S. Darwish
2011-01-25 13:51 ` [PATCH -next 1/2][RFC] x86: Saveoops: Switch to real-mode and call BIOS Ahmed S. Darwish
2011-01-25 17:26   ` H. Peter Anvin
2011-01-25 13:53 ` [PATCH -next 2/2][RFC] x86: Saveoops: Reserve low memory and register code Ahmed S. Darwish
2011-01-25 17:29   ` H. Peter Anvin
2011-01-26  9:04     ` Ahmed S. Darwish
2011-01-25 14:09 ` [PATCH 0/2][concept RFC] x86: BIOS-save kernel log to disk upon panic Ingo Molnar
2011-01-25 15:08   ` Tejun Heo
2011-01-25 17:33     ` H. Peter Anvin
2011-01-26 11:44       ` Ahmed S. Darwish
2011-02-03 14:36     ` Pavel Machek
2011-02-03 15:28       ` H. Peter Anvin
2011-02-03 17:57         ` Ingo Molnar
2011-02-03 21:07           ` H. Peter Anvin
2011-01-25 15:36   ` Ahmed S. Darwish
2011-01-25 16:02     ` James Bottomley
2011-01-25 17:05       ` Ahmed S. Darwish
2011-01-25 17:20         ` James Bottomley
2011-01-25 22:10         ` Mark Lord
2011-01-25 22:16           ` Randy Dunlap
2011-01-25 22:45             ` Jeff Garzik
2011-01-25 22:58               ` H. Peter Anvin
2011-01-26  0:26                 ` Jeff Garzik
2011-01-31  2:59                 ` Rusty Russell
2011-01-31 10:45                   ` Ingo Molnar
2011-01-25 17:32     ` Tony Luck
2011-01-25 17:36       ` H. Peter Anvin
2011-01-25 19:04       ` Jeff Garzik
2011-01-25 14:49 ` Tejun Heo
2011-01-28  7:59   ` Jan Ceuleers
2011-01-25 20:25 ` Linus Torvalds
     [not found]   ` <20110126124954.GC24527@laptop>
2011-01-26 23:07     ` Luck, Tony
     [not found]       ` <20110126231620.GA14807@redhat.com>
     [not found]         ` <987664A83D2D224EAE907B061CE93D53019438EB02@orsmsx505.amr.corp.intel.com>
     [not found]           ` <20110126233033.GB14807@redhat.com>
     [not found]             ` <987664A83D2D224EAE907B061CE93D53019438EBB6@orsmsx505.amr.corp.intel.com>
     [not found]               ` <4D40F7F1.3020509@zytor.com>
     [not found]                 ` <20110127120039.GD20279@elte.hu>
2011-01-27 18:35                   ` Luck, Tony
     [not found]                   ` <4D4197CB.9070201@zytor.com>
     [not found]                     ` <20110127162429.GB26437@elte.hu>
2011-01-27 18:56                       ` Luck, Tony
     [not found]     ` <20110127021338.GA20334@redhat.com>
     [not found]       ` <4D40F81E.1030009@zytor.com>
     [not found]         ` <20110127052639.GA16289@laptop>
     [not found]           ` <m1sjweyeax.fsf@fess.ebiederm.org>
2011-02-02 11:13             ` Ahmed S. Darwish