From: "Michael Kerrisk (man-pages)" <>
To: Vivek Goyal <>
Cc: lkml <>, Andy Lutomirski <>,
	Dave Young <>, "H. Peter Anvin" <>,
	Borislav Petkov <>,
	"Eric W. Biederman" <>
Subject: Re: Edited kexec_load(2) [kexec_file_load()] man page for review
Date: Fri, 16 Jan 2015 14:30:25 +0100
Message-ID: <>
In-Reply-To: <>

Hello Vivek,

Thanks for your comments! I've added some further text to
the page based on those comments. See some follow-up 
questions below.

On 01/12/2015 11:16 PM, Vivek Goyal wrote:
> On Wed, Jan 07, 2015 at 10:17:56PM +0100, Michael Kerrisk (man-pages) wrote:
> [..]
>>>> .BR KEXEC_ON_CRASH " (since Linux 2.6.13)"
>>>> Execute the new kernel automatically on a system crash.
>>>> .\" FIXME Explain in more detail how KEXEC_ON_CRASH is actually used
>> I wasn't expecting that you would respond to the FIXMEs that were 
>> not labeled "kexec_file_load", but I was hoping you might ;-). Thanks!
>> I have a few additional questions to your nice notes.
>>> Upon boot first kernel reserves a chunk of contiguous memory (if
>>> crashkernel=<> command line parameter is passed). This memory is
>>> used to load the crash kernel (Kernel which will be booted into
>>> if first kernel crashes).
> Hi Michael,
>> Can I just confirm: is it in all cases only possible to use kexec_load() 
>> and kexec_file_load() if the kernel was booted with the 'crashkernel'
>> parameter set?
> As of now, only kexec_load() and kexec_file_load() system calls can
> make use of memory reserved by crashkernel=<> kernel parameter. And
> this is used only if we are trying to load a crash kernel (KEXEC_ON_CRASH
> flag specified).


>>> Location of this reserved memory is exported to user space through
>>> /proc/iomem file. 
>> Is that export via an entry labeled "Crash kernel" in the 
>> /proc/iomem file?
> Yes.

Okay -- thanks.
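
For illustration, here is a minimal user-space sketch of locating that
entry. It assumes the usual "start-end : label" line layout of
/proc/iomem, with hexadecimal, inclusive bounds. The function names
parse_crash_kernel_line() and find_crash_kernel() are hypothetical
helpers, not part of any existing tool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one line in /proc/iomem layout ("start-end : label").
 * Returns 1 and stores the inclusive bounds if the label is
 * "Crash kernel"; returns 0 for any other line.
 */
int parse_crash_kernel_line(const char *line,
                            unsigned long long *start,
                            unsigned long long *end)
{
    unsigned long long s, e;
    char label[64];

    assert(line != NULL && start != NULL && end != NULL);

    if (sscanf(line, " %llx-%llx : %63[^\n]", &s, &e, label) != 3)
        return 0;
    if (strcmp(label, "Crash kernel") != 0)
        return 0;

    *start = s;
    *end = e;
    return 1;
}

/*
 * Scan a file in /proc/iomem format for the "Crash kernel" entry.
 * Returns 1 if found, 0 if absent, -1 if the file cannot be opened.
 */
int find_crash_kernel(const char *path,
                      unsigned long long *start,
                      unsigned long long *end)
{
    char line[256];
    int found = 0;
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;
    while (fgets(line, sizeof(line), fp) != NULL) {
        if (parse_crash_kernel_line(line, start, end)) {
            found = 1;
            break;
        }
    }
    fclose(fp);
    return found;
}
```

A caller would pass "/proc/iomem" as the path; a return value of 0
would suggest that no crash kernel memory is currently reserved.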

>>> User space can parse it and prepare list of segments
>>> specifying this reserved memory as destination.
>> I'm not quite clear on "specifying this reserved memory as destination".
>> Is that done by specifying the address in the kexec_segment.mem fields?
> You are absolutely right. User space can specify in kexec_segment.mem
> field the memory location where it is expecting a particular segment to
> be loaded by kernel.
>>> Once kernel sees the flag KEXEC_ON_CRASH, it makes sure that all the
>>> segments are destined for reserved memory otherwise kernel load operation
>>> fails.
>> Could you point me to where this checking is done? Also, what is the
>> error (errno) that occurs when the load operation fails? (I think the
>> answers to these questions are "at the start of kimage_alloc_init()"
>> and "EADDRNOTAVAIL", but I'd like to confirm.)
> This checking happens in sanity_check_segment_list() which is called
> by kimage_alloc_init().
> And yes, error code returned is -EADDRNOTAVAIL.

Thanks. I added EADDRNOTAVAIL to the ERRORS.
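
To make the rule concrete, here is a user-space model of the range
check described above. It is a sketch only, not the kernel's
sanity_check_segment_list() code. The struct seg_range type and the
function check_segments_in_region() are hypothetical names, and the
handling of zero-length and wrapping segments is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the mem/memsz pair of a kexec segment. */
struct seg_range {
    uint64_t mem;    /* destination physical address */
    uint64_t memsz;  /* destination length in bytes */
};

/*
 * Return 0 if every segment lies inside the reserved region
 * [res_start, res_end] (inclusive bounds, as in /proc/iomem),
 * or -EADDRNOTAVAIL otherwise.
 */
int check_segments_in_region(const struct seg_range *segs, size_t nr,
                             uint64_t res_start, uint64_t res_end)
{
    assert(res_start <= res_end);

    for (size_t i = 0; i < nr; i++) {
        uint64_t mstart = segs[i].mem;
        /* Assumption: a zero-length segment occupies only its start. */
        uint64_t len = segs[i].memsz ? segs[i].memsz - 1 : 0;
        uint64_t mend = mstart + len;

        if (mend < mstart)              /* address arithmetic wrapped */
            return -EADDRNOTAVAIL;
        if (mstart < res_start || mend > res_end)
            return -EADDRNOTAVAIL;
    }
    return 0;
}
```

A loader could run such a check on its segment list before calling
kexec_load() with KEXEC_ON_CRASH, to report a clearer diagnostic than
the bare errno value.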

>>> [..]
>>>> struct kexec_segment {
>>>>     void   *buf;        /* Buffer in user space */
>>>>     size_t  bufsz;      /* Buffer length in user space */
>>>>     void   *mem;        /* Physical address of kernel */
>>>>     size_t  memsz;      /* Physical address length */
>>>> };
>>>> .fi
>>>> .in
>>>> .PP
>>>> .\" FIXME Explain the details of how the kernel image defined by segments
>>>> .\" is copied from the calling process into previously reserved memory.
>>> Kernel image defined by segments is copied into kernel either in regular
>>> memory 
>> Could you clarify what you mean by "regular memory"?
> I meant memory which is not reserved memory.


>>> or in reserved memory (if KEXEC_ON_CRASH is set). Kernel first
>>> copies list of segments in kernel memory and then does various
>>> sanity checks on the segments. If everything looks fine, kernel copies
>>> segment data to kernel memory.
>>> In case of normal kexec, segment data is loaded in any available memory
>>> and segment data is moved to final destination at the kexec reboot time.
>> By "moved to final destination", do you mean "moved from user space to the
>> final kernel-space destination"?
> No. Segment data moves from user space to kernel space once kexec_load()
> call finishes successfully. But when user does reboot (kexec -e), at that
> time kernel moves that segment data to its final location. Kernel could
> not place the segment at its final location during kexec_load() time as
> that memory is already in use by running kernel. But once we are about
> to reboot to new kernel, we can overwrite the old kernel's memory.

Got it.

>>> In case of kexec on panic (KEXEC_ON_CRASH flag set), segment data is
>>> directly loaded to reserved memory and after crash kexec simply jumps
>> By "directly", I assume you mean "at the time of the kexec_load() call",
>> right?
> Yes.


So, returning to the kexec_segment structure:

           struct kexec_segment {
               void   *buf;        /* Buffer in user space */
               size_t  bufsz;      /* Buffer length in user space */
               void   *mem;        /* Physical address of kernel */
               size_t  memsz;      /* Physical address length */
           };

Are the following statements correct:
* buf + bufsz identify a memory region in the caller's virtual 
  address space that is the source of the copy
* mem + memsz specify the target memory region of the copy
* mem is a physical memory address, as seen from kernel space
* the number of bytes copied from userspace is min(bufsz, memsz)
* if bufsz > memsz, then excess bytes in the user-space buffer 
  are ignored.
* if memsz > bufsz, then excess bytes in the target kernel buffer
  are filled with zeros.
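
Expressed as code, the statements above would amount to something like
the following user-space model. This only restates the questions being
asked; it does not confirm the kernel's behavior, and
model_segment_copy() is a hypothetical name that operates on ordinary
buffers rather than physical memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Model of the copy semantics as posed: copy min(bufsz, memsz)
 * bytes from src to dst, ignore any excess source bytes, and
 * zero-fill the rest of the memsz-byte destination.
 * Returns the number of bytes copied from src.
 */
size_t model_segment_copy(void *dst, size_t memsz,
                          const void *src, size_t bufsz)
{
    size_t ncopy = bufsz < memsz ? bufsz : memsz;

    assert(dst != NULL || memsz == 0);
    assert(src != NULL || ncopy == 0);

    if (ncopy > 0)
        memcpy(dst, src, ncopy);
    if (memsz > ncopy)
        memset((unsigned char *)dst + ncopy, 0, memsz - ncopy);

    return ncopy;
}
```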

Also, it seems to me that 'mem' need not be page aligned.
Is that correct? Should the man page say something about that?
(E.g., is it generally desirable that 'mem' should be page aligned?)

Likewise, 'memsz' doesn't need to be a page multiple, IIUC.
Should the man page say anything about this? For example, should 
it note that the initialized kernel segment will be of size:

     (mem % PAGE_SIZE + memsz) rounded up to the next multiple of PAGE_SIZE

And should it note that if 'mem' is not a multiple of the page size, then
the initial bytes (mem % PAGE_SIZE) in the first page of the kernel segment 
will be zeros?
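
As a worked form of that arithmetic, and assuming my reading is
correct, the two quantities could be computed as below. The function
names leading_pad() and segment_footprint() are hypothetical, and the
page size is passed in explicitly rather than taken from the system.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes between the start of the first page and 'mem'. */
uint64_t leading_pad(uint64_t mem, uint64_t page_size)
{
    assert(page_size > 0);
    return mem % page_size;
}

/*
 * (mem % page_size + memsz), rounded up to a multiple of page_size.
 * A value that is already a multiple is returned unchanged.
 */
uint64_t segment_footprint(uint64_t mem, uint64_t memsz,
                           uint64_t page_size)
{
    assert(page_size > 0);

    uint64_t span = mem % page_size + memsz;

    return (span + page_size - 1) / page_size * page_size;
}
```

For example, with a 4096-byte page, mem = 0x1100 and memsz = 0x2000
give a leading pad of 0x100 bytes and a footprint of 0x3000 bytes
(three pages).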

(Hopefully I have read kimage_load_normal_segment() correctly.)

And one further question. Other than the fact that they are used with 
different system calls, what is the difference between KEXEC_ON_CRASH 



Michael Kerrisk
Linux man-pages maintainer;
Linux/UNIX System Programming Training:
