LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [BUG] Kprobes fails on 2.6.25-rc3-mm1 (x86) systems, if CONFIG_DEBUG_RODATA is set.
@ 2008-03-06 12:41 Srinivasa DS
  2008-03-06 12:55 ` Mathieu Desnoyers
  0 siblings, 1 reply; 11+ messages in thread
From: Srinivasa DS @ 2008-03-06 12:41 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel, Mathieu Desnoyers
  Cc: ananth, Jim Keniston, srikar, SystemTAP

Insertion of kprobe module causes oops on 2.6.25-rc3-mm1 (x86) systems if we 
enable CONFIG_DEBUG_RODATA.  While registering the probe, memcpy() fails to 
copy breakpoint instruction to the instruction address and generates the 
oops.

BUG: unable to handle kernel paging request at ffffffff8047d1a7
IP: [<ffffffff8047ff62>] text_poke+0xa/0x10
PGD 203067 PUD 207063 PMD 7e191163 PTE 47d161
Oops: 0003 [1] PREEMPT SMP
................................................

This is because, Mathieu's patch (http://lkml.org/lkml/2008/2/2/226) makes 
entire text segment as READONLY, if we enable CONFIG_DEBUG_RODATA. 

So reverting this patch or new patch, which considers kprobes while deciding  
boundary for setting readonly pages might solve the probelm.

Thanks
 Srinivasa DS
 LTC-IBM
 




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [BUG] Kprobes fails on 2.6.25-rc3-mm1 (x86) systems, if CONFIG_DEBUG_RODATA is set.
  2008-03-06 12:41 [BUG] Kprobes fails on 2.6.25-rc3-mm1 (x86) systems, if CONFIG_DEBUG_RODATA is set Srinivasa DS
@ 2008-03-06 12:55 ` Mathieu Desnoyers
  2008-03-06 13:33   ` Srinivasa DS
  0 siblings, 1 reply; 11+ messages in thread
From: Mathieu Desnoyers @ 2008-03-06 12:55 UTC (permalink / raw)
  To: Srinivasa DS
  Cc: Andrew Morton, linux-kernel, ananth, Jim Keniston, srikar, SystemTAP

* Srinivasa DS (srinivasa@in.ibm.com) wrote:
> Insertion of kprobe module causes oops on 2.6.25-rc3-mm1 (x86) systems if we 
> enable CONFIG_DEBUG_RODATA.  While registering the probe, memcpy() fails to 
> copy breakpoint instruction to the instruction address and generates the 
> oops.
> 
> BUG: unable to handle kernel paging request at ffffffff8047d1a7
> IP: [<ffffffff8047ff62>] text_poke+0xa/0x10
> PGD 203067 PUD 207063 PMD 7e191163 PTE 47d161
> Oops: 0003 [1] PREEMPT SMP
> ................................................
> 
> This is because, Mathieu's patch (http://lkml.org/lkml/2008/2/2/226) makes 
> entire text segment as READONLY, if we enable CONFIG_DEBUG_RODATA. 
> 
> So reverting this patch or new patch, which considers kprobes while deciding  
> boundary for setting readonly pages might solve the probelm.
> 

Hi,

Are you running under Xen, lguest, kvm.. ?

Mathieu

> Thanks
>  Srinivasa DS
>  LTC-IBM
>  
> 
> 
> 

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [BUG] Kprobes fails on 2.6.25-rc3-mm1 (x86) systems, if CONFIG_DEBUG_RODATA is set.
  2008-03-06 12:55 ` Mathieu Desnoyers
@ 2008-03-06 13:33   ` Srinivasa DS
  2008-03-06 13:48     ` [PATCH] x86 - Enhance DEBUG_RODATA support - alternatives Mathieu Desnoyers
  0 siblings, 1 reply; 11+ messages in thread
From: Srinivasa DS @ 2008-03-06 13:33 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Andrew Morton, linux-kernel, ananth, Jim Keniston, srikar, SystemTAP

On Thursday 06 March 2008 06:25:51 pm Mathieu Desnoyers wrote:
> * Srinivasa DS (srinivasa@in.ibm.com) wrote:
> > Insertion of kprobe module causes oops on 2.6.25-rc3-mm1 (x86) systems if
> > we enable CONFIG_DEBUG_RODATA.  While registering the probe, memcpy()
> > fails to copy breakpoint instruction to the instruction address and
> > generates the oops.
> >
> > BUG: unable to handle kernel paging request at ffffffff8047d1a7
> > IP: [<ffffffff8047ff62>] text_poke+0xa/0x10
> > PGD 203067 PUD 207063 PMD 7e191163 PTE 47d161
> > Oops: 0003 [1] PREEMPT SMP
> > ................................................
> >
> > This is because, Mathieu's patch (http://lkml.org/lkml/2008/2/2/226)
> > makes entire text segment as READONLY, if we enable CONFIG_DEBUG_RODATA.
> >
> > So reverting this patch or new patch, which considers kprobes while
> > deciding boundary for setting readonly pages might solve the probelm.
>
> Hi,
>
> Are you running under Xen, lguest, kvm.. ?

No, on a bare x86_64 system.   That shouldn't matter for this bug, A quick 
look at the call path shows me this.
     start_kernel()->rest_init()->kernel_init()->init_post()->mark_rodata_ro()

Srinivasa DS
>
> Mathieu
>
> > Thanks
> >  Srinivasa DS
> >  LTC-IBM



^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH] x86 - Enhance DEBUG_RODATA support - alternatives
  2008-03-06 13:33   ` Srinivasa DS
@ 2008-03-06 13:48     ` Mathieu Desnoyers
  2008-03-06 14:01       ` Ingo Molnar
  0 siblings, 1 reply; 11+ messages in thread
From: Mathieu Desnoyers @ 2008-03-06 13:48 UTC (permalink / raw)
  To: Srinivasa DS
  Cc: Andrew Morton, linux-kernel, ananth, Jim Keniston, srikar,
	SystemTAP, Ingo Molnar, Andi Kleen, pageexec, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Jeremy Fitzhardinge

* Srinivasa DS (srinivasa@in.ibm.com) wrote:
> On Thursday 06 March 2008 06:25:51 pm Mathieu Desnoyers wrote:
> > * Srinivasa DS (srinivasa@in.ibm.com) wrote:
> > > Insertion of kprobe module causes oops on 2.6.25-rc3-mm1 (x86) systems if
> > > we enable CONFIG_DEBUG_RODATA.  While registering the probe, memcpy()
> > > fails to copy breakpoint instruction to the instruction address and
> > > generates the oops.
> > >
> > > BUG: unable to handle kernel paging request at ffffffff8047d1a7
> > > IP: [<ffffffff8047ff62>] text_poke+0xa/0x10
> > > PGD 203067 PUD 207063 PMD 7e191163 PTE 47d161
> > > Oops: 0003 [1] PREEMPT SMP
> > > ................................................
> > >
> > > This is because, Mathieu's patch (http://lkml.org/lkml/2008/2/2/226)
> > > makes entire text segment as READONLY, if we enable CONFIG_DEBUG_RODATA.
> > >
> > > So reverting this patch or new patch, which considers kprobes while
> > > deciding boundary for setting readonly pages might solve the probelm.
> >
> > Hi,
> >
> > Are you running under Xen, lguest, kvm.. ?
> 
> No, on a bare x86_64 system.   That shouldn't matter for this bug, A quick 
> look at the call path shows me this.
>      start_kernel()->rest_init()->kernel_init()->init_post()->mark_rodata_ro()
> 

Hrm, yeah, that's because http://lkml.org/lkml/2008/2/2/227

x86 - Enhance DEBUG_RODATA support - alternatives

has been pulled out of the x86 tree. It implements the correct text_poke
required to support this. It's been removed because Xen does not support
the WP bit in the guest kernels.

Can you try my new replacement implementation ? It creates another
mapping instead of modifying the WP bit.


x86 - Enhance DEBUG_RODATA support - alternatives

Fix a memcpy that should be a text_poke (in apply_alternatives).

Use kernel_wp_save/kernel_wp_restore in text_poke to support DEBUG_RODATA
correctly and so the CPU HOTPLUG special case can be removed.

Add text_poke_early, for alternatives and paravirt boot-time and module load
time patching.

Changelog:

- Fix text_set and text_poke alignment check (mixed up bitwise and and or)
- Remove text_set
- Export add_nops, so it can be used by others.
- Document text_poke_early.
- Remove clflush, since it breaks some VIA architectures and is not strictly
  necessary.
- Add kerneldoc to text_poke and text_poke_early.
- Create a second vmap instead of using the WP bit to support Xen and VMI.
- Move local_irq disable within text_poke and text_poke_early to be able to
  be sleepable in these functions.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Andi Kleen <andi@firstfloor.org>
CC: pageexec@freemail.hu
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: H. Peter Anvin <hpa@zytor.com>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
---
 arch/x86/kernel/alternative.c |   88 +++++++++++++++++++++++++++++++-----------
 include/asm-x86/alternative.h |   23 ++++++++++
 2 files changed, 87 insertions(+), 24 deletions(-)

Index: linux-2.6-lttng/arch/x86/kernel/alternative.c
===================================================================
--- linux-2.6-lttng.orig/arch/x86/kernel/alternative.c	2008-03-03 12:29:02.000000000 -0500
+++ linux-2.6-lttng/arch/x86/kernel/alternative.c	2008-03-03 23:59:33.000000000 -0500
@@ -11,6 +11,8 @@
 #include <asm/mce.h>
 #include <asm/nmi.h>
 #include <asm/vsyscall.h>
+#include <asm/cacheflush.h>
+#include <asm/io.h>
 
 #define MAX_PATCH_LEN (255-1)
 
@@ -173,7 +175,7 @@ static const unsigned char*const * find_
 #endif /* CONFIG_X86_64 */
 
 /* Use this to add nops to a buffer, then text_poke the whole buffer. */
-static void add_nops(void *insns, unsigned int len)
+void add_nops(void *insns, unsigned int len)
 {
 	const unsigned char *const *noptable = find_nop_table();
 
@@ -186,6 +188,7 @@ static void add_nops(void *insns, unsign
 		len -= noplen;
 	}
 }
+EXPORT_SYMBOL_GPL(add_nops);
 
 extern struct alt_instr __alt_instructions[], __alt_instructions_end[];
 extern u8 *__smp_locks[], *__smp_locks_end[];
@@ -219,7 +222,7 @@ void apply_alternatives(struct alt_instr
 		memcpy(insnbuf, a->replacement, a->replacementlen);
 		add_nops(insnbuf + a->replacementlen,
 			 a->instrlen - a->replacementlen);
-		text_poke(instr, insnbuf, a->instrlen);
+		text_poke_early(instr, insnbuf, a->instrlen);
 	}
 }
 
@@ -280,7 +283,6 @@ void alternatives_smp_module_add(struct 
 				 void *text,  void *text_end)
 {
 	struct smp_alt_module *smp;
-	unsigned long flags;
 
 	if (noreplace_smp)
 		return;
@@ -306,39 +308,37 @@ void alternatives_smp_module_add(struct 
 		__FUNCTION__, smp->locks, smp->locks_end,
 		smp->text, smp->text_end, smp->name);
 
-	spin_lock_irqsave(&smp_alt, flags);
+	spin_lock(&smp_alt);
 	list_add_tail(&smp->next, &smp_alt_modules);
 	if (boot_cpu_has(X86_FEATURE_UP))
 		alternatives_smp_unlock(smp->locks, smp->locks_end,
 					smp->text, smp->text_end);
-	spin_unlock_irqrestore(&smp_alt, flags);
+	spin_unlock(&smp_alt);
 }
 
 void alternatives_smp_module_del(struct module *mod)
 {
 	struct smp_alt_module *item;
-	unsigned long flags;
 
 	if (smp_alt_once || noreplace_smp)
 		return;
 
-	spin_lock_irqsave(&smp_alt, flags);
+	spin_lock(&smp_alt);
 	list_for_each_entry(item, &smp_alt_modules, next) {
 		if (mod != item->mod)
 			continue;
 		list_del(&item->next);
-		spin_unlock_irqrestore(&smp_alt, flags);
+		spin_unlock(&smp_alt);
 		DPRINTK("%s: %s\n", __FUNCTION__, item->name);
 		kfree(item);
 		return;
 	}
-	spin_unlock_irqrestore(&smp_alt, flags);
+	spin_unlock(&smp_alt);
 }
 
 void alternatives_smp_switch(int smp)
 {
 	struct smp_alt_module *mod;
-	unsigned long flags;
 
 #ifdef CONFIG_LOCKDEP
 	/*
@@ -355,7 +355,7 @@ void alternatives_smp_switch(int smp)
 		return;
 	BUG_ON(!smp && (num_online_cpus() > 1));
 
-	spin_lock_irqsave(&smp_alt, flags);
+	spin_lock(&smp_alt);
 
 	/*
 	 * Avoid unnecessary switches because it forces JIT based VMs to
@@ -379,7 +379,7 @@ void alternatives_smp_switch(int smp)
 						mod->text, mod->text_end);
 	}
 	smp_mode = smp;
-	spin_unlock_irqrestore(&smp_alt, flags);
+	spin_unlock(&smp_alt);
 }
 
 #endif
@@ -407,7 +407,7 @@ void apply_paravirt(struct paravirt_patc
 
 		/* Pad the rest with nops */
 		add_nops(insnbuf + used, p->len - used);
-		text_poke(p->instr, insnbuf, p->len);
+		text_poke_early(p->instr, insnbuf, p->len);
 	}
 }
 extern struct paravirt_patch_site __start_parainstructions[],
@@ -416,8 +416,6 @@ extern struct paravirt_patch_site __star
 
 void __init alternative_instructions(void)
 {
-	unsigned long flags;
-
 	/* The patching is not fully atomic, so try to avoid local interruptions
 	   that might execute the to be patched code.
 	   Other CPUs are not running. */
@@ -426,7 +424,6 @@ void __init alternative_instructions(voi
 	stop_mce();
 #endif
 
-	local_irq_save(flags);
 	apply_alternatives(__alt_instructions, __alt_instructions_end);
 
 	/* switch to patch-once-at-boottime-only mode and free the
@@ -458,7 +455,6 @@ void __init alternative_instructions(voi
 	}
 #endif
  	apply_paravirt(__parainstructions, __parainstructions_end);
-	local_irq_restore(flags);
 
 	if (smp_alt_once)
 		free_init_pages("SMP alternatives",
@@ -471,18 +467,64 @@ void __init alternative_instructions(voi
 #endif
 }
 
-/*
- * Warning:
+/**
+ * text_poke_early - Update instructions on a live kernel at boot time
+ * @addr: address to modify
+ * @opcode: source of the copy
+ * @len: length to copy
+ *
  * When you use this code to patch more than one byte of an instruction
  * you need to make sure that other CPUs cannot execute this code in parallel.
- * Also no thread must be currently preempted in the middle of these instructions.
- * And on the local CPU you need to be protected again NMI or MCE handlers
- * seeing an inconsistent instruction while you patch.
+ * Also no thread must be currently preempted in the middle of these
+ * instructions. And on the local CPU you need to be protected again NMI or MCE
+ * handlers seeing an inconsistent instruction while you patch.
  */
-void __kprobes text_poke(void *addr, unsigned char *opcode, int len)
+void *text_poke_early(void *addr, const void *opcode, size_t len)
 {
+	unsigned long flags;
+	local_irq_save(flags);
 	memcpy(addr, opcode, len);
+	local_irq_restore(flags);
+	sync_core();
+	/* Could also do a CLFLUSH here to speed up CPU recovery; but
+	   that causes hangs on some VIA CPUs. */
+	return addr;
+}
+
+/**
+ * text_poke - Update instructions on a live kernel
+ * @addr: address to modify
+ * @opcode: source of the copy
+ * @len: length to copy
+ *
+ * Only atomic text poke/set should be allowed when not doing early patching.
+ * It means the size must be writable atomically and the address must be aligned
+ * in a way that permits an atomic write. It also makes sure we fit on a single
+ * page.
+ */
+void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
+{
+	unsigned long flags;
+	char *vaddr;
+	int nr_pages = 2;
+
+	BUG_ON(len > sizeof(long));
+	BUG_ON((((long)addr + len - 1) & ~(sizeof(long) - 1))
+		- ((long)addr & ~(sizeof(long) - 1)));
+	{
+		struct page *pages[2] = { virt_to_page(addr),
+			virt_to_page(addr + PAGE_SIZE) };
+		if (!pages[1])
+			nr_pages = 1;
+		vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
+		WARN_ON(!vaddr);
+		local_irq_save(flags);
+		memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
+		local_irq_restore(flags);
+		vunmap(vaddr);
+	}
 	sync_core();
 	/* Could also do a CLFLUSH here to speed up CPU recovery; but
 	   that causes hangs on some VIA CPUs. */
+	return addr;
 }
Index: linux-2.6-lttng/include/asm-x86/alternative.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-x86/alternative.h	2008-03-03 12:29:02.000000000 -0500
+++ linux-2.6-lttng/include/asm-x86/alternative.h	2008-03-03 12:29:53.000000000 -0500
@@ -156,6 +156,27 @@ apply_paravirt(struct paravirt_patch_sit
 #define __parainstructions_end	NULL
 #endif
 
-extern void text_poke(void *addr, unsigned char *opcode, int len);
+extern void add_nops(void *insns, unsigned int len);
+
+/*
+ * Clear and restore the kernel write-protection flag on the local CPU.
+ * Allows the kernel to edit read-only pages.
+ * Side-effect: any interrupt handler running between save and restore will have
+ * the ability to write to read-only pages.
+ *
+ * Warning:
+ * Code patching in the UP case is safe if NMIs and MCE handlers are stopped and
+ * no thread can be preempted in the instructions being modified (no iret to an
+ * invalid instruction possible) or if the instructions are changed from a
+ * consistent state to another consistent state atomically.
+ * More care must be taken when modifying code in the SMP case because of
+ * Intel's errata.
+ * On the local CPU you need to be protected again NMI or MCE handlers seeing an
+ * inconsistent instruction while you patch.
+ * The _early version expects the memory to already be RW.
+ */
+
+extern void *text_poke(void *addr, const void *opcode, size_t len);
+extern void *text_poke_early(void *addr, const void *opcode, size_t len);
 
 #endif /* _ASM_X86_ALTERNATIVE_H */


-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86 - Enhance DEBUG_RODATA support - alternatives
  2008-03-06 14:01       ` Ingo Molnar
@ 2008-03-06 13:59         ` pageexec
  2008-03-06 15:52           ` Ingo Molnar
                             ` (2 more replies)
  2008-03-06 14:18         ` Mathieu Desnoyers
  1 sibling, 3 replies; 11+ messages in thread
From: pageexec @ 2008-03-06 13:59 UTC (permalink / raw)
  To: Mathieu Desnoyers, Ingo Molnar
  Cc: Srinivasa DS, Andrew Morton, linux-kernel, ananth, Jim Keniston,
	srikar, SystemTAP, Andi Kleen, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Jeremy Fitzhardinge, Arjan van de Ven

On 6 Mar 2008 at 15:01, Ingo Molnar wrote:

> +/**
> + * text_poke - Update instructions on a live kernel
> + * @addr: address to modify
> + * @opcode: source of the copy
> + * @len: length to copy
> + *
> + * Only atomic text poke/set should be allowed when not doing early patching.
> + * It means the size must be writable atomically and the address must be aligned
> + * in a way that permits an atomic write. It also makes sure we fit on a single
> + * page.
> + */
> +void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
> +{
> +	unsigned long flags;
> +	char *vaddr;
> +	int nr_pages = 2;
> +
> +	BUG_ON(len > sizeof(long));
> +	BUG_ON((((long)addr + len - 1) & ~(sizeof(long) - 1))
> +		- ((long)addr & ~(sizeof(long) - 1)));
> +	{
> +		struct page *pages[2] = { virt_to_page(addr),
> +			virt_to_page(addr + PAGE_SIZE) };
> +		if (!pages[1])
> +			nr_pages = 1;
> +		vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
> +		WARN_ON(!vaddr);
> +		local_irq_save(flags);
> +		memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
> +		local_irq_restore(flags);

you probably want a BUG_ON instead (or some graceful recovery) else
the NULL deref will trigger with IRQs off...

> +		vunmap(vaddr);
> +	}
>  	sync_core();
>  	/* Could also do a CLFLUSH here to speed up CPU recovery; but
>  	   that causes hangs on some VIA CPUs. */
> +	return addr;
>  }



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86 - Enhance DEBUG_RODATA support - alternatives
  2008-03-06 13:48     ` [PATCH] x86 - Enhance DEBUG_RODATA support - alternatives Mathieu Desnoyers
@ 2008-03-06 14:01       ` Ingo Molnar
  2008-03-06 13:59         ` pageexec
  2008-03-06 14:18         ` Mathieu Desnoyers
  0 siblings, 2 replies; 11+ messages in thread
From: Ingo Molnar @ 2008-03-06 14:01 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Srinivasa DS, Andrew Morton, linux-kernel, ananth, Jim Keniston,
	srikar, SystemTAP, Andi Kleen, pageexec, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Jeremy Fitzhardinge,
	Arjan van de Ven


[ Cc:-ed Arjan - he wrote and handles the DEBUG_RODATA bits. ]

* Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:

> Hrm, yeah, that's because http://lkml.org/lkml/2008/2/2/227
> 
> x86 - Enhance DEBUG_RODATA support - alternatives
> 
> has been pulled out of the x86 tree. It implements the correct 
> text_poke required to support this. It's been removed because Xen does 
> not support the WP bit in the guest kernels.
> 
> Can you try my new replacement implementation ? It creates another 
> mapping instead of modifying the WP bit.

thanks, applied it to x86.git.

note, there were rejects and i had to hand-merge it - in general it's 
best to send development x86 patches against:

    http://people.redhat.com/mingo/x86.git/README

find the merged patch is below - i hope i merged it right ;)

	Ingo

------------->
Subject: x86: enhance DEBUG_RODATA support - alternatives
From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Date: Thu, 6 Mar 2008 08:48:49 -0500

Fix a memcpy that should be a text_poke (in apply_alternatives).

Use kernel_wp_save/kernel_wp_restore in text_poke to support DEBUG_RODATA
correctly and so the CPU HOTPLUG special case can be removed.

Add text_poke_early, for alternatives and paravirt boot-time and module load
time patching.

Changelog:

- Fix text_set and text_poke alignment check (mixed up bitwise and and or)
- Remove text_set
- Export add_nops, so it can be used by others.
- Document text_poke_early.
- Remove clflush, since it breaks some VIA architectures and is not strictly
  necessary.
- Add kerneldoc to text_poke and text_poke_early.
- Create a second vmap instead of using the WP bit to support Xen and VMI.
- Move local_irq disable within text_poke and text_poke_early to be able to
  be sleepable in these functions.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Andi Kleen <andi@firstfloor.org>
CC: pageexec@freemail.hu
CC: H. Peter Anvin <hpa@zytor.com>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/alternative.c |   88 +++++++++++++++++++++++++++++++-----------
 include/asm-x86/alternative.h |   23 ++++++++++
 2 files changed, 87 insertions(+), 24 deletions(-)

Index: linux-x86.q/arch/x86/kernel/alternative.c
===================================================================
--- linux-x86.q.orig/arch/x86/kernel/alternative.c
+++ linux-x86.q/arch/x86/kernel/alternative.c
@@ -11,6 +11,8 @@
 #include <asm/mce.h>
 #include <asm/nmi.h>
 #include <asm/vsyscall.h>
+#include <asm/cacheflush.h>
+#include <asm/io.h>
 
 #define MAX_PATCH_LEN (255-1)
 
@@ -173,7 +175,7 @@ static const unsigned char*const * find_
 #endif /* CONFIG_X86_64 */
 
 /* Use this to add nops to a buffer, then text_poke the whole buffer. */
-static void add_nops(void *insns, unsigned int len)
+void add_nops(void *insns, unsigned int len)
 {
 	const unsigned char *const *noptable = find_nop_table();
 
@@ -186,6 +188,7 @@ static void add_nops(void *insns, unsign
 		len -= noplen;
 	}
 }
+EXPORT_SYMBOL_GPL(add_nops);
 
 extern struct alt_instr __alt_instructions[], __alt_instructions_end[];
 extern u8 *__smp_locks[], *__smp_locks_end[];
@@ -219,7 +222,7 @@ void apply_alternatives(struct alt_instr
 		memcpy(insnbuf, a->replacement, a->replacementlen);
 		add_nops(insnbuf + a->replacementlen,
 			 a->instrlen - a->replacementlen);
-		text_poke(instr, insnbuf, a->instrlen);
+		text_poke_early(instr, insnbuf, a->instrlen);
 	}
 }
 
@@ -280,7 +283,6 @@ void alternatives_smp_module_add(struct 
 				 void *text,  void *text_end)
 {
 	struct smp_alt_module *smp;
-	unsigned long flags;
 
 	if (noreplace_smp)
 		return;
@@ -306,39 +308,37 @@ void alternatives_smp_module_add(struct 
 		__func__, smp->locks, smp->locks_end,
 		smp->text, smp->text_end, smp->name);
 
-	spin_lock_irqsave(&smp_alt, flags);
+	spin_lock(&smp_alt);
 	list_add_tail(&smp->next, &smp_alt_modules);
 	if (boot_cpu_has(X86_FEATURE_UP))
 		alternatives_smp_unlock(smp->locks, smp->locks_end,
 					smp->text, smp->text_end);
-	spin_unlock_irqrestore(&smp_alt, flags);
+	spin_unlock(&smp_alt);
 }
 
 void alternatives_smp_module_del(struct module *mod)
 {
 	struct smp_alt_module *item;
-	unsigned long flags;
 
 	if (smp_alt_once || noreplace_smp)
 		return;
 
-	spin_lock_irqsave(&smp_alt, flags);
+	spin_lock(&smp_alt);
 	list_for_each_entry(item, &smp_alt_modules, next) {
 		if (mod != item->mod)
 			continue;
 		list_del(&item->next);
-		spin_unlock_irqrestore(&smp_alt, flags);
+		spin_unlock(&smp_alt);
 		DPRINTK("%s: %s\n", __func__, item->name);
 		kfree(item);
 		return;
 	}
-	spin_unlock_irqrestore(&smp_alt, flags);
+	spin_unlock(&smp_alt);
 }
 
 void alternatives_smp_switch(int smp)
 {
 	struct smp_alt_module *mod;
-	unsigned long flags;
 
 #ifdef CONFIG_LOCKDEP
 	/*
@@ -355,7 +355,7 @@ void alternatives_smp_switch(int smp)
 		return;
 	BUG_ON(!smp && (num_online_cpus() > 1));
 
-	spin_lock_irqsave(&smp_alt, flags);
+	spin_lock(&smp_alt);
 
 	/*
 	 * Avoid unnecessary switches because it forces JIT based VMs to
@@ -379,7 +379,7 @@ void alternatives_smp_switch(int smp)
 						mod->text, mod->text_end);
 	}
 	smp_mode = smp;
-	spin_unlock_irqrestore(&smp_alt, flags);
+	spin_unlock(&smp_alt);
 }
 
 #endif
@@ -407,7 +407,7 @@ void apply_paravirt(struct paravirt_patc
 
 		/* Pad the rest with nops */
 		add_nops(insnbuf + used, p->len - used);
-		text_poke(p->instr, insnbuf, p->len);
+		text_poke_early(p->instr, insnbuf, p->len);
 	}
 }
 extern struct paravirt_patch_site __start_parainstructions[],
@@ -416,8 +416,6 @@ extern struct paravirt_patch_site __star
 
 void __init alternative_instructions(void)
 {
-	unsigned long flags;
-
 	/* The patching is not fully atomic, so try to avoid local interruptions
 	   that might execute the to be patched code.
 	   Other CPUs are not running. */
@@ -426,7 +424,6 @@ void __init alternative_instructions(voi
 	stop_mce();
 #endif
 
-	local_irq_save(flags);
 	apply_alternatives(__alt_instructions, __alt_instructions_end);
 
 	/* switch to patch-once-at-boottime-only mode and free the
@@ -458,7 +455,6 @@ void __init alternative_instructions(voi
 	}
 #endif
  	apply_paravirt(__parainstructions, __parainstructions_end);
-	local_irq_restore(flags);
 
 	if (smp_alt_once)
 		free_init_pages("SMP alternatives",
@@ -471,18 +467,64 @@ void __init alternative_instructions(voi
 #endif
 }
 
-/*
- * Warning:
+/**
+ * text_poke_early - Update instructions on a live kernel at boot time
+ * @addr: address to modify
+ * @opcode: source of the copy
+ * @len: length to copy
+ *
  * When you use this code to patch more than one byte of an instruction
  * you need to make sure that other CPUs cannot execute this code in parallel.
- * Also no thread must be currently preempted in the middle of these instructions.
- * And on the local CPU you need to be protected again NMI or MCE handlers
- * seeing an inconsistent instruction while you patch.
+ * Also no thread must be currently preempted in the middle of these
+ * instructions. And on the local CPU you need to be protected again NMI or MCE
+ * handlers seeing an inconsistent instruction while you patch.
  */
-void __kprobes text_poke(void *addr, unsigned char *opcode, int len)
+void *text_poke_early(void *addr, const void *opcode, size_t len)
 {
+	unsigned long flags;
+	local_irq_save(flags);
 	memcpy(addr, opcode, len);
+	local_irq_restore(flags);
+	sync_core();
+	/* Could also do a CLFLUSH here to speed up CPU recovery; but
+	   that causes hangs on some VIA CPUs. */
+	return addr;
+}
+
+/**
+ * text_poke - Update instructions on a live kernel
+ * @addr: address to modify
+ * @opcode: source of the copy
+ * @len: length to copy
+ *
+ * Only atomic text poke/set should be allowed when not doing early patching.
+ * It means the size must be writable atomically and the address must be aligned
+ * in a way that permits an atomic write. It also makes sure we fit on a single
+ * page.
+ */
+void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
+{
+	unsigned long flags;
+	char *vaddr;
+	int nr_pages = 2;
+
+	BUG_ON(len > sizeof(long));
+	BUG_ON((((long)addr + len - 1) & ~(sizeof(long) - 1))
+		- ((long)addr & ~(sizeof(long) - 1)));
+	{
+		struct page *pages[2] = { virt_to_page(addr),
+			virt_to_page(addr + PAGE_SIZE) };
+		if (!pages[1])
+			nr_pages = 1;
+		vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
+		WARN_ON(!vaddr);
+		local_irq_save(flags);
+		memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
+		local_irq_restore(flags);
+		vunmap(vaddr);
+	}
 	sync_core();
 	/* Could also do a CLFLUSH here to speed up CPU recovery; but
 	   that causes hangs on some VIA CPUs. */
+	return addr;
 }
Index: linux-x86.q/include/asm-x86/alternative.h
===================================================================
--- linux-x86.q.orig/include/asm-x86/alternative.h
+++ linux-x86.q/include/asm-x86/alternative.h
@@ -156,6 +156,27 @@ apply_paravirt(struct paravirt_patch_sit
 #define __parainstructions_end	NULL
 #endif
 
-extern void text_poke(void *addr, unsigned char *opcode, int len);
+extern void add_nops(void *insns, unsigned int len);
+
+/*
+ * Clear and restore the kernel write-protection flag on the local CPU.
+ * Allows the kernel to edit read-only pages.
+ * Side-effect: any interrupt handler running between save and restore will have
+ * the ability to write to read-only pages.
+ *
+ * Warning:
+ * Code patching in the UP case is safe if NMIs and MCE handlers are stopped and
+ * no thread can be preempted in the instructions being modified (no iret to an
+ * invalid instruction possible) or if the instructions are changed from a
+ * consistent state to another consistent state atomically.
+ * More care must be taken when modifying code in the SMP case because of
+ * Intel's errata.
+ * On the local CPU you need to be protected again NMI or MCE handlers seeing an
+ * inconsistent instruction while you patch.
+ * The _early version expects the memory to already be RW.
+ */
+
+extern void *text_poke(void *addr, const void *opcode, size_t len);
+extern void *text_poke_early(void *addr, const void *opcode, size_t len);
 
 #endif /* _ASM_X86_ALTERNATIVE_H */

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86 - Enhance DEBUG_RODATA support - alternatives
  2008-03-06 14:01       ` Ingo Molnar
  2008-03-06 13:59         ` pageexec
@ 2008-03-06 14:18         ` Mathieu Desnoyers
  1 sibling, 0 replies; 11+ messages in thread
From: Mathieu Desnoyers @ 2008-03-06 14:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Srinivasa DS, Andrew Morton, linux-kernel, ananth, Jim Keniston,
	srikar, SystemTAP, Andi Kleen, pageexec, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Jeremy Fitzhardinge,
	Arjan van de Ven

* Ingo Molnar (mingo@elte.hu) wrote:
> 
> [ Cc:-ed Arjan - he wrote and handles the DEBUG_RODATA bits. ]
> 
> * Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
> 
> > Hrm, yeah, that's because http://lkml.org/lkml/2008/2/2/227
> > 
> > x86 - Enhance DEBUG_RODATA support - alternatives
> > 
> > has been pulled out of the x86 tree. It implements the correct 
> > text_poke required to support this. It's been removed because Xen does 
> > not support the WP bit in the guest kernels.
> > 
> > Can you try my new replacement implementation ? It creates another 
> > mapping instead of modifying the WP bit.
> 
> thanks, applied it to x86.git.
> 
> note, there were rejects and i had to hand-merge it - in general it's 
> best to send development x86 patches against:
> 
>     http://people.redhat.com/mingo/x86.git/README
> 
> find the merged patch is below - i hope i merged it right ;)
> 

Yep, it looks good. Ok, I'll try to update my patches against the x86
tree as much as possible in the future before submission.

Due to the number of trees I work in (mainline for LTTng, vm.git for
slub development, x86...), I may happen to forget to port my patches to
the right tree. I'll do my best.

Thanks,

Mathieu

> 	Ingo
> 
> ------------->
> Subject: x86: enhance DEBUG_RODATA support - alternatives
> From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> Date: Thu, 6 Mar 2008 08:48:49 -0500
> 
> Fix a memcpy that should be a text_poke (in apply_alternatives).
> 
> Use kernel_wp_save/kernel_wp_restore in text_poke to support DEBUG_RODATA
> correctly and so the CPU HOTPLUG special case can be removed.
> 
> Add text_poke_early, for alternatives and paravirt boot-time and module load
> time patching.
> 
> Changelog:
> 
> - Fix text_set and text_poke alignment check (mixed up bitwise and and or)
> - Remove text_set
> - Export add_nops, so it can be used by others.
> - Document text_poke_early.
> - Remove clflush, since it breaks some VIA architectures and is not strictly
>   necessary.
> - Add kerneldoc to text_poke and text_poke_early.
> - Create a second vmap instead of using the WP bit to support Xen and VMI.
> - Move local_irq disable within text_poke and text_poke_early to be able to
>   be sleepable in these functions.
> 
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> CC: Andi Kleen <andi@firstfloor.org>
> CC: pageexec@freemail.hu
> CC: H. Peter Anvin <hpa@zytor.com>
> CC: Jeremy Fitzhardinge <jeremy@goop.org>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> ---
>  arch/x86/kernel/alternative.c |   88 +++++++++++++++++++++++++++++++-----------
>  include/asm-x86/alternative.h |   23 ++++++++++
>  2 files changed, 87 insertions(+), 24 deletions(-)
> 
> Index: linux-x86.q/arch/x86/kernel/alternative.c
> ===================================================================
> --- linux-x86.q.orig/arch/x86/kernel/alternative.c
> +++ linux-x86.q/arch/x86/kernel/alternative.c
> @@ -11,6 +11,8 @@
>  #include <asm/mce.h>
>  #include <asm/nmi.h>
>  #include <asm/vsyscall.h>
> +#include <asm/cacheflush.h>
> +#include <asm/io.h>
>  
>  #define MAX_PATCH_LEN (255-1)
>  
> @@ -173,7 +175,7 @@ static const unsigned char*const * find_
>  #endif /* CONFIG_X86_64 */
>  
>  /* Use this to add nops to a buffer, then text_poke the whole buffer. */
> -static void add_nops(void *insns, unsigned int len)
> +void add_nops(void *insns, unsigned int len)
>  {
>  	const unsigned char *const *noptable = find_nop_table();
>  
> @@ -186,6 +188,7 @@ static void add_nops(void *insns, unsign
>  		len -= noplen;
>  	}
>  }
> +EXPORT_SYMBOL_GPL(add_nops);
>  
>  extern struct alt_instr __alt_instructions[], __alt_instructions_end[];
>  extern u8 *__smp_locks[], *__smp_locks_end[];
> @@ -219,7 +222,7 @@ void apply_alternatives(struct alt_instr
>  		memcpy(insnbuf, a->replacement, a->replacementlen);
>  		add_nops(insnbuf + a->replacementlen,
>  			 a->instrlen - a->replacementlen);
> -		text_poke(instr, insnbuf, a->instrlen);
> +		text_poke_early(instr, insnbuf, a->instrlen);
>  	}
>  }
>  
> @@ -280,7 +283,6 @@ void alternatives_smp_module_add(struct 
>  				 void *text,  void *text_end)
>  {
>  	struct smp_alt_module *smp;
> -	unsigned long flags;
>  
>  	if (noreplace_smp)
>  		return;
> @@ -306,39 +308,37 @@ void alternatives_smp_module_add(struct 
>  		__func__, smp->locks, smp->locks_end,
>  		smp->text, smp->text_end, smp->name);
>  
> -	spin_lock_irqsave(&smp_alt, flags);
> +	spin_lock(&smp_alt);
>  	list_add_tail(&smp->next, &smp_alt_modules);
>  	if (boot_cpu_has(X86_FEATURE_UP))
>  		alternatives_smp_unlock(smp->locks, smp->locks_end,
>  					smp->text, smp->text_end);
> -	spin_unlock_irqrestore(&smp_alt, flags);
> +	spin_unlock(&smp_alt);
>  }
>  
>  void alternatives_smp_module_del(struct module *mod)
>  {
>  	struct smp_alt_module *item;
> -	unsigned long flags;
>  
>  	if (smp_alt_once || noreplace_smp)
>  		return;
>  
> -	spin_lock_irqsave(&smp_alt, flags);
> +	spin_lock(&smp_alt);
>  	list_for_each_entry(item, &smp_alt_modules, next) {
>  		if (mod != item->mod)
>  			continue;
>  		list_del(&item->next);
> -		spin_unlock_irqrestore(&smp_alt, flags);
> +		spin_unlock(&smp_alt);
>  		DPRINTK("%s: %s\n", __func__, item->name);
>  		kfree(item);
>  		return;
>  	}
> -	spin_unlock_irqrestore(&smp_alt, flags);
> +	spin_unlock(&smp_alt);
>  }
>  
>  void alternatives_smp_switch(int smp)
>  {
>  	struct smp_alt_module *mod;
> -	unsigned long flags;
>  
>  #ifdef CONFIG_LOCKDEP
>  	/*
> @@ -355,7 +355,7 @@ void alternatives_smp_switch(int smp)
>  		return;
>  	BUG_ON(!smp && (num_online_cpus() > 1));
>  
> -	spin_lock_irqsave(&smp_alt, flags);
> +	spin_lock(&smp_alt);
>  
>  	/*
>  	 * Avoid unnecessary switches because it forces JIT based VMs to
> @@ -379,7 +379,7 @@ void alternatives_smp_switch(int smp)
>  						mod->text, mod->text_end);
>  	}
>  	smp_mode = smp;
> -	spin_unlock_irqrestore(&smp_alt, flags);
> +	spin_unlock(&smp_alt);
>  }
>  
>  #endif
> @@ -407,7 +407,7 @@ void apply_paravirt(struct paravirt_patc
>  
>  		/* Pad the rest with nops */
>  		add_nops(insnbuf + used, p->len - used);
> -		text_poke(p->instr, insnbuf, p->len);
> +		text_poke_early(p->instr, insnbuf, p->len);
>  	}
>  }
>  extern struct paravirt_patch_site __start_parainstructions[],
> @@ -416,8 +416,6 @@ extern struct paravirt_patch_site __star
>  
>  void __init alternative_instructions(void)
>  {
> -	unsigned long flags;
> -
>  	/* The patching is not fully atomic, so try to avoid local interruptions
>  	   that might execute the to be patched code.
>  	   Other CPUs are not running. */
> @@ -426,7 +424,6 @@ void __init alternative_instructions(voi
>  	stop_mce();
>  #endif
>  
> -	local_irq_save(flags);
>  	apply_alternatives(__alt_instructions, __alt_instructions_end);
>  
>  	/* switch to patch-once-at-boottime-only mode and free the
> @@ -458,7 +455,6 @@ void __init alternative_instructions(voi
>  	}
>  #endif
>   	apply_paravirt(__parainstructions, __parainstructions_end);
> -	local_irq_restore(flags);
>  
>  	if (smp_alt_once)
>  		free_init_pages("SMP alternatives",
> @@ -471,18 +467,64 @@ void __init alternative_instructions(voi
>  #endif
>  }
>  
> -/*
> - * Warning:
> +/**
> + * text_poke_early - Update instructions on a live kernel at boot time
> + * @addr: address to modify
> + * @opcode: source of the copy
> + * @len: length to copy
> + *
>   * When you use this code to patch more than one byte of an instruction
>   * you need to make sure that other CPUs cannot execute this code in parallel.
> - * Also no thread must be currently preempted in the middle of these instructions.
> - * And on the local CPU you need to be protected again NMI or MCE handlers
> - * seeing an inconsistent instruction while you patch.
> + * Also no thread must be currently preempted in the middle of these
> + * instructions. And on the local CPU you need to be protected again NMI or MCE
> + * handlers seeing an inconsistent instruction while you patch.
>   */
> -void __kprobes text_poke(void *addr, unsigned char *opcode, int len)
> +void *text_poke_early(void *addr, const void *opcode, size_t len)
>  {
> +	unsigned long flags;
> +	local_irq_save(flags);
>  	memcpy(addr, opcode, len);
> +	local_irq_restore(flags);
> +	sync_core();
> +	/* Could also do a CLFLUSH here to speed up CPU recovery; but
> +	   that causes hangs on some VIA CPUs. */
> +	return addr;
> +}
> +
> +/**
> + * text_poke - Update instructions on a live kernel
> + * @addr: address to modify
> + * @opcode: source of the copy
> + * @len: length to copy
> + *
> + * Only atomic text poke/set should be allowed when not doing early patching.
> + * It means the size must be writable atomically and the address must be aligned
> + * in a way that permits an atomic write. It also makes sure we fit on a single
> + * page.
> + */
> +void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
> +{
> +	unsigned long flags;
> +	char *vaddr;
> +	int nr_pages = 2;
> +
> +	BUG_ON(len > sizeof(long));
> +	BUG_ON((((long)addr + len - 1) & ~(sizeof(long) - 1))
> +		- ((long)addr & ~(sizeof(long) - 1)));
> +	{
> +		struct page *pages[2] = { virt_to_page(addr),
> +			virt_to_page(addr + PAGE_SIZE) };
> +		if (!pages[1])
> +			nr_pages = 1;
> +		vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
> +		WARN_ON(!vaddr);
> +		local_irq_save(flags);
> +		memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
> +		local_irq_restore(flags);
> +		vunmap(vaddr);
> +	}
>  	sync_core();
>  	/* Could also do a CLFLUSH here to speed up CPU recovery; but
>  	   that causes hangs on some VIA CPUs. */
> +	return addr;
>  }
> Index: linux-x86.q/include/asm-x86/alternative.h
> ===================================================================
> --- linux-x86.q.orig/include/asm-x86/alternative.h
> +++ linux-x86.q/include/asm-x86/alternative.h
> @@ -156,6 +156,27 @@ apply_paravirt(struct paravirt_patch_sit
>  #define __parainstructions_end	NULL
>  #endif
>  
> -extern void text_poke(void *addr, unsigned char *opcode, int len);
> +extern void add_nops(void *insns, unsigned int len);
> +
> +/*
> + * Clear and restore the kernel write-protection flag on the local CPU.
> + * Allows the kernel to edit read-only pages.
> + * Side-effect: any interrupt handler running between save and restore will have
> + * the ability to write to read-only pages.
> + *
> + * Warning:
> + * Code patching in the UP case is safe if NMIs and MCE handlers are stopped and
> + * no thread can be preempted in the instructions being modified (no iret to an
> + * invalid instruction possible) or if the instructions are changed from a
> + * consistent state to another consistent state atomically.
> + * More care must be taken when modifying code in the SMP case because of
> + * Intel's errata.
> + * On the local CPU you need to be protected again NMI or MCE handlers seeing an
> + * inconsistent instruction while you patch.
> + * The _early version expects the memory to already be RW.
> + */
> +
> +extern void *text_poke(void *addr, const void *opcode, size_t len);
> +extern void *text_poke_early(void *addr, const void *opcode, size_t len);
>  
>  #endif /* _ASM_X86_ALTERNATIVE_H */

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86 - Enhance DEBUG_RODATA support - alternatives
  2008-03-06 13:59         ` pageexec
@ 2008-03-06 15:52           ` Ingo Molnar
  2008-03-06 16:21           ` Andi Kleen
  2008-03-06 17:25           ` Mathieu Desnoyers
  2 siblings, 0 replies; 11+ messages in thread
From: Ingo Molnar @ 2008-03-06 15:52 UTC (permalink / raw)
  To: pageexec
  Cc: Mathieu Desnoyers, Srinivasa DS, Andrew Morton, linux-kernel,
	ananth, Jim Keniston, srikar, SystemTAP, Andi Kleen,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Jeremy Fitzhardinge, Arjan van de Ven


* pageexec@freemail.hu <pageexec@freemail.hu> wrote:

> > +		vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
> > +		WARN_ON(!vaddr);
> > +		local_irq_save(flags);
> > +		memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
> > +		local_irq_restore(flags);
> 
> you probably want a BUG_ON instead (or some graceful recovery) else 
> the NULL deref will trigger with IRQs off...

agreed - i changed it to BUG_ON() for the time being.

	Ingo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86 - Enhance DEBUG_RODATA support - alternatives
  2008-03-06 13:59         ` pageexec
  2008-03-06 15:52           ` Ingo Molnar
@ 2008-03-06 16:21           ` Andi Kleen
  2008-03-06 17:25           ` Mathieu Desnoyers
  2 siblings, 0 replies; 11+ messages in thread
From: Andi Kleen @ 2008-03-06 16:21 UTC (permalink / raw)
  To: pageexec
  Cc: Mathieu Desnoyers, Ingo Molnar, Srinivasa DS, Andrew Morton,
	linux-kernel, ananth, Jim Keniston, srikar, SystemTAP,
	Andi Kleen, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Jeremy Fitzhardinge, Arjan van de Ven

> you probably want a BUG_ON instead (or some graceful recovery) else
> the NULL deref will trigger with IRQs off...

FWIW oopses don't have any trouble with interrupts being off.
So BUG_ON wouldn't add much value.

-Andi


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86 - Enhance DEBUG_RODATA support - alternatives
  2008-03-06 13:59         ` pageexec
  2008-03-06 15:52           ` Ingo Molnar
  2008-03-06 16:21           ` Andi Kleen
@ 2008-03-06 17:25           ` Mathieu Desnoyers
  2008-03-11 12:17             ` Andi Kleen
  2 siblings, 1 reply; 11+ messages in thread
From: Mathieu Desnoyers @ 2008-03-06 17:25 UTC (permalink / raw)
  To: pageexec
  Cc: Ingo Molnar, Srinivasa DS, Andrew Morton, linux-kernel, ananth,
	Jim Keniston, srikar, SystemTAP, Andi Kleen, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Jeremy Fitzhardinge,
	Arjan van de Ven

* pageexec@freemail.hu (pageexec@freemail.hu) wrote:
> On 6 Mar 2008 at 15:01, Ingo Molnar wrote:
> 
> > +/**
> > + * text_poke - Update instructions on a live kernel
> > + * @addr: address to modify
> > + * @opcode: source of the copy
> > + * @len: length to copy
> > + *
> > + * Only atomic text poke/set should be allowed when not doing early patching.
> > + * It means the size must be writable atomically and the address must be aligned
> > + * in a way that permits an atomic write. It also makes sure we fit on a single
> > + * page.
> > + */
> > +void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
> > +{
> > +	unsigned long flags;
> > +	char *vaddr;
> > +	int nr_pages = 2;
> > +
> > +	BUG_ON(len > sizeof(long));
> > +	BUG_ON((((long)addr + len - 1) & ~(sizeof(long) - 1))
> > +		- ((long)addr & ~(sizeof(long) - 1)));
> > +	{
> > +		struct page *pages[2] = { virt_to_page(addr),
> > +			virt_to_page(addr + PAGE_SIZE) };

I just want to ask for confirmation about two things :

First, that calling this text_poke implementation to modify text in a
module won't fail. Is virt_to_page(addr) ok if addr is in a vmalloc'ed
area ?

Second, that calling virt_to_page(addr + PAGE_SIZE) won't have
undesirable side-effects if addr happens to be in the last page of an
allocated range. It should be ok for the core kernel text, because it is
followed by the kernel rodata, but I am not certain for modules.

If any of these two are not true, then we should add a test for

if (kernel_text_address(addr)) {

around the vmap() allocation and fallback on 

} else
  vaddr = addr;

in the cases where text_poke is called to modify module code.

Mathieu

> > +		if (!pages[1])
> > +			nr_pages = 1;
> > +		vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
> > +		WARN_ON(!vaddr);
> > +		local_irq_save(flags);
> > +		memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
> > +		local_irq_restore(flags);
> 
> you probably want a BUG_ON instead (or some graceful recovery) else
> the NULL deref will trigger with IRQs off...
> 
> > +		vunmap(vaddr);
> > +	}
> >  	sync_core();
> >  	/* Could also do a CLFLUSH here to speed up CPU recovery; but
> >  	   that causes hangs on some VIA CPUs. */
> > +	return addr;
> >  }
> 
> 

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86 - Enhance DEBUG_RODATA support - alternatives
  2008-03-06 17:25           ` Mathieu Desnoyers
@ 2008-03-11 12:17             ` Andi Kleen
  0 siblings, 0 replies; 11+ messages in thread
From: Andi Kleen @ 2008-03-11 12:17 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: pageexec, Ingo Molnar, Srinivasa DS, Andrew Morton, linux-kernel,
	ananth, Jim Keniston, srikar, SystemTAP, Andi Kleen,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Jeremy Fitzhardinge, Arjan van de Ven

> First, that calling this text_poke implementation to modify text in a
> module won't fail. Is virt_to_page(addr) ok if addr is in a vmalloc'ed
> area ?

virt_to_page only works on direct mapping addresses, not vmalloc.
> 
> Second, that calling virt_to_page(addr + PAGE_SIZE) won't have
> undesirable side-effects if addr happens to be in the last page of an
> allocated range. It should be ok for the core kernel text, because it is
> followed by the kernel rodata, but I am not certain for modules.

On non vmemmap/none flatmem kernels it could fail yes. But vmalloc/module_alloc
is not guaranteed to be continuous so you cannot do that anyways.

-Andi

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2008-03-11 12:15 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-03-06 12:41 [BUG] Kprobes fails on 2.6.25-rc3-mm1 (x86) systems, if CONFIG_DEBUG_RODATA is set Srinivasa DS
2008-03-06 12:55 ` Mathieu Desnoyers
2008-03-06 13:33   ` Srinivasa DS
2008-03-06 13:48     ` [PATCH] x86 - Enhance DEBUG_RODATA support - alternatives Mathieu Desnoyers
2008-03-06 14:01       ` Ingo Molnar
2008-03-06 13:59         ` pageexec
2008-03-06 15:52           ` Ingo Molnar
2008-03-06 16:21           ` Andi Kleen
2008-03-06 17:25           ` Mathieu Desnoyers
2008-03-11 12:17             ` Andi Kleen
2008-03-06 14:18         ` Mathieu Desnoyers

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).