LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Gabriel Paubert <paubert@iram.es>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	linuxppc-dev@ozlabs.org, paulus@samba.org,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] add strncmp to PowerPC
Date: Mon, 3 Mar 2008 10:54:43 +0100	[thread overview]
Message-ID: <20080303095443.GB27105@iram.es> (raw)
In-Reply-To: <Pine.LNX.4.58.0802292250490.21555@gandalf.stny.rr.com>

On Fri, Feb 29, 2008 at 10:56:45PM -0500, Steven Rostedt wrote:
> 
> On Sat, 1 Mar 2008, Benjamin Herrenschmidt wrote:
> >
> > Do we have any indication that it performs better than the C one ?
> 
> See below.
> 
> >
> > Ben.
> >
> 
> > >
> > > +_GLOBAL(strncmp)
> > > +	mtctr	r5
> > > +	addi	r5,r3,-1
> > > +	addi	r4,r4,-1
> > > +1:	lbzu	r3,1(r5)
> > > +	cmpwi	1,r3,0
> > > +	lbzu	r0,1(r4)
> > > +	subf.	r3,r0,r3
> > > +	beqlr	1
> > > +	bdnzt	eq,1b
> > > +	blr
> > > +
> 
> 
> And here's the objdump of the C version:
> 
> 0000000000000080 <.strncmp>:
>   80:   fb e1 ff f0     std     r31,-16(r1)
>   84:   f8 21 ff c1     stdu    r1,-64(r1)
>   88:   7c 69 1b 78     mr      r9,r3
>   8c:   7c a0 2b 79     mr.     r0,r5
>   90:   38 60 00 00     li      r3,0
>   94:   7c 09 03 a6     mtctr   r0
>   98:   7c 3f 0b 78     mr      r31,r1
>   9c:   41 82 00 68     beq-    104 <.strncmp+0x84>
>   a0:   89 69 00 00     lbz     r11,0(r9)
>   a4:   88 04 00 00     lbz     r0,0(r4)
>   a8:   7c 00 58 50     subf    r0,r0,r11
>   ac:   78 00 06 20     clrldi  r0,r0,56
>   b0:   2f a0 00 00     cmpdi   cr7,r0,0
>   b4:   7c 00 07 74     extsb   r0,r0
>   b8:   7c 03 03 78     mr      r3,r0
>   bc:   40 9e 00 48     bne-    cr7,104 <.strncmp+0x84>
>   c0:   2f ab 00 00     cmpdi   cr7,r11,0
>   c4:   41 9e 00 40     beq-    cr7,104 <.strncmp+0x84>
>   c8:   38 84 00 01     addi    r4,r4,1
>   cc:   38 69 00 01     addi    r3,r9,1
>   d0:   42 40 00 30     bdz-    100 <.strncmp+0x80>
>   d4:   88 03 00 00     lbz     r0,0(r3)
>   d8:   89 24 00 00     lbz     r9,0(r4)
>   dc:   38 63 00 01     addi    r3,r3,1
>   e0:   38 84 00 01     addi    r4,r4,1
>   e4:   2f 20 00 00     cmpdi   cr6,r0,0
>   e8:   7c 09 00 50     subf    r0,r9,r0
>   ec:   78 00 06 20     clrldi  r0,r0,56
>   f0:   2f a0 00 00     cmpdi   cr7,r0,0
>   f4:   7c 00 07 74     extsb   r0,r0
>   f8:   40 9e 00 08     bne-    cr7,100 <.strncmp+0x80>
>   fc:   40 9a ff d4     bne+    cr6,d0 <.strncmp+0x50>
>  100:   7c 03 03 78     mr      r3,r0
>  104:   e8 21 00 00     ld      r1,0(r1)
>  108:   eb e1 ff f0     ld      r31,-16(r1)
>  10c:   4e 80 00 20     blr
> 
> 
> I'll let you decide ;-)
> 
> Even if it was logically faster (which I still doubt) it's a hell of a lot
> of cache lines to waste.

Indeed, but there are some corner cases that the C code handles. Like
a length of 0 which may lead to infinite loop in the asm code. 

OTOH, I'm a bit surprised by the extsb instructions in the compiler generated
code. We don't compile with -fsigned-char, do we? The clrldi
instructions are also extremely stupid.

Now that I think a bit more about it, I believe that the C version is
incorrect: the clrldi/extsb dance takes a value between -255 and +255
and collapses it into the -128 to 127 range, meaning that the return
value may be wrong if we rely on the sign of the result. So unless I
miss something, the problem is much more serious than just stupid code
(I had just a look at the libc version in C and characters are cast to
unsigned char before the comparison).

	Regards,
	Gabriel

  reply	other threads:[~2008-03-03  9:55 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-02-29 16:04 Steven Rostedt
2008-03-01  3:04 ` Benjamin Herrenschmidt
2008-03-01  3:56   ` Steven Rostedt
2008-03-03  9:54     ` Gabriel Paubert [this message]
2008-03-03 10:10       ` Andreas Schwab
2008-03-03 19:08       ` Segher Boessenkool
2008-03-05  4:03   ` Paul Mackerras
2008-03-05  5:26     ` Segher Boessenkool
2008-03-05  5:39       ` Paul Mackerras
2008-03-05  7:01         ` Segher Boessenkool

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080303095443.GB27105@iram.es \
    --to=paubert@iram.es \
    --cc=benh@kernel.crashing.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxppc-dev@ozlabs.org \
    --cc=paulus@samba.org \
    --cc=rostedt@goodmis.org \
    --subject='Re: [PATCH] add strncmp to PowerPC' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).