Should GCC optimize this?

Feel like posting Off Topic? Do it here.

Moderator: MaxCoderz Staff

Post Reply
User avatar
Halifax
Sir Posts-A-Lot
Posts: 225
Joined: Mon 01 Jan, 2007 10:39 am
Location: Pennsylvania, US

Should GCC optimize this?

Post by Halifax »

GCC 3.4.2 was used in this example.

Code: Select all

int conv_binary(const char *str, const char *end)
{
   int result = 0;
   
   while (str < end {
      result *= 2;
      if (*str == '1') 
         result++;
      str++;
   }
   return result;
}
to this

Code: Select all

int conv_binary(const char *str, const char *end)
{
   int result;

   while (str < end) {
      if (*str++ == '1')
         result++;
   }
}
You may be surprised or you may not, but either way in the latter GCC does not inline the increment into the code. Instead where the increment was it puts a jump which goes to the exact same instruction that was inlined in the previous one and then from that label it jumps to the beginning of the loop.

Assembly code generated on an i486

Code: Select all

_conv_binary:
   pushl %ebp
   movl %esp, %ebp
   subl $4, %esp
   movl $0, -4(%ebp)
L3:
   movl 8(%ebp), %eax
   cmpl 12(%ebp), %eax
   jae L4
   movl -4(%ebp), %eax
   addl %eax, %eax
   movl %eax, -4(%ebp)
   movl 8(%ebp), %eax
   cmpb $49, (%eax)
   jne L5
   leal -4(%ebp), %eax
   incl (%eax)
L5:
   incl 8(%ebp)
   jmp L3
L4:
   movl -4(%ebp), %eax
   leave
   ret

to this

Code: Select all

	pushl	%ebp
	movl	%esp, %ebp
	subl	$4, %esp
	movl	$0, -4(%ebp)
L3:
	movl	8(%ebp), %eax
	cmpl	12(%ebp), %eax
	jae	L4
	movl	-4(%ebp), %eax
	addl	%eax, %eax
	movl	%eax, -4(%ebp)
	movl	8(%ebp), %eax
	incl	8(%ebp)
	cmpb	$49, (%eax)
	jne	L3
	leal	-4(%ebp), %eax
	incl	(%eax)
	jmp	L3
L4:
	movl	-4(%ebp), %eax
	leave
	ret
This will decrease the amount of time wasted of the jump present in each loop of the former routine. This is good for when converting a very large number. Oh and if anyone cares. TIGCC generated the same code for each instance although it was unoptimized a great amount.
User avatar
driesguldolf
Extreme Poster
Posts: 395
Joined: Thu 17 May, 2007 4:49 pm
Location: $4080
Contact:

Post by driesguldolf »

You know m68000 (it looks alot like code for the 68k calcs)? cool! what guide did you use to learn it?

Sorry, but I know nothing that can help you...
User avatar
Halifax
Sir Posts-A-Lot
Posts: 225
Joined: Mon 01 Jan, 2007 10:39 am
Location: Pennsylvania, US

Post by Halifax »

Nah that code above is i486 assembly or more well known as x86 assembly.

Yes I do know 68K assembly, and I learned it from multiple sources that can be found on ticalc.org and technoplaza.net, but most experimentation with the code that GCC generates.

Here is the 68K code generated by TIGCC 0.96 beta 8 if anyone cares :)

Code: Select all

conv_binary:
	move.l 4(%sp),%a0
	move.l 8(%sp),%d1
	clr.w %d0
	jbra .L4
.L5:
	add.w %d0,%d0
	cmp.b #49,(%a0)
	jbne .L6
	addq.w #1,%d0
.L6:
	addq.l #1,%a0
.L4:
	cmp.l %a0,%d1
	jbhi .L5
	rts
I believe 68K assembly is much more understandable than x86 although you can really understand either once you know one of them.

One and driesguldolf here are some links:

http://www.ticalc.org/pub/text/68k/
http://www.technoplaza.net/
King Harold
Calc King
Posts: 1513
Joined: Sat 05 Aug, 2006 7:22 am

Post by King Harold »

So where did result *=2 (addl %eax,%eax) go?
Am I too confused or did the optimizer actually change the code..? It looks to me like the second code wouldn't do what it is supposed to do..

for driesguldolf (z80)
slightly changed (length input) to be shorter:

Code: Select all

 ;Inputs: DE -> string, B = length
 ;Outputs: HL = number
ConvBin:
 ld hl,0
conv_loop:
 ld a,(de)
 inc de
 cp '1'
 ccf  ;here is where the magic happens
 adc hl,hl
 djnz conv_loop
 ret
(I hope I didn't make any big mistakes there lol)
User avatar
Halifax
Sir Posts-A-Lot
Posts: 225
Joined: Mon 01 Jan, 2007 10:39 am
Location: Pennsylvania, US

Post by Halifax »

King_Harold: I think your eyes are going bad because there are addl's in each one ;) Look closer around the jae L4

And yes it appears that your routine above works. Although I would like to see your translation for the original routine.

Also the problem with your code above is that it only goes up to 255 in length. The original routine can have any length which is easier to keep track of just by comparing the pointers.
CoBB
MCF Legend
Posts: 1601
Joined: Mon 20 Dec, 2004 8:45 am
Location: Budapest, Absurdistan
Contact:

Post by CoBB »

Halifax wrote:Also the problem with your code above is that it only goes up to 255 in length. The original routine can have any length which is easier to keep track of just by comparing the pointers.
Which has no significance whatsoever, since the number of bits an integer can hold is a much stricter limit.
King Harold
Calc King
Posts: 1513
Joined: Sat 05 Aug, 2006 7:22 am

Post by King Harold »

Halifax wrote:King_Harold: I think your eyes are going bad because there are addl's in each one ;) Look closer around the jae L4

And yes it appears that your routine above works. Although I would like to see your translation for the original routine.

Also the problem with your code above is that it only goes up to 255 in length. The original routine can have any length which is easier to keep track of just by comparing the pointers.
Ah yes I see.. didn't look closely enough I guess..

And like CoBB said, the length only has to go up to 16, anything more would just overflow unless you have leading zero's (or, in fact, leading anything-lower-than-1's)
But 240 leading zero's is just silly (oh btw, it goes up to 256, if you load B with zero)
User avatar
driesguldolf
Extreme Poster
Posts: 395
Joined: Thu 17 May, 2007 4:49 pm
Location: $4080
Contact:

Post by driesguldolf »

Thanks for the links Halifax (for some reason I didn't tought of ticalc.org :oops:)

@KH: that routine should work, nice trick!
User avatar
Halifax
Sir Posts-A-Lot
Posts: 225
Joined: Mon 01 Jan, 2007 10:39 am
Location: Pennsylvania, US

Post by Halifax »

Yeah that was a nice trick King_Harold. I am surprised.

driesguldol: No problem.
King Harold
Calc King
Posts: 1513
Joined: Sat 05 Aug, 2006 7:22 am

Post by King Harold »

I think I can make it even nicer though:

Code: Select all

 ;In: DE -> string, B = length
 ;Out: HL = number
 ld hl,$FFFF
conv_loop:
 ld a,(de)
 cp '1'
 adc hl,hl
 djnz conv_loop
 ld a,h
 cpl
 ld h,a
 ld a,l
 cpl
 ld l,a
 ret
this will be 4 tstates faster for every iteration past the 6th one .. I think..

Don't be so surprised Halifax, just because I act like an idiot doesn't mean I write code like one, well not all the time anyway ;)
CoBB
MCF Legend
Posts: 1601
Joined: Mon 20 Dec, 2004 8:45 am
Location: Budapest, Absurdistan
Contact:

Post by CoBB »

One can get even uglier:

Code: Select all

ld a,'0'
ld hl,0
cp (ix+0)
adc hl,hl
cp (ix+1)
adc hl,hl
cp (ix+2)
adc hl,hl
[...]
cp (ix+15)
adc hl,hl
Lengths other than 16 are possible by adjusting ix and adding a self-modified jump before the spaghetti part starts, but I'm lazy to figure out if it pays off time-wise. :P
King Harold
Calc King
Posts: 1513
Joined: Sat 05 Aug, 2006 7:22 am

Post by King Harold »

Why do you even need ld hl,0 then? all those bits will be lost for sure if you have a fixed-length routine like that
or am I missing something (again lol)?
CoBB
MCF Legend
Posts: 1601
Joined: Mon 20 Dec, 2004 8:45 am
Location: Budapest, Absurdistan
Contact:

Post by CoBB »

King Harold wrote:Why do you even need ld hl,0 then? all those bits will be lost for sure if you have a fixed-length routine like that
or am I missing something (again lol)?
You need it only for the variable length version, of course.
Post Reply