[C] swapping vars

Feel like posting Off Topic? Do it here.

Moderator: MaxCoderz Staff

King Harold
Calc King
Posts: 1513
Joined: Sat 05 Aug, 2006 7:22 am

[C] swapping vars

Post by King Harold »

I found a piece of code on wikipedia:

Code: Select all

/* function used for swapping values of two variables */
void swap(int *a, int *b)
{
  int c = *a;

  *a = *b;
  *b = c;
}
I almost fell from my chair!
using an extra variable??

this is more like it:

Code: Select all

/* function used for PROPERLY swapping values of two variables */
void swap(int *a, int *b)
{
  *a ^= *b;
  *b ^= *a;
  *a ^= *b;
}
It's not like it really matters, but using a local variable where you don't need one is evil. Ofcourse you could just use the evil code if you use a good optimizer..

In good ol' z80 assembly you don't have arbitrary xors, but you can do something else: (swap (de) and (hl) )

Code: Select all

ld a,(de)
ldi
dec hl
ld (hl),a
User avatar
Jim e
Calc King
Posts: 2457
Joined: Sun 26 Dec, 2004 5:27 am
Location: SXIOPO = Infinite lives for both players
Contact:

Post by Jim e »

Your code would fail if both pointers pointed to the same spot in memory.

More over I'd say the first would be faster.

yours has to

Read an int from memory
XOR an int from memory
Write an int from memory
Read an int from memory
XOR an int from memory
Write an int from memory
Read an int from memory
XOR an int from memory
Write an int from memory


Wiki's has to.

Read an int from memory
Read an int from memory
Write an int from memory
Write an int from memory

But then again I have very little idea whats going on in a compilers head.
Image
King Harold
Calc King
Posts: 1513
Joined: Sat 05 Aug, 2006 7:22 am

Post by King Harold »

creating a local var creates something on the stack (or so I was told), and then compiler go all weird about saving the stack frame aswell (which should allways be the current value minus everything you just pushed on it), so that's actually a whole lot of instructions.



this is better:

Code: Select all

/* function used for PROPERLY swapping values of two variables */
void swap(int *a, int *b)
{
  if (a != b)
 {
   *a ^= *b;
   *b ^= *a;
   *a ^= *b;
 }
}
you could inline xchg eax,ebx ofcourse...
(which is accepted but inlining kills the optimizer in most cases)

furthermore, a load from memory doesn't take as ridiculously long as on the z80, it's less bad than allocating 8 bytes on the stack and pushing an int + later popping it. (which makes 0% sense but is done anyway)
User avatar
Jim e
Calc King
Posts: 2457
Joined: Sun 26 Dec, 2004 5:27 am
Location: SXIOPO = Infinite lives for both players
Contact:

Post by Jim e »

So you added a condition, how does that make it the proper way? The wiki showed the simplest and easiest to under stand code, and yet this method which will obviously have varying performance machine to machine is some how better?

I think there are better places to optimize than a 3 line swap function.
Image
King Harold
Calc King
Posts: 1513
Joined: Sat 05 Aug, 2006 7:22 am

Post by King Harold »

So you added a condition, how does that make it the proper way?
this is how most people do a swap no?
The wiki showed the simplest and easiest to under stand code, and yet this method which will obviously have varying performance machine to machine is some how better?
well it doesn't use the stack twice, it looks more fun, if you have an optimizer it will probably also become xchg, but it looks better than declaring a variable you're only using once and you don't really need.
I think there are better places to optimize than a 3 line swap function.
ofcourse, but every little bit counts doesn't it? just because we have multi gigahertz computer these days doesn't mean we can get away with sloppy code (well, maybe we can, but it's still bad)
User avatar
GuillaumeH
Regular Member
Posts: 143
Joined: Fri 17 Dec, 2004 8:30 pm
Contact:

Post by GuillaumeH »

It's more productive to do high-level algorithm optimisation, and let the compiler do the dirty job.

This said, as soon as you program in C(++), you're already bothering too much :)
King Harold
Calc King
Posts: 1513
Joined: Sat 05 Aug, 2006 7:22 am

Post by King Harold »

Then what should we do? Program in Python?
User avatar
Jim e
Calc King
Posts: 2457
Joined: Sun 26 Dec, 2004 5:27 am
Location: SXIOPO = Infinite lives for both players
Contact:

Post by Jim e »

King Harold wrote:
The wiki showed the simplest and easiest to under stand code, and yet this method which will obviously have varying performance machine to machine is some how better?
well it doesn't use the stack twice, it looks more fun, if you have an optimizer it will probably also become xchg, but it looks better than declaring a variable you're only using once and you don't really need.
Looks more fun? Looks Better? I suppose subjective arguments are allowable.

Why do you keep bringing up this xchg instruction? This is C not x86. I could bring up the sharp80's swap instruction for being the fastest way to swap nibbles but that means squat for everything else.

I don't see how you can ignore the fact that this xoring method requires 6 mem reads and 3 mem writes where the standard way requires only 2 reads and 2 writes. Whether stack is used or not is beside the point thats up to the compiler and the targeted hardware. You could say on this hardware its faster or that hardware its more practical, but I'd bet 8 out of 10 times wiki's way would be perfered.

I'll give you that xoring saves 1 extra register or 1 extra unit in mem, but thats not much of a save. You still have the cost of the extra lines of code and, likely, extra time wasted.
Image
CoBB
MCF Legend
Posts: 1601
Joined: Mon 20 Dec, 2004 8:45 am
Location: Budapest, Absurdistan
Contact:

Post by CoBB »

King Harold wrote:you could inline xchg eax,ebx ofcourse...
Which would be an utterly stupid thing to do, since if you read two variables from the memory into registers, you could as well write them back in the opposite order instead of exchanging the values of the registers. XCHG needs at least one of the operands to be a register, so you can’t do it in a single instruction, at least not using the core instruction set of Intel CPUs.

And why would the straightforward method use the stack? The compiler will be clever enough to use a register for the local variable.

And your ‘clever’ code for swapping (de) and (hl) takes 36 clock cycles, while the straightforward

Code: Select all

ld a,(de)
ld b,(hl)
ld (hl),a
ld a,b
ld (de),a
takes only 32 at the same size, and it even modifies less registers (yours screws up c,d,e and f on top of the ones involved in this one).
coelurus
Calc Wizard
Posts: 585
Joined: Sun 19 Dec, 2004 9:02 pm
Location: Sweden
Contact:

Post by coelurus »

Get assembler output from the source code and benchmark the programs and we'll see what's the fastest. Guessing what one compiler might do is not usable in any way, not even for philosophical matters :)

Oh, and as mentioned, don't bother worrying about swaps. PCs are pretty clever and fast these days.
King Harold
Calc King
Posts: 1513
Joined: Sat 05 Aug, 2006 7:22 am

Post by King Harold »

and it even modifies less registers (yours screws up c,d,e and f on top of the ones involved in this one).
you can inc bc, dec de, dec hl and only lose A. you can never save B that way you do it.
Oh, and as mentioned, don't bother worrying about swaps. PCs are pretty clever and fast these days.
which does not mean we can go around writing bad code..
Why do you keep bringing up this xchg instruction? This is C not x86.
How much C is not compiled for x86? surely a bit, but it won't be much, seeing as most computers have a x86 compitable cpu.



anyway. according to some teacher, the standard way to make a local variable is to allocate space on the stack. That is going to take more time then then doing 3 xors, 3 mem reads, and 3 mem writes, since it'd be calculating offsets etc and remembering a stackframe to push as well and only after that doing a few read/writes. how many compilers are going to use a non-standard way? not many of the often used ones I'd guess.
Which would be an utterly stupid thing to do, since if you read two variables from the memory into registers, you could as well write them back in the opposite order instead of exchanging the values of the registers.
you can not, you can only pass parameters in eax and ebx, you don't know where to write them to, but you can pass them back out exchanged
CoBB
MCF Legend
Posts: 1601
Joined: Mon 20 Dec, 2004 8:45 am
Location: Budapest, Absurdistan
Contact:

Post by CoBB »

King Harold wrote:you can inc bc, dec de, dec hl and only lose A. you can never save B that way you do it.
That’s already considerably longer and slower, and it would still affect some flags thanks to LDI, which you’d need a push-pop for. But then you could as well do that for BC instead of going through all that...
King Harold wrote:which does not mean we can go around writing bad code..
Depends on your criteria as to what bad code means. Readability and maintainability are many orders of magnitude more important than performance. When performance is an issue, you have to take your favourite profiler, find the bottleneck, crazily optimise the critical portions and provide enough documentation/comments to keep it understandable for those who come back to the code later.
King Harold wrote:How much C is not compiled for x86? surely a bit, but it won't be much, seeing as most computers have a x86 compitable cpu.
Most CPUs are in embedded systems, and many of them are 16 or even 32-bit, where C is used quite often, so we’re talking about quite a lot of code.
King Harold wrote:anyway. according to some teacher, the standard way to make a local variable is to allocate space on the stack.
As long as there’s no optimisation involved, yes.
King Harold wrote:That is going to take more time then then doing 3 xors, 3 mem reads, and 3 mem writes, since it'd be calculating offsets etc and remembering a stackframe to push as well and only after that doing a few read/writes.
Well, in many cases stack space is allocated statically, during compile time, so it has zero runtime overhead.
King Harold wrote:how many compilers are going to use a non-standard way? not many of the often used ones I'd guess.
Any sensible C compiler you’ll come across will do extensive optimisation, be it VC, Borland or gcc, and they’ll shamelessly inline code and put local variables into registers. Old ones couldn’t do that efficiently, hence the ‘register’ keyword in the language!
King Harold wrote:you can not, you can only pass parameters in eax and ebx, you don't know where to write them to, but you can pass them back out exchanged
Well, that’s most likely compiler dependent (frankly, I never bothered using inline assembly since I moved on from Pascal, i.e. the good old BP7), and who’d go through a call-ret for single exchange anyway?
King Harold
Calc King
Posts: 1513
Joined: Sat 05 Aug, 2006 7:22 am

Post by King Harold »

so, what are you saying now? what's your point?
Well, that’s most likely compiler dependent
that'd be VC.

optimization does not equal de-standardization, it's close though, but that's not the point is it? as soon as you're optimizing, both ways will turn out optimized - they might even become the same!

as far as readability goes - if you can't read 3 xors then you shouldn't be programming C at all (or you're in the process of learning maybe)
and who’d go through a call-ret for single exchange anyway
the guys at wikipedia obviously
User avatar
GuillaumeH
Regular Member
Posts: 143
Joined: Fri 17 Dec, 2004 8:30 pm
Contact:

Post by GuillaumeH »

King Harold wrote:Then what should we do? Program in Python?
If you want. There are also compiled languages.
User avatar
Jim e
Calc King
Posts: 2457
Joined: Sun 26 Dec, 2004 5:27 am
Location: SXIOPO = Infinite lives for both players
Contact:

Post by Jim e »

Code: Select all

;Just printf
	mov	DWORD PTR [esp+4],ecx
	mov	DWORD PTR [esp+8],eax
	mov	DWORD PTR [esp],0x404000
	call	0x402360 <printf>

;Standard Swap
	mov	DWORD PTR [esp],0x404000
	mov	ecx,DWORD PTR [ebp-16]
	mov	edx,DWORD PTR [ebp-20]
	mov	DWORD PTR [esp+8],ecx	
	mov	DWORD PTR [ebp-20],ecx
	mov	DWORD PTR [ebp-16],edx
	mov	DWORD PTR [esp+4],edx
	call	0x402360 <printf>

;xoring swap
	mov	DWORD PTR [esp],0x404000
	mov	eax,DWORD PTR [ebp-20]
	mov	ebx,DWORD PTR [ebp-16]
	xor	ebx,eax
	xor	eax,ebx
	xor	ebx,eax
	mov	DWORD PTR [ebp-16],ebx
	mov	DWORD PTR [esp+4],ebx
	mov	DWORD PTR [ebp-20],eax
	mov	DWORD PTR [esp+8],eax	
	call	0x402360 <printf>
Both swaps were functions, but the compiler decided to inline them. So if a compiler ever decides to inline this in real code...which probably likely, any and all advantage would disappear.


Oh and about that z80 code, I'd say they both have their advantages in different situations. King Harold's would be better suited if you want 2 swap to blocks of memory greater than 256 bytes with out using a third buffer. Because bc would affect the parity flag, that could be considered useful. However anything less than 257 bytes CoBBs with djnz would be better.
Image
Locked