[TI ASM] Useful routines.
Moderator: MaxCoderz Staff
-
- Calc King
- Posts: 1513
- Joined: Sat 05 Aug, 2006 7:22 am
strcpy is supposed to copy the terminating null too. And strncpy doesn't just copy the given number of bytes from src to dest; that's memcpy. If strncpy encounters the end of the source string, it fills the rest of the destination with zeros.
These should act more like their C counterparts:
Code:
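_strcpy: ; HL = source, DE = destination, BC destroyed
xor a ; A = 0
cp (hl) ; Z set if (HL) is the terminating null
ldi ; copy one byte; LDI leaves Z alone
jr nz,_strcpy+1 ; +1 skips "xor a", back to "cp (hl)"
ret
_strncpy: ; HL = source, DE = destination, B = n (1-255), C destroyed
ld c,b ; ensure that LDI doesn't change B
xor a
cp (hl)
jr z,pad ; end of source reached: zero-fill the rest
ldi
djnz _strncpy+2 ; +2 skips "ld c,b" and "xor a"
ret
pad: ld (de),a ; A = 0
inc de
djnz pad
ret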
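A quick way to convince yourself that _strncpy behaves like the C library version is to model it in C and compare. The sketch below is a hypothetical model (z80_strncpy is not from the post); it keeps BC as one 16-bit value, so you can see that with C equal to B, LDI's BC-- never borrows into B.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of _strncpy: hl = source (HL), de = destination (DE),
   n = count (B, 1..255). BC is kept as one 16-bit value so the model shows
   why "ld c,b" stops LDI's BC-- from borrowing out of C into B. */
static void z80_strncpy(uint8_t *de, const uint8_t *hl, uint8_t n) {
    assert(n != 0);                            /* djnz would treat B = 0 as 256 */
    uint16_t bc = (uint16_t)((n << 8) | n);    /* B = n, then ld c,b */
    for (;;) {
        if (*hl == 0) {                        /* xor a / cp (hl) / jr z,pad */
            do {
                *de++ = 0;                     /* pad: ld (de),a / inc de */
                bc = (uint16_t)(bc - 0x100);   /* djnz pad: B-- */
            } while (bc >> 8);
            return;
        }
        *de++ = *hl++;                         /* ldi: (de) <- (hl), hl++, de++ */
        bc--;                                  /* ldi: BC--; C >= 1, so B is untouched */
        bc = (uint16_t)(bc - 0x100);           /* djnz _strncpy+2: B-- */
        assert((bc >> 8) == (bc & 0xFF));      /* B and C stay equal */
        if ((bc >> 8) == 0)
            return;                            /* n bytes copied, no null added */
    }
}
```

Comparing against the real strncpy for every n from 1 to 15 covers both the zero-padding case (n longer than the string) and the truncating case, where no null is written.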
-
Remember, it's not the Z flag but the P/V flag that LDI affects.
I'm never really sure about pe/po, though.
Edit: so it should be pe, right?
Code:
_strcpy:
xor a
cp (hl)
ldi
jp pe,_strcpy+1
ret
Last edited by King Harold on Thu 16 Aug, 2007 6:55 pm, edited 1 time in total.
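For what it's worth: according to the Zilog Z80 documentation, LDI resets H and N, sets P/V when BC is non-zero after the decrement, and leaves S, Z and C unaffected. So in the `jr nz` version the Z flag still comes from `cp (hl)`, and the loop stops right after the null has been copied; `jp pe` instead loops until BC runs out, whatever the string contains. The C model below (hypothetical names, flags only) shows the difference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the flags used by the _strcpy loop. */
typedef struct { uint16_t bc; int z, pv; } z80_flags;

/* LDI: (DE) <- (HL), HL++, DE++, BC--, P/V = (BC != 0).
   Z (and S, C) are left exactly as they were. */
static void ldi(uint8_t **de, const uint8_t **hl, z80_flags *f) {
    *(*de)++ = *(*hl)++;
    f->bc--;
    f->pv = (f->bc != 0);
}

/* The _strcpy loop with either "jr nz" (use_pe = 0) or "jp pe" (use_pe = 1)
   as the loop branch. Returns how many bytes were copied. */
static size_t strcpy_loop(uint8_t *de, const uint8_t *hl, uint16_t bc, int use_pe) {
    z80_flags f = { bc, 0, 0 };
    size_t copied = 0;
    for (;;) {
        f.z = (*hl == 0);                /* xor a / cp (hl) */
        ldi(&de, &hl, &f);
        copied++;
        if (use_pe ? !f.pv : f.z)        /* branch not taken: fall into ret */
            return copied;
    }
}
```

With a 3-character string and BC = 6, the `jr nz` loop copies 4 bytes (including the null), while the `jp pe` loop copies all 6.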
- calc84maniac
- Regular Member
- Posts: 112
- Joined: Wed 18 Oct, 2006 7:34 pm
- Location: The ex-planet Pluto
- Contact:
- driesguldolf
- Extreme Poster
- Posts: 395
- Joined: Thu 17 May, 2007 4:49 pm
- Location: $4080
- Contact:
Just something little I came up with, might be useful in games:
You saw that correctly: it uses the Vera protocol for defining routines.
Code:
.module misc
;; === misc.fadestep ===
;;
;; Changes the contrast 1 closer to the specified value
;;
;; Pre:
;; c = Contrast to fade to
;;
;; Post:
;; c-flag = reset if (contrast)=c, set otherwise
;; af, b destroyed
;;
;; SeeAlso:
;; misc.fade
;;
;; Warning:
;; Providing an invalid contrast causes unintended values to be sent to the LCD.
;; This error is caught if DEBUG is defined
;;
fadestep:
ld a, (contrast)
ld b, a
cp c
ret z ; c-flag is reset
ccf ; carry is now set if (contrast) > c
sbc a, a ; Determine the direction of fading: 0 (up) or $FF (down)
or 1 ; +1 or -1
add a, b
ld (contrast), a
add a, %11011000
ASSERT(nc) ; If carry is set then the contrast is invalid
out ($10), a
scf
ret ; c-flag is set
;; === misc.fade ===
;;
;; Fades the contrast to the specified value
;;
;; Pre:
;; c = Contrast to fade to (fadelight and fadedark load 0 and 39 into c themselves)
;;
;; Post:
;; Contrast is faded to specified value
;; Interrupts are enabled
;; af, bc destroyed
;;
;; SeeAlso:
;; misc.fadestep
;;
;; Warning:
;; Speed depends on the ISR timer; change the number of HALTs if needed
;;
fadelight:
ld c, 0
jr fade
fadedark:
ld c, 39
fade:
ei
halt
halt
call fadestep
jr c, fade
ret
.endmodule
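A C model of the step logic can be handy for checking the direction of the fade on a PC. It is a sketch under two assumptions taken from the code above: the contrast byte ranges over 0-39 (which is what the ASSERT(nc) after `add a, %11011000` enforces), and the byte sent to port $10 is the contrast plus $D8. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical C model of misc.fadestep: move *contrast one step toward
   target. Returns true (the asm's carry set) if a step was taken, false if
   *contrast already equals target. *lcd_cmd gets the byte sent to port $10. */
static bool fade_step(uint8_t *contrast, uint8_t target, uint8_t *lcd_cmd) {
    uint8_t cur = *contrast;
    if (cur == target)                          /* cp c / ret z */
        return false;
    /* step toward the target: +1 when below it, $FF (-1) when above it */
    uint8_t step = (cur < target) ? 0x01 : 0xFF;
    cur = (uint8_t)(cur + step);                /* add a,b */
    *contrast = cur;                            /* ld (contrast),a */
    assert(cur <= 39);                          /* ASSERT(nc) after add a,$D8 */
    *lcd_cmd = (uint8_t)(cur + 0xD8);           /* out ($10),a */
    return true;
}

/* Model of misc.fade: step until the target is reached (the asm waits
   two HALTs between steps). Returns the number of steps taken. */
static int fade_to(uint8_t *contrast, uint8_t target) {
    uint8_t cmd;
    int steps = 0;
    while (fade_step(contrast, target, &cmd))
        steps++;
    return steps;
}
```

Each call moves the contrast by exactly one, so a full fade from one end of the range to the other takes 39 steps, i.e. 78 HALTs with the code as posted.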
Multiplying by a fraction.
This is a routine which many people may find useful. It's something I came up with for my 3D engine to fix the problem of polygons not being clipped all the way to the edges of the screen.
The problem was due to multiplying by a fixed-point reciprocal value to perform the line clipping: try representing 1/3 in fixed point - or floating point, for that matter. Problems like that, along with the limited (16-bit) precision, resulted in the clipping not always being quite right. The best I could do was make sure the errors were biased to clip the lines too short, rather than too long - which would be bad.
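To make the 1/3 problem concrete, here is a hypothetical C helper (mul_by_recip, not from the post) that multiplies by a 0.16 fixed-point reciprocal and keeps only the high word of the product, as a 16x16 multiply typically would. The reciprocal of 3 truncates to 21845/65536, so the result is always at or below the true quotient: exactly the "too short" bias described above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: x * (1/d) using a 0.16 fixed-point reciprocal and
   keeping only the high word of the 32-bit product. Requires d >= 2 so the
   reciprocal fits in 16 bits. */
static uint16_t mul_by_recip(uint16_t x, uint16_t d) {
    assert(d >= 2);
    uint16_t recip = (uint16_t)(65536u / d);   /* 1/3 -> 21845/65536 = 0.33332... */
    return (uint16_t)(((uint32_t)x * recip) >> 16);
}
```

For example, mul_by_recip(96, 3) gives 31 rather than 32, and mul_by_recip(3, 3) gives 0 rather than 1, which is how a line can end up a pixel short of the screen edge.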
I now have a much better (and in some cases, faster) solution to the problem and it could be used for many other applications.
This routine will multiply a 16-bit number by a 16-bit/16-bit proper fraction. No reciprocals are involved, and it shortcuts the normal process of doing a 16-bit * 16-bit = 32-bit multiplication followed by a 32-bit / 16-bit division (eek!).
The routine can only handle denominators up to 32767 (and, since the numerator must not exceed the denominator, numerators up to 32767 too), although this can be overcome if you are willing to sacrifice some cycles. I don't actually need that ability, but I will make the modifications anyway and post them this weekend when I have a bit more time.
The routine will perform HL * DE / BC. HL is unsigned - you will need to negate it before and after if you want a signed multiplication.
I hope someone finds this useful. I am 99.998% confident it is bug-free, but I don't have the Z80 time to test every possible combination of inputs!
I also haven't had a chance to calculate the timings, but will do so if time allows.
Code:
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; alUMulByFraction:
;;
;; HL = HL * DE / BC
;;
;; This routine multiplies HL by proper fraction DE/BC.
;;
;; HL is treated as unsigned.
;; BC must be non-zero
;; BC must be < 32768
;; DE must be <= BC
;;
;; INPUTS:
;;
;; HL - multiplicand
;; DE - numerator
;; BC - denominator
;;
;; OUTPUTS:
;; HL - quotient of HL * DE / BC
;; DE - remainder of HL * DE / BC
;;
;; DESTROYED:
;; AF
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
alUMulByFraction:
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; This initial block of code is used to return the output in more
;; convenient registers. It may be removed, in which case the
;; output will be returned as follows:
;;
;; IX - quotient
;; HL - remainder
;; DE - numerator (unchanged)
;; BC - denominator (unchanged)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
push ix ; [15] preserve IX
call _mainRtn ; [17] run main code.
ex (sp), ix ; [23] quotient <-> preserved IX
ex de, hl ; [4] remainder <-> numerator
pop hl ; [10] pop quotient
ret ; [10]
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; The actual routine starts here.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
_mainRtn: ld ix, $0000 ; [14] initial quotient
ld a, h ; [4]
or a ; [4]
jr z, _highZero ; [12/7] jump if high byte is zero.
ld h, l ; [4] push contents of L to stack so
push hl ; [11] we can pop it into A later on.
ld hl, $0000 ; [10] initial remainder
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Skip to first significant bit to avoid useless work. We also
;; set the low bit of A to act as a marker bit for the main loop.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
scf ; [4] for marker bit
_findHiBit: adc a, a ; [4] searching for first significant
jp nc, _findHiBit ; [10] bit to avoid useless work.
call _mainLoop ; [*] process high byte.
pop af ; [10] pop low byte
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Now process the low byte.
;;
;; Here we duplicate a small part of the the main loop. This is
;; because we need to run this code, but we want to shift a 1
;; into the accumulator instead of a 0.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
add ix, ix ; [15] quotient * 2
add hl, hl ; [11] remainder * 2
sbc hl, bc ; [15] remainder -= denominator
jr nc, _nc3 ; [12/7]
add hl, bc ; [11] remainder += denominator
jp _rotMul2 ; [10]
_nc3: inc ix ; [10] quotient + 1
_rotMul2: sl1 a ; [8] undocumented SLL (SL1): shift left, bit 0 = 1 (marker)
jp _mainLoop ; [10]
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; High byte was zero.
;; Go straight to low byte. Do not collect $200.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
_highZero: ld a, l ; [4] load low byte
ld hl, $0000 ; [10] initial remainder
or a ; [4]
ret z ; [11/5] return if multiplicand zero
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Search for first significant bit to avoid useless work. We also
;; set the low bit of A to act as a marker bit for the main loop.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
scf ; [4] for marker bit
_findLoBit: adc a, a ; [4] searching for first significant
jp nc, _findLoBit ; [10] bit to avoid useless work.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Main processing loop. We loop here for each significant bit
;; of the multiplicand (hence, small multiplicands are fast).
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
_mainLoop: jr nc, _loopTail ; [12/7]
add hl, de ; [11] remainder += numerator
sbc hl, bc ; [15] remainder -= denominator
jr nc, _nc2 ; [12/7]
add hl, bc ; [11] remainder += denominator
jp _loopTail ; [10]
_nc2: inc ix ; [10]
_loopTail: cp $80 ; [7] once only the marker bit is
ret z ; [11/5] left we end the loop
add ix, ix ; [15] quotient * 2
add hl, hl ; [11] remainder * 2
sbc hl, bc ; [15] remainder -= denominator
jr nc, _nc1 ; [12/7]
add hl, bc ; [11] remainder += denominator
jp _rotMulBit ; [10]
_nc1: inc ix ; [10] quotient + 1
_rotMulBit: add a, a ; [4] shift next bit and loop
jp _mainLoop ; [10]
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; End of alUMulByFraction
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
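For testing on a PC, here is a C model of the same shift-and-subtract idea (mul_by_fraction is a hypothetical name, not part of the post). It keeps the invariant quotient * BC + remainder == (multiplicand bits processed so far) * DE, with remainder < BC. The two restrictions in the header fall out of it: BC < 32768 means doubling a remainder below BC cannot overflow 16 bits, and DE <= BC means adding the numerator needs at most one subtraction of BC to renormalize.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of alUMulByFraction: returns hl * de / bc and stores
   the remainder, using only 16-bit intermediates. Invariant after each bit:
   q * bc + r == (multiplicand bits seen so far) * de, with r < bc. */
static uint16_t mul_by_fraction(uint16_t hl, uint16_t de, uint16_t bc,
                                uint16_t *rem) {
    assert(bc != 0 && bc < 32768u && de <= bc);
    uint16_t q = 0, r = 0;                /* IX and HL in the asm */
    int started = 0;
    for (int bit = 15; bit >= 0; bit--) {
        int set = (hl >> bit) & 1;
        if (!started && !set)
            continue;                     /* _findHiBit / _findLoBit */
        if (started) {                    /* quotient * 2, remainder * 2 */
            q = (uint16_t)(q * 2);
            r = (uint16_t)(r * 2);        /* r < 32768: no overflow */
            if (r >= bc) { r = (uint16_t)(r - bc); q++; }
        }
        started = 1;
        if (set) {                        /* remainder += numerator */
            r = (uint16_t)(r + de);       /* r + de < 2 * bc <= 65534 */
            if (r >= bc) { r = (uint16_t)(r - bc); q++; }
        }
    }
    *rem = r;
    return q;
}
```

Comparing the model against plain 32-bit arithmetic over a grid of inputs is a cheap way to cover far more combinations than is practical on the calculator.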
Last edited by qarnos on Fri 07 Dec, 2007 9:26 am, edited 1 time in total.
"I don't know why a refrigerator is now involved, but put that aside for now". - Jim e on unitedti.org
avatar courtesy of driesguldolf.
There's still a bunch of routines on the API website (http://api.timendus.com/), some of which could be useful, and others could be updated with versions from this thread. I'm not really working on the API anymore, and I'm not even sure if the current non-stable version will compile, but feel free to leech from it what you can use, or work on it to improve it.
Please do ignore the idea box... Spam bots have overrun it for years
http://clap.timendus.com/ - The Calculator Link Alternative Protocol
http://api.timendus.com/ - Make your life easier, leave the coding to the API
http://vera.timendus.com/ - The calc lover's OS
Table-based sin/cos
Time for another routine
Here I am going to present some code for calculating table-based sines and cosines with a resolution of 1024 degrees to a circle, using a table of 256 or 512 bytes, depending on whether you want a 1- or 2-byte result.
Firstly, I will ignore the details of which fixed-point format you are using. It doesn't really matter. You fill the table with whatever representation you want.
Secondly, I am not including the actual tables here, since the fixed-point format I am using in the 3D engine will be almost useless to anyone else. Just write some C code (or your language of choice) to generate the table. If you desperately need a table and don't know how to make it, PM me.
Finally, this code is for a 512 byte table (2 byte results). You should be able to modify it easily enough if need be but if you have trouble then, once again, PM me.
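As suggested above, the table is easiest to generate on a PC. This is a sketch in C that assumes a 2.14 fixed-point format (1.0 = 16384, the value alDPFixSin returns for sin(90)) and fills the two 256-byte halves the routine reads: low bytes in one page, high bytes in the next. build_sin_quadrant and sin_series are hypothetical names; the series is only there so the generator does not need the math library, and calling sin() from <math.h> would do equally well.

```c
#include <assert.h>
#include <stdint.h>

#define FIX_ONE 16384   /* assumed 1.0 (2.14 fixed point) */

/* Taylor series for sin(x); very accurate on [0, pi/2], no libm needed. */
static double sin_series(double x) {
    double term = x, sum = x;
    for (int n = 1; n < 12; n++) {
        term *= -x * x / ((2.0 * n) * (2.0 * n + 1.0));
        sum += term;
    }
    return sum;
}

/* Entry i = sin(i * 90/256 degrees), rounded to fixed point, so the table
   covers sin(0)..sin(90 * 255/256). lo[] gets the low bytes and hi[] the
   high bytes, matching the two-page layout the routine reads. */
static void build_sin_quadrant(uint8_t lo[256], uint8_t hi[256]) {
    const double half_pi = 1.57079632679489661923;
    for (int i = 0; i < 256; i++) {
        long v = (long)(sin_series(half_pi * i / 256.0) * FIX_ONE + 0.5);
        lo[i] = (uint8_t)(v & 0xFF);
        hi[i] = (uint8_t)(v >> 8);
    }
}
```

Writing the two arrays out as data lines for your assembler is left out; only the layout (low page first, high page second, page-aligned) matters to the routine.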
This method of table lookup is usually referred to as the sine-quadrant method, or some variation of that name. It is called that because the table only contains the sine values for the first quadrant (90 degrees) of the sine wave, i.e. sin(0)...sin(90). Well, not quite 90 degrees: the exact range of the table is sin(0)..sin(90.0 * 255.0 / 256.0).
Why only the first quadrant? Because the second quadrant is a mirror image of the first (sin(89) == sin(91)) and the last two quadrants are the negative of the first two. This means that, using a little logic, we only need to know the values of the first quadrant to calculate the values of the other 3.
Using a range of 256 for the quadrant also has an advantage for us: the high byte of the angle indicates which quadrant the angle lies in, and the low byte gives us a table index. The lowest bit of the high byte indicates whether we need to mirror the table, and the second bit tells us whether we need to negate the result. The mirror is achieved by simply negating the low byte of the angle: in 1024-degree units sin(256 + x) == sin(256 - x), and 256 - x is just -x in the low byte.
The only "gotcha!" is if we try to calculate sin(90) or sin(270) (sin(256) and sin(768) in our 1024-degree format). Both of these will result in zero, because the first value in the table (which these angles will look up) is zero, whereas the correct results are +1 and -1. To get around this, we need to manually check for these values, but it is easier (and faster) than you might think.
Both these angles lie in a mirrored quadrant. To perform the mirror, we need to execute a NEG instruction on the low byte of the angle - which will set the zero flag if it is zero. Woohoo - free comparison! The best kind!
Since a cosine is just a sine wave offset by 90 degrees (256 in 1024-degree format), you can also use this code to calculate cos(x). Just increment the high byte of the angle by one and call the sin routine.
Enough rambling. Here's the code.
Code:
;@doc:routine
;
; === alDPFixSin ===
;
; Returns sin(BC). BC is an integer angle with 1024 degrees to the circle.
;
; Since cos(X) == sin(X + 90), this routine can be used to calculate a cosine
; by incrementing the high byte of the angle (B) by one before calling.
;
; INPUTS:
;
; REGISTERS:
; * BC - Angle to calculate sine of.
;
; OUTPUTS:
;
; REGISTERS:
; * HL - sin(BC)
;
; DESTROYED:
;
; REGISTERS:
; * AF
;
;@doc:end
alDPFixSin:
; The high byte of the angle indicates which quadrant of the sine
; wave we are interested in indexing
ld a, c ; [4]
bit 0, b ; [8] if the angle is [256..511, 768..1023]
jr z, _noNegate ; [12/7] negate the low byte.
neg ; [8]
jr z, _forceTo1 ; [12/7]
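_noNegate: ld l, a ; [4]
ld h, g_alTrigTable ; [7] table page (high byte of the 256-byte-aligned table): low bytes here, high bytes in the next page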
ld a, (hl) ; [7]
inc h ; [4]
ld h, (hl) ; [7]
ld l, a ; [4]
bit 1, b ; [8]
ret z ; [11/5] negate result if angle >= 512
cpl ; [4]
ld l, a ; [4]
ld a, h ; [4]
cpl ; [4]
ld h, a ; [4]
inc hl ; [6]
ret ; [10]
; If the angle was 256 or 768, we manually force it to +/-1
_forceTo1: bit 1, b ; [8]
jr nz, _forceToN1 ; [12/7]
ld hl, 16384 ; [10] replace this with your value for +1
ret ; [10]
_forceToN1: ld hl, -16384 ; [10] replace this with your value for -1
ret ; [10]
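The quadrant logic is easy to get subtly wrong, so here is a C model of alDPFixSin that follows the assembly step by step (fix_sin and fix_cos are hypothetical names). It takes the two table pages and the +1 value as parameters, so it can be checked with any table; a simple ramp instead of real sines makes the mirroring and negation easy to verify.

```c
#include <assert.h>
#include <stdint.h>

/* C model of alDPFixSin's quadrant logic. angle uses 1024 units per circle;
   lo/hi are the two 256-byte table pages; one is the value returned for +1
   (16384 in the posted code). */
static int16_t fix_sin(const uint8_t lo[256], const uint8_t hi[256],
                       int16_t one, uint16_t angle) {
    uint8_t quad = (uint8_t)(angle >> 8);    /* B */
    uint8_t idx = (uint8_t)angle;            /* C */
    if (quad & 1) {                          /* bit 0,b: mirrored quadrant */
        idx = (uint8_t)(0u - idx);           /* neg */
        if (idx == 0)                        /* Z from neg: angle 256 or 768 */
            return (quad & 2) ? (int16_t)-one : one;
    }
    int16_t v = (int16_t)(lo[idx] | (hi[idx] << 8));
    return (quad & 2) ? (int16_t)-v : v;     /* bit 1,b: negate lower half */
}

/* cos(x) == sin(x + 256): the "increment B" trick. */
static int16_t fix_cos(const uint8_t lo[256], const uint8_t hi[256],
                       int16_t one, uint16_t angle) {
    return fix_sin(lo, hi, one, (uint16_t)(angle + 256));
}
```

With a ramp table (entry i = 64 * i), angles 246 and 266 both return table entry 246, angle 256 returns +1 via the special case, and 522 returns the negated entry 10, mirroring what the Z80 code does.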
"I don't know why a refrigerator is now involved, but put that aside for now". - Jim e on unitedti.org
Here's a routine (well, a small set of routines) to perform a cross-fade between two images.
It works by comparing the two images to see which bits need to be changed, and then randomly flips these bits until the image has been completed.
It's not as fast as I would like - an image which needs every pixel changed takes a few seconds - but it's good enough to be used in games for nice transitions between static screens.
There are 3 routines, but you only really need to know about one of them.
Fairly self-explanatory. HL must point to a buffer which matches the currently displayed screen, or it will look all screwy. DE is the image you want to fade to, and IX is a buffer used for internal stuff. You can use PLOTSSCREEN, SAVESSCREEN and APPBACKUPSCREEN for these (as long as the calc doesn't APD during the fade - use DI to be sure).
The code uses irandom (the MirageOS version of ionRandom), but is easily replaced by any other RNG - see the source comments for details.
You can download the zip archive here. The only file you need is "xfade.asm". The rest of the stuff is for the demo, of which the screenshot is here:
Code:
; === xFadeLCD ===
;
; Cross-fades the LCD between the current image (HL) and the image in (DE).
;
; INPUTS:
;
; REGISTERS:
; * IX - Buffer for crossfade data (768 bytes)
; * HL - Buffer containing LCD contents (768 bytes)
; * DE - Target image (768 bytes)
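The actual xfade.asm is only in the zip, so the following is not that routine. It is a C sketch of the idea described above: XOR the two images to find the pixels that must change, then flip them in random order. One choice differs from the post: instead of a 768-byte work buffer it keeps a list of pending pixel positions (12 KB, fine on a PC, too big for the calc) and removes each pixel from the list when it is flipped, so every random pick makes progress. xf_random is a hypothetical xorshift RNG standing in for irandom.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SCREEN_BYTES 768                     /* 96x64 pixels, 1 bit each */
#define SCREEN_BITS  (SCREEN_BYTES * 8)

/* Hypothetical RNG (xorshift32) standing in for irandom. */
static uint32_t xf_rng = 0x12345678u;
static uint32_t xf_random(void) {
    uint32_t x = xf_rng;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return xf_rng = x;
}

/* Flip the pixels where screen and target differ, in random order, calling
   frame() every per_frame flips (e.g. to copy the buffer to the LCD).
   Returns the number of pixels flipped. */
static size_t cross_fade(uint8_t *screen, const uint8_t *target,
                         size_t per_frame, void (*frame)(const uint8_t *)) {
    static uint16_t pending[SCREEN_BITS];    /* bit positions still to flip */
    size_t n = 0;
    assert(per_frame != 0);
    for (size_t i = 0; i < SCREEN_BITS; i++)         /* compare the images */
        if ((screen[i >> 3] ^ target[i >> 3]) & (0x80 >> (i & 7)))
            pending[n++] = (uint16_t)i;
    size_t total = n;
    while (n > 0) {
        size_t k = xf_random() % n;                  /* pick a random pending pixel */
        uint16_t bit = pending[k];
        screen[bit >> 3] ^= (uint8_t)(0x80 >> (bit & 7));
        pending[k] = pending[--n];                   /* remove it: no retries needed */
        if (frame != NULL && n % per_frame == 0)
            frame(screen);
    }
    return total;
}
```

The number of flips always equals the number of differing pixels, so the running time no longer depends on how lucky the random numbers are.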
"I don't know why a refrigerator is now involved, but put that aside for now". - Jim e on unitedti.org
-
- MCF Legend
- Posts: 1601
- Joined: Mon 20 Dec, 2004 8:45 am
- Location: Budapest, Absurdistan
- Contact:
qarnos wrote: It's not as fast as I would like - an image which needs every pixel changed takes a couple of seconds - but it's good enough to be used in games for nice transitions between static screens.
Wouldn’t it be more efficient to do it like this:
Loop through the images. For each byte do the following:
1. XOR the corresponding bytes
2. if the result is zero, skip to the next byte, otherwise
3. AND the result with a random number
4. XOR this new number with the buffer
If we had to skip every byte in step 2, we’re done (we can just set a flag in step 3 to indicate the opposite). This way you can easily adjust the speed of the transition by tweaking the binary weight of the random numbers, i.e. how many of their bits are set (which can also be done with a 256-byte LUT).
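The four steps translate almost line for line into C. In this sketch (all names hypothetical), xfade_pass returns false when every byte was skipped in step 2. A pixel that has reached its target value has a zero bit in the XOR, so it is never touched again; rand8_sparse ANDs two random bytes, which halves the expected number of set bits and slows the fade down, the binary-weight tweak in its simplest form.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 8-bit RNG (low byte of xorshift32). */
static uint32_t cobb_rng = 0x2545F491u;
static uint8_t rand8(void) {
    uint32_t x = cobb_rng;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    cobb_rng = x;
    return (uint8_t)x;
}

/* Fewer set bits per random byte means fewer pixels flipped per pass,
   i.e. a slower fade (about 2 set bits per byte instead of 4). */
static uint8_t rand8_sparse(void) { return (uint8_t)(rand8() & rand8()); }

/* One pass of the four steps over a 768-byte buffer. Returns false if every
   byte was skipped in step 2, meaning the fade is complete. */
static bool xfade_pass(uint8_t *screen, const uint8_t *target,
                       uint8_t (*rnd)(void)) {
    bool changed = false;
    for (size_t i = 0; i < 768; i++) {
        uint8_t diff = screen[i] ^ target[i];   /* 1. XOR */
        if (diff == 0)
            continue;                           /* 2. nothing to do here */
        changed = true;                         /* flag: not done yet */
        screen[i] ^= (uint8_t)(diff & rnd());   /* 3. AND with random, 4. XOR in */
    }
    return changed;
}
```

Call xfade_pass once per frame (followed by a screen copy) until it returns false.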
CoBB wrote: Wouldn’t it be more efficient to do it like this:
It probably would be. The main motivation for this method is that I want to avoid the possibility of "getting stuck" if we go through a few loops and are unlucky enough to keep getting bad random numbers.
I might give that a try though and see how it goes.
The other idea I had was to create a binary tree for finding the pixels. It would be lightning fast and actually use less ram than the current version, but probably too much effort to go to for such a simple throw-away effect.
update:
I tried it out, and my first efforts look a bit jerky:
I tried updating the screen as I went, but then you could see the refresh, so I had to resort to fastcopy after each frame. I could fix this to a degree (I am resetting the LCD co-ords after each pixel - very slow) but it's too late to try that right now.
I might keep playing around and see if I can make it any better. I think it really needs to be random-access, though.
"I don't know why a refrigerator is now involved, but put that aside for now". - Jim e on unitedti.org
- driesguldolf
Looks very nice.
qarnos wrote: I found the reason why my routine is slow and the CoBB version is jerky - irandom is slow, slow, slooooooowww.
I'll play around with some other RNGs.
Couldn't you simply use the refresh register?
Downside is that if you use that a lot on a constant 'timing' you might have interference... Though in theory that should be easily fixed by adding nop instructions.
(Just for clarity: R increases on every instruction, right?)