[TI ASM] Useful routines.

King Harold
Calc King
Posts: 1513
Joined: Sat 05 Aug, 2006 7:22 am

Post by King Harold »

You could also PUSH and POP BC; that will be faster for long strings, while INC BC would be faster for short ones.
Halifax
Sir Posts-A-Lot
Posts: 225
Joined: Mon 01 Jan, 2007 10:39 am
Location: Pennsylvania, US

Post by Halifax »

Yeah, sorry, I must have blanked on those routines. Oh well, I will type them up better next time.
Goplat
New Member
Posts: 12
Joined: Mon 16 Jul, 2007 2:46 pm

Post by Goplat »

strcpy is supposed to copy the terminating null too. And strncpy doesn't just copy the given number of bytes from src to dest, that's memcpy. If strncpy encounters the end of the source string, it fills the rest of the destination with zeros.

These should act more like their C counterparts:

Code:

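; HL = source, DE = destination (LDI also decrements BC)
_strcpy:
        xor a           ; A = 0, the terminator we test for
        cp (hl)         ; Z set if the source byte is the terminating null
        ldi             ; copy the byte; LDI leaves the Z flag unchanged
        jr nz,_strcpy+1 ; loop back to the CP (XOR A is one byte long)
        ret

; HL = source, DE = destination, B = number of bytes to write
_strncpy:
        ld c,b ; ensure that LDI doesn't change B
        xor a           ; A = 0, used as terminator and as padding
        cp (hl)         ; end of the source string?
        jr z,pad
        ldi
        djnz _strncpy+2 ; loop back to the CP
        ret
pad:    ld (de),a       ; fill the rest of the destination with zeros
        inc de
        djnz pad
        ret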
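
For reference, the C behaviour these routines aim for can be modelled in a few lines of Python. This is only a sketch: `strcpy_model` and `strncpy_model` are hypothetical names, and the source is assumed to be a bytes object that contains a terminating zero (or, for `strncpy_model`, is at least `n` bytes long).

```python
def strcpy_model(src):
    """Return the bytes strcpy would write: everything up to and including the null."""
    end = src.index(0)          # position of the terminating null (assumed present)
    return bytes(src[:end + 1])


def strncpy_model(src, n):
    """Return the n bytes strncpy would write to the destination."""
    out = bytearray()
    i = 0
    # Copy until n bytes are written or the terminator is reached.
    while len(out) < n and src[i] != 0:
        out.append(src[i])
        i += 1
    # Pad the remainder with zeros, as the pad loop in _strncpy does.
    out.extend(bytes(n - len(out)))
    return bytes(out)
```

Note that when the source is `n` bytes or longer, no terminator is written at all, exactly as in C.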
King Harold
Calc King
Posts: 1513
Joined: Sat 05 Aug, 2006 7:22 am

Post by King Harold »

remember it's not the Z flag but the P/V flag that LDI affects

Code:

_strcpy:
        xor a
        cp (hl)
        ldi
        jp pe,_strcpy+1
        ret
I'm never really sure about pe/po though

edit: so it should be pe right?
Last edited by King Harold on Thu 16 Aug, 2007 6:55 pm, edited 1 time in total.
calc84maniac
Regular Member
Posts: 112
Joined: Wed 18 Oct, 2006 7:34 pm
Location: The ex-planet Pluto

Post by calc84maniac »

I think of PO as P0 (or zero) and that helps.
~calc84maniac has spoken.

Projects:
F-Zero 83+
Project M (Super Mario for 83+)
driesguldolf
Extreme Poster
Posts: 395
Joined: Thu 17 May, 2007 4:49 pm
Location: $4080

Post by driesguldolf »

Just something little I came up with, might be useful in games:

Code:

.module misc

;; === misc.fadestep ===
;;
;; Changes the contrast 1 closer to the specified value
;;
;; Pre:
;;   c = Contrast to fade to
;;
;; Post:
;;   c-flag = reset if (contrast)=c, set otherwise
;;   af, b destroyed
;;
;; SeeAlso:
;;   misc.fade
;;
;; Warning:
;;   Providing an invalid contrast causes random values to be sent to the lcd.
;;   This error is caught if DEBUG is defined
;;

fadestep:
	ld a, (contrast)
	ld b, a
	cp c
	ret z		; c-flag is reset
	ccf		; carry now set if (contrast) > c, i.e. we must step down
	sbc a, a	; Determine the direction of fading: a = -1 (down) or 0 (up)
	or 1		; a = -1 or +1
	add a, b
	ld (contrast), a
	add a, %11011000
	ASSERT(nc)	; If carry is set then the contrast is invalid
	out ($10), a
	scf
	ret		; c-flag is set



;; === misc.fade ===
;;
;; Fades the contrast to the specified value
;;
;; Post:
;;   Contrast is faded to specified value
;;   Interrupts are enabled
;;   af, bc destroyed
;;
;; SeeAlso:
;;   misc.fadestep
;;
;; Warning:
;;   Speed depends on the isr timer, change the number of halts if needed
;;

fadelight:
	ld c, 0
	jr fade
fadedark:
	ld c, 39
fade:
	ei
	halt
	halt
	call fadestep
	jr c, fade
	ret

.endmodule
You saw that correctly, it uses the Vera protocol for defining routines :mrgreen:
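
The behaviour described in the header comments of `fadestep` and `fade` can be summarised by the following Python model. It is a sketch only: the function names are made up for illustration and the LCD port write is left out.

```python
def fade_step(contrast, target):
    """Move contrast one step towards target.

    Returns (new_contrast, carry), where carry mirrors the routine's
    c-flag: False once the target has been reached, True otherwise.
    """
    if contrast == target:
        return contrast, False
    step = 1 if contrast < target else -1
    return contrast + step, True


def fade(contrast, target):
    """Repeat fade_step until it reports that the target was reached."""
    steps = 0
    carry = True
    while carry:
        contrast, carry = fade_step(contrast, target)
        if carry:
            steps += 1
    return contrast, steps
```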
qarnos
Maxcoderz Staff
Posts: 227
Joined: Thu 01 Dec, 2005 9:04 am
Location: Melbourne, Australia

Multiplying by a fraction.

Post by qarnos »

This is a routine which many people may find useful. It's something I came up with for my 3D engine to fix the problem of polygons not being clipped all the way to the edges of the screen.

The problem was due to multiplying by a fixed-point reciprocal value to perform the line clipping: try representing 1/3 in fixed point - or floating point, for that matter. Problems like that, along with the limited (16-bit) precision, resulted in the clipping not always being quite right. The best I could do was make sure the errors were biased to clip the lines too short, rather than too long - which would be bad.

I now have a much better (and in some cases, faster) solution to the problem and it could be used for many other applications.

This routine will multiply a 16-bit number by a 16-bit/16-bit proper fraction. No reciprocals are involved, and it shortcuts the normal process of doing a 16-bit * 16-bit = 32-bit multiplication followed by a 32-bit / 16-bit division (eek!).

The routine can only handle fractions whose denominator (BC) is at most 32767, although this can be overcome if you are willing to sacrifice some cycles. I don't actually need that ability, but I will make the modifications anyway and post this weekend when I have a bit more time.

The routine will perform HL * DE / BC. HL is unsigned - you will need to negate it before and after if you want a signed multiplication.

I hope someone finds this useful. I am 99.998% confident it is bug free, but I don't have the Z80 time to test every possible combination of inputs!

I also haven't had a chance to calculate the timings, but will do so if time allows.

Code:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; alUMulByFraction:
;;
;;  HL = HL * DE / BC
;;
;;  This routine multiplies HL by proper fraction DE/BC.
;;
;;  HL is treated as unsigned.
;;  BC must be non-zero
;;  BC must be < 32768
;;  DE must be <= BC
;;
;; INPUTS:
;;
;;  HL  - multiplicand
;;  DE  - numerator
;;  BC  - denominator
;;
;; OUTPUTS:
;;  HL  - quotient of HL * DE / BC
;;  DE  - remainder of HL * DE / BC
;;
;; DESTROYED:
;;  AF
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
alUMulByFraction:

            ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
            ;; This initial block of code is used to return the output in more
            ;; convenient registers. It may be removed, in which case the
            ;; output will be returned as follows:
            ;;
            ;;  IX  - quotient
            ;;  HL  - remainder            
            ;;  DE  - numerator (unchanged)
            ;;  BC  - denominator (unchanged)
            ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
            push    ix                  ; [15] preserve IX
            call    _mainRtn            ; [17] run main code.
            ex      (sp), ix            ; [23] quotient <-> preserved IX
            ex      de, hl              ; [4] remainder <-> numerator
            pop     hl                  ; [10] pop quotient
            ret                         ; [10]
                        
            
            ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
            ;; The actual routine starts here.
            ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
_mainRtn:   ld      ix, $0000           ; [14] initial quotient
            ld      a, h                ; [4]
            or      a                   ; [4]
            jr      z, _highZero        ; [12/7] jump if high byte is zero.
            
            ld      h, l                ; [4] push contents of L to stack so
            push    hl                  ; [11] we can pop it into A later on.
            ld      hl, $0000           ; [10] initial remainder
            
            ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
            ;; Skip to first significant bit to avoid useless work. We also
            ;; set the low bit of A to act as a marker bit for the main loop.
            ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
            scf                         ; [4] for marker bit
_findHiBit: adc     a, a                ; [4] searching for first significant
            jp      nc, _findHiBit      ; [10] bit to avoid useless work.
            call    _mainLoop           ; [*] process high byte.
            pop     af                  ; [10] pop low byte
            
            
            ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
            ;; Now process the low byte.
            ;;
            ;; Here we duplicate a small part of the main loop. This is
            ;; because we need to run this code, but we want to shift a 1
            ;; into the accumulator instead of a 0.
            ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
            add     ix, ix              ; [15] quotient * 2
            add     hl, hl              ; [11] remainder * 2
            sbc     hl, bc              ; [15] remainder -= denominator
            jr      nc, _nc3            ; [12/7]
            add     hl, bc              ; [11] remainder += denominator
            jp      _rotMul2            ; [10]
_nc3:       inc     ix                  ; [10] quotient + 1
_rotMul2:   sl1     a                   ; [8]
            jp      _mainLoop           ; [10]
            
            ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
            ;; High byte was zero.
            ;; Go straight to low byte. Do not collect $200.
            ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
_highZero:  ld      a, l                ; [4] load low byte
            ld      hl, $0000           ; [10] initial remainder
            or      a                   ; [4]
            ret     z                   ; [11/5] return if multiplicand zero
            
            ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
            ;; Search for first significant bit to avoid useless work. We also
            ;; set the low bit of A to act as a marker bit for the main loop.
            ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
            scf                         ; [4] for marker bit
_findLoBit: adc     a, a                ; [4] searching for first significant
            jp      nc, _findLoBit      ; [10] bit to avoid useless work.


            ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
            ;; Main processing loop. We loop here for each significant bit
            ;; of the multiplicand (hence, small multiplicands are fast).
            ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
_mainLoop:  jr      nc, _loopTail       ; [12/7]
            add     hl, de              ; [11] remainder += numerator
            sbc     hl, bc              ; [15] remainder -= denominator
            jr      nc, _nc2            ; [12/7]
            add     hl, bc              ; [11] remainder += denominator
            jp      _loopTail           ; [10]
_nc2:       inc     ix                  ; [10]            
_loopTail:  cp      $80                 ; [7] once only the marker bit is
            ret     z                   ; [11/5] left we end the loop
            add     ix, ix              ; [15] quotient * 2
            add     hl, hl              ; [11] remainder * 2
            sbc     hl, bc              ; [15] remainder -= denominator
            jr      nc, _nc1            ; [12/7]
            add     hl, bc              ; [11] remainder += denominator
            jp      _rotMulBit          ; [10]
_nc1:       inc     ix                  ; [10] quotient + 1
_rotMulBit: add     a, a                ; [4] shift next bit and loop
            jp      _mainLoop           ; [10]
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; End of alUMulByFraction
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
Last edited by qarnos on Fri 07 Dec, 2007 9:26 am, edited 1 time in total.
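
As a cross-check, the bit-by-bit scheme used by the routine can be modelled in Python. This is a sketch rather than a translation of the assembly: `mul_by_fraction` is a hypothetical name, and the loop always walks all 16 bits instead of skipping the leading zeros. The point it illustrates is that the remainder is reduced modulo BC after every doubling and every addition, so no 32-bit intermediate is ever needed.

```python
def mul_by_fraction(hl, de, bc):
    """Return (quotient, remainder) of hl * de / bc using 16-bit-sized steps."""
    assert 0 <= hl <= 0xFFFF
    assert 0 < bc < 0x8000          # keeps 2 * remainder within 16 bits
    assert 0 <= de <= bc            # proper fraction: one subtraction is enough
    quotient = 0
    remainder = 0
    for bit in range(15, -1, -1):   # most significant bit of hl first
        # Double the running result, carrying overflow into the quotient.
        quotient *= 2
        remainder *= 2
        if remainder >= bc:
            remainder -= bc
            quotient += 1
        # If this bit of the multiplicand is set, add the numerator.
        if (hl >> bit) & 1:
            remainder += de
            if remainder >= bc:
                remainder -= bc
                quotient += 1
    return quotient, remainder
```

For example, `mul_by_fraction(1000, 1, 3)` gives a quotient of 333 and a remainder of 1, with no reciprocal involved.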
"I don't know why a refrigerator is now involved, but put that aside for now". - Jim e on unitedti.org

avatar courtesy of driesguldolf.
Timendus
Calc King
Posts: 1729
Joined: Sun 23 Jan, 2005 12:37 am
Location: Netherlands

Post by Timendus »

There's still a bunch of routines in the API website (http://api.timendus.com/) of which some could be useful, and others could get updated with versions in this thread. I'm not really working on the API anymore, and I'm not even sure if the current non-stable version will compile, but feel free to leech from it what you can use, or work on it to improve it ;)

Please do ignore the idea box... Spam bots have reigned it for years :)
http://clap.timendus.com/ - The Calculator Link Alternative Protocol
http://api.timendus.com/ - Make your life easier, leave the coding to the API
http://vera.timendus.com/ - The calc lover's OS
qarnos
Maxcoderz Staff
Posts: 227
Joined: Thu 01 Dec, 2005 9:04 am
Location: Melbourne, Australia

Table-based sin/cos

Post by qarnos »

Time for another routine :)

Here I am going to present some code for calculating table-based sines and cosines with a resolution of 1024 degrees to a circle, using a table of 256 or 512 bytes, depending on whether you want a 1 or 2 byte result.

Firstly, I will ignore the details of which fixed-point format you are using. It doesn't really matter. You fill the table with whatever representation you want.

Secondly, I am not including the actual tables here since the fixed-point format I am using in the 3D engine will be almost useless to anyone else. Just write some C code (or your language of choice) to generate the table :) If you desperately need a table and don't know how to make it, PM me.

Finally, this code is for a 512 byte table (2 byte results). You should be able to modify it easily enough if need be but if you have trouble then, once again, PM me.

This method of table lookup is usually referred to as the sine-quadrant method, or some variation of that name. It is called that because the table only contains the sine values for the first quadrant (90 degrees) of the sine wave. ie: sin(0)...sin(90). Well, not quite 90 degrees. The exact range of the table is sin(0)..sin(90.0 * 255.0 / 256.0).
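
A table of this shape could be generated with a few lines of Python. This is a minimal sketch under assumptions that are not stated in the post: the scale of 16384 for +1.0 is taken from the constant in the routine's `_forceTo1` branch, and the layout (256 low bytes followed by 256 high bytes) is inferred from the way the routine reads one byte, increments H and reads the other. All function names are hypothetical.

```python
import math

SCALE = 16384  # assumed fixed-point value for +1.0


def build_quadrant_table(scale=SCALE):
    """Return 256 values for sin(0) .. sin(90 * 255 / 256 degrees)."""
    return [round(math.sin(math.pi / 2 * i / 256) * scale) for i in range(256)]


def split_pages(values):
    """Lay the table out as 256 low bytes followed by 256 high bytes."""
    low = bytes(v & 0xFF for v in values)
    high = bytes((v >> 8) & 0xFF for v in values)
    return low + high


def as_db_lines(data, per_line=16):
    """Format raw bytes as .db directives for an assembler."""
    lines = []
    for i in range(0, len(data), per_line):
        chunk = ", ".join("$%02X" % b for b in data[i:i + per_line])
        lines.append("    .db " + chunk)
    return lines
```

Printing `"\n".join(as_db_lines(split_pages(build_quadrant_table())))` yields 32 lines that can be pasted into a source file; change `SCALE` to match your own fixed-point format.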

Why only the first quadrant? Because the second quadrant is a mirror image of the first (sin(89) == sin(91)) and the last two quadrants are the negative of the first two. This means that, using a little logic, we only need to know the values of the first quadrant to calculate the values of the other 3.

Using a range of 256 for the quadrant also has an advantage for us: the high byte of the angle indicates which quadrant the angle lies in and the low byte gives us a table index. The lowest bit of the high byte indicates if we need to mirror the table and the second bit tells us if we need to negate the result. The mirror is achieved by simply negating the low byte of the angle.

The only "gotcha!" is if we try to calculate sin(90) or sin(270) (sin(256) and sin(768) in our 1024-degree format). Both of these will result in zero, because the first value in the table (which these angles will look up) is zero! To get around this, we need to manually check for these values, but it is easier (and faster) than you might think.

Both these angles lie in a mirrored quadrant. To perform the mirror, we need to execute a NEG instruction on the low byte of the angle - which will set the zero flag if it is zero. Woohoo - free comparison! The best kind!

Since a cosine is just a sine wave offset by 90 degrees (256 in 1024-degree format), you can also use this code to calculate cos(x). Just increment the high byte of the angle by one and call the sin routine.
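
The quadrant logic described above can be sketched in Python as follows. This is an illustration under assumptions, not the routine itself: `fix_sin` and `fix_cos` are hypothetical names, 16384 is assumed to represent +1.0, and the table is kept as a plain list of integers rather than split into byte pages.

```python
import math

ONE = 16384  # assumed fixed-point value for +1.0

# First-quadrant table: sin(0) .. sin(90 * 255 / 256 degrees).
TABLE = [round(math.sin(math.pi / 2 * i / 256) * ONE) for i in range(256)]


def fix_sin(angle):
    """Sine of an angle given in 1024ths of a circle."""
    index = angle & 0xFF
    quadrant = (angle >> 8) & 3
    negative = bool(quadrant & 2)      # quadrants 2 and 3 are negated
    if quadrant & 1:                   # quadrants 1 and 3 are mirrored
        index = -index & 0xFF
        if index == 0:                 # exactly 90 or 270 degrees
            return -ONE if negative else ONE
    value = TABLE[index]
    return -value if negative else value


def fix_cos(angle):
    """cos(x) = sin(x + 90 degrees), i.e. add 256 to the angle."""
    return fix_sin(angle + 256)
```

The `index == 0` test plays the role of the zero flag left behind by NEG: it only fires in the mirrored quadrants, where a plain lookup would wrongly return the table's first entry.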

Enough rambling. Here's the code.

Code:


;@doc:routine
;
; === alDPFixSin ===
;
;  Returns sin(BC). BC is an integer angle with 1024 degrees to the circle.
;
;  Since cos(X) == sin(X + 90), this routine can be used to calculate a cosine
;  by incrementing the high byte of the angle (B) by one before calling.
;
; INPUTS:
;
;  REGISTERS:
;  * BC - Angle to calculate sine of.
;
; OUTPUTS:
;
;  REGISTERS
;  * HL - sin(BC)
;
; DESTROYED:
;
;  REGISTERS:
;  * AF
;
;@doc:end

alDPFixSin:
            ; The high byte of the angle indicates which quadrant of the sine
            ; wave we are interested in indexing
            ld      a, c                ; [4]
            bit     0, b                ; [8] if the angle is [256..511, 768..1023]
            jr      z, _noNegate        ; [12/7] negate the low byte.
            neg                         ; [8]
            jr      z, _forceTo1        ; [12/7]

_noNegate:  ld      l, a                ; [4] table index
            ld      h, g_alTrigTable    ; [7] page (high byte) of the table, which must start on a 256-byte boundary
            ld      a, (hl)             ; [7] low byte of the result
            inc     h                   ; [4] high bytes are stored one page up
            ld      h, (hl)             ; [7] 
            ld      l, a                ; [4]
            bit     1, b                ; [8]
            ret     z                   ; [11/5] negate result if angle >= 512
            cpl                         ; [4]
            ld      l, a                ; [4]
            ld      a, h                ; [4]
            cpl                         ; [4]
            ld      h, a                ; [4]
            inc     hl                  ; [6]
            ret                         ; [10]

            ; If the angle was 256 or 768, we manually force it to +/-1
_forceTo1:  bit     1, b                ; [8]
            jr      nz, _forceToN1      ; [12/7]
            ld      hl, 16384           ; [10] replace this with your value for +1
            ret                         ; [10]

_forceToN1: ld      hl, -16384          ; [10] replace this with your value for -1
            ret                         ; [10]
qarnos
Maxcoderz Staff
Posts: 227
Joined: Thu 01 Dec, 2005 9:04 am
Location: Melbourne, Australia

Post by qarnos »

Here's a routine (well, a small set of routines) to perform a cross-fade between two images.

It works by comparing the two images to see which bits need to be changed, and then randomly flips these bits until the image has been completed.

It's not as fast as I would like - an image which needs every pixel changed takes a few seconds - but it's good enough to be used in games for nice transitions between static screens.

There are 3 routines, but you only really need to know about one of them :)

Code:

; === xFadeLCD ===
;
;  Cross-fades the LCD between the current image (HL) and the image in (DE).
;
; INPUTS:
;
;  REGISTERS            
;  * IX - Buffer for crossfade data         (768 bytes)
;  * HL - Buffer containing LCD contents    (768 bytes)
;  * DE - Target image                      (768 bytes)
Fairly self explanatory. HL must point to a buffer which matches the currently displayed screen or it will look all screwy. DE is the image you want to fade to, and IX is a buffer used for internal stuff. You can use PLOTSSCREEN, SAVESSCREEN and APPBACKUPSCREEN for these (as long as the calc doesn't APD during the fade - use DI to be sure).

The code uses irandom (the MirageOS version of ionRandom), but is easily replaced by any other RNG - see the source comments for details.

You can download the zip archive here. The only file you need is "xfade.asm". The rest of the stuff is for the demo, of which the screenshot is here:

[Image: screenshot of the cross-fade demo]
CoBB
MCF Legend
Posts: 1601
Joined: Mon 20 Dec, 2004 8:45 am
Location: Budapest, Absurdistan

Post by CoBB »

qarnos wrote: It's not as fast as I would like - an image which needs every pixel changed takes a couple of seconds - but it's good enough to be used in games for nice transitions between static screens.
Wouldn’t it be more efficient to do it like this:

Loop through the images. For each byte do the following:
1. XOR the corresponding bytes
2. if the result is zero, skip to the next byte, otherwise
3. AND the result with a random number
4. XOR this new number with the buffer

If we had to skip every byte in step 2, we’re done (we can just set a flag in step 3 to indicate the opposite). This way you can easily adjust the speed of the transition by tweaking the binary weight of the random numbers (which can be done with a 256-byte LUT too).
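
The four steps above can be sketched in Python like this. It is a rough model only: the function names are hypothetical, the buffers are ordinary byte arrays instead of calculator RAM, and Python's `random` module stands in for whatever RNG the real routine would use.

```python
import random


def crossfade_pass(current, target, rng=random):
    """Run one pass over the buffers; return True if anything still differed."""
    pending = False
    for i in range(len(current)):
        diff = current[i] ^ target[i]          # step 1: bits that still differ
        if diff == 0:                          # step 2: nothing to do here
            continue
        pending = True                         # the flag mentioned for step 3
        mask = diff & rng.getrandbits(8)       # step 3: pick a random subset
        current[i] ^= mask                     # step 4: flip those bits
    return pending


def crossfade(current, target, rng=random):
    """Repeat passes until a pass finds no differing byte; return the pass count."""
    passes = 0
    while crossfade_pass(current, target, rng):
        passes += 1
    return passes
```

Because the mask is ANDed with the difference, a bit that already matches the target is never flipped back. Replacing `rng.getrandbits(8)` with the AND of two random bytes would flip roughly a quarter of the pending bits per pass instead of half, which is one way of tweaking the binary weight.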
qarnos
Maxcoderz Staff
Posts: 227
Joined: Thu 01 Dec, 2005 9:04 am
Location: Melbourne, Australia

Post by qarnos »

CoBB wrote: Wouldn’t it be more efficient to do it like this:
It probably would be. The main motivation for this method is I want to avoid the possibility of "getting stuck" if we go through a few loops and are unlucky enough to keep getting bad random numbers.

I might give that a try though and see how it goes.

The other idea I had was to create a binary tree for finding the pixels. It would be lightning fast and actually use less ram than the current version, but probably too much effort to go to for such a simple throw-away effect.


update:

I tried it out, and my first efforts look a bit jerky:

[Image: screenshot of this first attempt]

I tried updating the screen as I went, but then you could see the refresh, so I had to resort to fastcopy after each frame. I could fix this to a degree (I am resetting the LCD co-ords after each pixel - very slow) but it's too late to try that right now.

I might keep playing around and see if I can make it any better. I think it really needs to be random-access, though.
qarnos
Maxcoderz Staff
Posts: 227
Joined: Thu 01 Dec, 2005 9:04 am
Location: Melbourne, Australia

Post by qarnos »

I found the reason why my routine is slow and the CoBB version is jerky - irandom is slow, slow, slooooooowww.

I'll play around with some other RNGs.
benryves
Maxcoderz Staff
Posts: 3087
Joined: Thu 16 Dec, 2004 10:06 pm
Location: Croydon, England

Post by benryves »

That looks like a very nice effect, and would add a lot of polish to a game. :)

Does using a simple function of R not look good?
driesguldolf
Extreme Poster
Posts: 395
Joined: Thu 17 May, 2007 4:49 pm
Location: $4080

Post by driesguldolf »

Looks very nice.
qarnos wrote: I found the reason why my routine is slow and the CoBB version is jerky - irandom is slow, slow, slooooooowww.

I'll play around with some other RNGs.
Couldn't you simply use the refresh register?

Downside is that if you use that a lot on a constant 'timing' you might have interference... Though in theory that should be easily fixed by adding nop instructions.

(Just for clarity: R increases on every instruction, right?)