[TI ASM] Useful routines.

qarnos
Maxcoderz Staff
Posts: 227
Joined: Thu 01 Dec, 2005 9:04 am
Location: Melbourne, Australia

Post by qarnos »

Jim e wrote: I'd say it'd be faster than mine on long lines. Mine uses Bresenham's line algorithm, so it has to do some compares. Other than that, the code looks strikingly similar. Actually, mine started off using slope, but I thought the overhead from the div routine would take too long.

Edit: Couldn't that fraction-to-fix-8 routine be optimised like this?

Code: Select all

FractionToFix8:                             .MODULE FractionToFix8
            add     a, a                    ; [4]
            jr      c, _overFlow            ; [12/7]
            cp      b                       ; [4]
            jr      nc, _overFlow           ; [12/7]
            sla     c                       ; [8]
            jp      nc, FractionToFix8      ; [10]
            ret                             ; [10]

_overFlow:  sub     b                       ; [4]
            sll     c                       ; [8] undocumented: shift left, bit 0 = 1
            jp      nc, FractionToFix8      ; [10]
            ret                             ; [10]
Well, I'll be flabbernackled! (yes, I just made that up - and I stand by it).

I have this thing stuck in my head where I think it's always better to use SUB instead of CP if you might need the result afterwards. This case is obviously an exception. Well spotted! I never even noticed it!

EDIT: Actually, on closer observation, it looks like you have just moved the SUB B after the first JR to a different location. I think the performance benefits would be dependent on the operands... perhaps a case of six of one, half a dozen of the other?

I think I'll just go back to my beer now... :lol:

EDIT (AGAIN!): Just checked out your source and you are right:
  • the code is amazingly similar (especially the unrolled loop portion) and
  • I think my routine will be faster on long lines and yours will be faster on short lines - it's the overhead of the gradient calculation that kills my routine.
Jim e
Calc King
Posts: 2457
Joined: Sun 26 Dec, 2004 5:27 am
Location: SXIOPO = Infinite lives for both players

Post by Jim e »

Yeah, FractionToFix8 is probably dependent on the operands. There is probably no speed gain, though it still saved a byte :lol:

Post by qarnos »

Jim e and CoBB - I managed to come up with a way to time your interrupts-disabled line drawing code (using PTI and the link port), and the results are in:

Jim E - 745 interrupts for 16384 lines.
CoBB - 964 interrupts.

My routine takes 784 interrupts.

The margin between my code and Jim's is so slim that I'd probably use mine most of the time, since I can then leave the interrupts running.

CoBB's code, on the other hand, is much, much smaller and amazingly fast for its size, so if you need a small routine, use that - it is still around 4 times faster than the MirageOS routine.
CoBB
MCF Legend
Posts: 1601
Joined: Mon 20 Dec, 2004 8:45 am
Location: Budapest, Absurdistan

Post by CoBB »

qarnos wrote: Jim e and CoBB - I managed to come up with a way to time your interrupts-disabled line drawing code (using PTI and the link port), and the results are in:

Note that there is a clock cycle counter in the debugger, which is reset after each frame, i.e. every 240000 ticks. You can simply press F8 to step over the call to the routine, and the counter will advance by the number of clock cycles the routine (including the call-ret) took.

Post by Jim e »

qarnos wrote: The margin between my code and Jim's is so slim that I'd probably use mine most of the time since I can then leave the interrupts running.

I'd say the bigger advantage with yours is the fact that it's easier to clip with the slope.

You should consider putting a few of these routines on the wiki.

http://wikiti.denglend.net/index.php?ti ... 0_Routines

I had a few versions of a square root routine up, but the last one I believe I wrote on the spot while writing that article, so it's not too optimised or well thought out.

Post by qarnos »

CoBB wrote: Note that there is a clock cycle counter in the debugger, which is reset after each frame, i.e. every 240000 ticks. You can simply press F8 to step over the call to the routine, and the counter will be advanced by the number of cc's the routine (including the call-ret) took.

Trust me to find the complicated solution first!

Actually, I had tried the F8 thing before but I have been using an older version of PTI. Downloaded the new one and it works great! Just one problem I noticed - if you set a breakpoint on a CALL, you can't step over it. It works ok if you set the breakpoint one instruction before the CALL, however.

Jim e wrote: I'd say the bigger advantage with yours is the fact that it's easier to clip with the slope.

You should consider putting a few of these routines on the wiki.

http://wikiti.denglend.net/index.php?ti ... 0_Routines

I had a few versions of a square root routine up, but the last one I believe I wrote on the spot while writing that article, so it's not too optimised or well thought out.

I've never given much thought to 2D clipping since I have never had a need for it and I think clipping with 1 byte co-ordinates will have limited usefulness anyway, but I see your point.

Good idea about WikiTI. I will try to get around to it sometime soon.

Nice job on the small square root routine! I've never been one to optimize for size unless there is also a speed advantage, which, unfortunately, is rarely the case.

Post by CoBB »

qarnos wrote: Just one problem I noticed - if you set a breakpoint on a CALL, you can't step over it. It works ok if you set the breakpoint one instruction before the CALL, however.

Yes, that's because stepping over is achieved by temporarily putting a breakpoint after the call and letting it run. Of course, all the other breakpoints are still in action.

Post by Jim e »

I just noticed something. z80bits has its 16-bit/8-bit division like this:

Code: Select all

	add	hl,hl		; unroll 16 times
	rla			; ...
	cp	c		; ...
	jr	c,$+4		; ...
	sub	c		; ...
	inc	l		; ...
But if rla carries, that would fail.

It would need to be like this, no?

Code: Select all

	add	hl,hl		; unroll 16 times
	rla			; ...
	jr c,$+5
	cp	c		; ...
	jr	c,$+4		; ...
	sub	c		; ...
	inc	l		; ...
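For anyone who wants to check this claim off-calculator, here is a minimal Python sketch of one reading of the two loops above. It assumes A starts at 0, HL holds the dividend and C the divisor; the function name `div16_8` and the `check_rla_carry` switch are made up for illustration and are not part of z80bits.

```python
def div16_8(hl, c, check_rla_carry):
    """Model 16 rounds of: add hl,hl / rla / [jr c] / cp c / jr c / sub c / inc l."""
    a = 0                                   # assumed: A is cleared before the loop
    for _ in range(16):
        hl <<= 1                            # add hl,hl: bit 15 goes to carry
        carry, hl = hl >> 16, hl & 0xFFFF
        a = (a << 1) | carry                # rla: carry into bit 0, bit 7 out to carry
        carry, a = a >> 8, a & 0xFF
        if (check_rla_carry and carry) or a >= c:
            a = (a - c) & 0xFF              # sub c
            hl |= 1                         # inc l (bit 0 is always 0 here)
    return hl, a                            # quotient, remainder


print(div16_8(0xFFFF, 200, False))  # loop without the extra jr c
print(div16_8(0xFFFF, 200, True))   # loop with the extra jr c
print(divmod(0xFFFF, 200))
```

With a divisor of 200 the first call comes out as (322, 111), while the second matches divmod at (327, 135). In this model a divisor of 128 or less never makes rla carry, so both loops agree there.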
Well anyway, I was looking at that because I wanted a faster fraction to fix 8. This was my solution:

Code: Select all

;Input:
; a = Numerator (must be less than b)
; b = Denominator
;
;Output:
; c = Result (a/b as an 8-bit fraction)

	sla c          ; unroll 8 times
	add a,a        ;
	jr c,$+5       ;
	cp b           ;
	jr c,$+4       ;
	sub b          ;
	inc c          ;
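As a rough cross-check of the unrolled step above, the following Python sketch models eight rounds of it and compares the result with floor(256*a/b). It assumes the numerator is smaller than the denominator; `fraction_to_fix8` is just an illustrative name for the model, not a routine from this thread.

```python
def fraction_to_fix8(a, b):
    """Model 8 rounds of: sla c / add a,a / jr c / cp b / jr c / sub b / inc c."""
    c = 0                                   # any start value works: 8 shifts push it out
    for _ in range(8):
        c = (c << 1) & 0xFF                 # sla c
        a <<= 1                             # add a,a: bit 7 goes to carry
        carry, a = a >> 8, a & 0xFF
        if carry or a >= b:
            a = (a - b) & 0xFF              # sub b
            c += 1                          # inc c (bit 0 is 0 after sla)
    return c


print(fraction_to_fix8(1, 2))      # 1/2
print(fraction_to_fix8(3, 4))      # 3/4
print(fraction_to_fix8(200, 201))  # denominator above 128, so add a,a can carry
```

The three calls should give 128, 192 and 254. The last one is the case the extra `jr c,$+5` is there for: once the denominator exceeds 128, doubling the remainder can overflow 8 bits.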
silver calc
New Member
Posts: 73
Joined: Tue 28 Mar, 2006 10:50 pm
Location: Wouldn't you like to know?

Post by silver calc »

Here are several routines that rely on each other, but can also be called individually (note that I didn't write all of these routines, for example PutS and StrLength were written by TI). This is for all those who are programming for flash applications :twisted:
String Length

Code: Select all

;_StrLength inline
;
;inputs:	hl pointer to null-terminated string
;
;outputs:	bc=string length, not including null term
;		hl points to null term
;

StrLength
 ld bc,0		;clear the whole 16-bit counter
StrLoop
 ld a,(hl)		;get character
 or a			;check if it's the end
 ret z			;if zero, return
 inc hl
 inc bc			;increase counter
 jr StrLoop		;repeat
Pixel Width of a Null-Terminated String
Note: could someone check my counting for the offsets table? I'm not sure I got the count right on all of them.

Code: Select all

;Find the pixels needed to display a null-terminated string
;
;inputs: 	hl points to string in RAM (1 to 255 characters)
;
;outputs: 	b and c=pixels needed to display string
;		hl preserved; a and de destroyed
;

nStrLength
 push hl
 call StrLength		;get the string length
 pop hl
 
 push hl
 ld b,c			;set character counter
 ld c,0			;set pixel counter
 
nStrLengthLoop
 ld a,(hl)			;get character
 
 ld e,a			;put it into de so it can be added
 ld d,0
 
 inc hl			;point to next character and save string pointer
 push hl
 
 ld hl,charLengths		;point to offsets table
 add hl,de
 
 ld a,(hl)			;get offset
 add a,c			;add it to total length
 ld c,a			;store the new total back in c
 
 pop hl			;get back string pointer
 djnz nStrLengthLoop

 ld b,c
 pop hl			;get back string pointer to start of string
 
 ret

charLengths
 .db 3,6,4,4,4,4,6,6,4,4,4,4,3,4,5,5
 .db 4,5,4,4,5,5,4,5,6,5,4,4,5,6,4,4
 .db 1,2,4,6,6,4,5,2,3,3,6,4,3,4,2,4
 .db 4,4,4,4,4,4,4,4,4,4,2,3,4,4,4,4
 .db 6,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4
 .db 4,4,4,4,4,4,4,4,4,4,4,4,4,3,4,4
 .db 3,4,4,4,4,4,3,4,4,2,4,4,3,6,4,4
 .db 4,4,4,3,3,4,4,6,4,4,5,4,2,4,5,4
 .db 4,3,4,4,4,4,4,4,4,4,5,5,5,5,5,5
 .db 5,5,5,5,4,4,4,4,4,4,4,4,4,4,4,4
 .db 4,4,6,6,6,6,6,6,6,6,4,4,4,4,4,4
 .db 4,4,4,4,5,5,3,3,4,4,2,5,4,5,6,4
 .db 4,3,4,5,6,5,5,5,5,6,6,4,4,4,4,4
 .db 3,4,3,4,4,4,4,4,4,4,4,4,4,4,5,3
 .db 4,7,5,7,6,7,6,7,6,7,6,7,5,4,6,6
 .db 6,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3
VPutS

Code: Select all

;VPutS inline
;
;inputs:	hl pointer to null-terminated string
;		(penRow) and (penCol) must be preset
;
;outputs:	hl points just past the last character read
;		(the null term, or the character that did not fit)
;		string displayed
VPutS:
 push af
 push de
 push ix
VPutS10:
 ld a,(hl) ; get a character of string name
 inc hl
 or a ; end of string?
 jr z, VputS20 ; yes --->
 bcall(_VPutMap) ; display one character of string
 jr nc, VPutS10 ; display rest of string IF FITS
VputS20:
 pop ix
 pop de
 pop af
 ret
VPutS_Center

Code: Select all

;Centers the string on the display.
;
;inputs: 	hl pointer to null-terminated string
;		(penRow) must be set before calling
;
;outputs: 	hl as left by VPutS
;		Text centered on screen
;

VPutS_Center
 push hl
 call nStrLength
 pop hl

 ld a,96		;width of screen
 sub b			;subtract width of string
 rra			;divide by 2 to be centered (carry is clear if the string fits)
 ld (penCol),a		;set the starting column

 call VPutS		;display it
 ret
Halifax
Sir Posts-A-Lot
Posts: 225
Joined: Mon 01 Jan, 2007 10:39 am
Location: Pennsylvania, US

Post by Halifax »

I just thought this was cool

Code: Select all

xor a = sub a,a
Both of them take the same number of T-states: 4.
King Harold
Calc King
Posts: 1513
Joined: Sat 05 Aug, 2006 7:22 am

Post by King Harold »

except it's

Code: Select all

 sub a
calc84maniac
Regular Member
Posts: 112
Joined: Wed 18 Oct, 2006 7:34 pm
Location: The ex-planet Pluto

Post by calc84maniac »

The Better CP HL,DE
Compares hl to de and leaves hl unchanged; the zero and carry flags come out the same as for an 8-bit cp.

Code: Select all

or a
sbc hl,de
add hl,de
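Why the carry survives the final add is not obvious at first glance, so here is a small Python sketch of the three instructions. It models only HL, Z and carry, relying on the documented behaviour that `sbc hl,de` sets Z and `add hl,de` leaves Z alone; the helper name `cp_hl_de` is made up for illustration.

```python
def cp_hl_de(hl, de):
    """Model: or a / sbc hl,de / add hl,de. Returns (hl, zero, carry)."""
    diff = (hl - de) & 0xFFFF       # or a clears carry, so sbc is a plain subtract
    zero = diff == 0                # Z is set by sbc hl,de ...
    total = diff + de               # ... and add hl,de leaves Z alone
    carry = total > 0xFFFF          # carry as left by the final add hl,de
    return total & 0xFFFF, zero, carry


for hl, de in [(0x1234, 0x1234), (0x0001, 0x8000), (0x8000, 0x0001)]:
    print(hex(hl), hex(de), cp_hl_de(hl, de))
```

When hl is below de, the subtraction wraps around, so adding de back overflows 16 bits and sets carry again; otherwise the add cannot overflow. That is why carry ends up the same as after the sbc.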

Post by Halifax »

King Harold wrote:except it's

Code: Select all

 sub a
Jim_e writes it as sub a,a.

Strcpy()

Code: Select all

;Strcpy()
;
;inputs:	hl pointer to null-terminated string
;		de pointer to buffer
;outputs:	copies the string pointed to by hl to the memory at de
;		(the null terminator itself is not copied)
;
_strcpy:
	ld a,(hl)
	or a
	ret z
	ld (de),a
	inc hl
	inc de
	jr _strcpy
Strncpy()

Code: Select all

;Strncpy()
;
;inputs: hl pointer to null-terminated string
;	 de pointer to buffer
;	 b  number of characters to copy
;outputs: copies b characters from the string pointed to by hl to the buffer at de
_strncpy:
	ld a,(hl)
	ld (de),a
	dec b
	jr nz,_strncpy
	ret

Post by King Harold »

Halifax wrote: Jim_e writes it as sub a,a.

Well, I had a look at his posts and I doubt that he does.
Even if he did, it was probably a typo.

How about this?

Code: Select all

_strcpy:
   ld a,(hl)
   or a
   ret z
   ldi
   jr _strcpy
drawback: kills BC

or even:

Code: Select all

_strcpy:
   ldi
   ld a,(hl)
   or a
   jr nz, _strcpy
   ret
same drawback


and this:

Code: Select all

_strncpy:
   ld a,(hl)
   ld (de),a
   ;inc hl and de??
   djnz _strncpy
   ret

Post by calc84maniac »

Code: Select all

_strcpy:
   ldi
   inc bc
   ld a,(hl)
   or a
   jr nz, _strcpy
   ret
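To see what this loop does and does not copy, here is a hedged Python sketch of it. Memory is modelled as a `bytearray` and the addresses are arbitrary; `strcpy_ldi` is an illustrative name for the model, not anything from the thread.

```python
def strcpy_ldi(mem, hl, de, bc):
    """Model: ldi / inc bc / ld a,(hl) / or a / jr nz. Returns (hl, de, bc)."""
    while True:
        mem[de] = mem[hl]                   # ldi copies (hl) to (de) ...
        hl, de = hl + 1, de + 1             # ... bumps both pointers ...
        bc = (bc - 1) & 0xFFFF              # ... and decrements bc
        bc = (bc + 1) & 0xFFFF              # inc bc undoes that decrement
        if mem[hl] == 0:                    # ld a,(hl) / or a: stop at the terminator
            return hl, de, bc


mem = bytearray(b"abc\x00" + b"\xff" * 8)   # source at 0, destination buffer at 4
print(strcpy_ldi(mem, 0, 4, 0x1234), bytes(mem[4:8]))
```

In this model BC comes back unchanged and the three characters are copied, but the terminator is not written to the destination, so the caller would still need to store it (A is 0 on exit, so a `ld (de),a` before the `ret` would be one way). The first byte is also copied before any check, so an empty source string is not handled.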