[TI ASM] Useful routines.

Got questions? Got answers? Go here for both.

Moderator: MaxCoderz Staff

User avatar
qarnos
Maxcoderz Staff
Posts: 227
Joined: Thu 01 Dec, 2005 9:04 am
Location: Melbourne, Australia

Square Root

Post by qarnos »

This routine, as the title suggests, will calculate the integer square root of the value in BC. This has been ripped straight from my 3D engine, so the naming reflects the naming used in the engine (every routine is prefixed by "al").

The value passed in A will determine how many bits of the result are calculated. For normal purposes, this should be 8, but you can get extra precission by using higher values. If you want the result as an 8.4 fixed point number, then you would pass 12 as the precision parameter.

UPDATE: I mentioned a possible bug when specifying 16 bits of precission. I checked it out, and removing the bug would really hamper the performance of the routine, so I decided to let it be.

If you really, really need an 8.8 fixed point result, then specify 15 bits of precission and double the result. There will be a margin of error in this case - but it avoids the 16 bit problem.

Passing a value lower than 8 as the precission parameter will result in a truncated answer.

In C code, this routine performs:

DE = int(sqrt(BC) / 256 * (2 ^ A));

Code: Select all

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; alSquareRootBC:
;;
;;  Calculates the integer square root of BC. This is the highest number, X,
;;  which satisfies: X * X <= BC.
;;
;; INPUTS:
;;   A  - The number of bits of precision required in the result.
;;  BC  - The number to obtain the square root of.
;;
;; OUTPUTS:
;;  DE  - Integer square root of BC
;;
;; DESTROYED:
;;  AF, BC, HL
;;
;; TIMINGS (not including CALL):
;;  44 + (133 * A) T-States
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
alSquareRootBC:                                     .MODULE alSquareRootBC
            ld      hl, $0000       ; [10]
            ld      d, h            ; [4]
            ld      e, l            ; [4]
_mainLoop:  sla     c               ; [8]
            rl      b               ; [8]
            adc     hl, hl          ; [15]
            sla     c               ; [8]
            rl      b               ; [8]
            adc     hl, hl          ; [15]
            sll     e               ; [8] root = root * 2 + 1
            rl      d               ; [8] should never carry
            sbc     hl, de          ; [15] remainder -= root
            jp      c, _revertHL    ; [10]
            inc     de              ; [6]
            jp      _loopTail       ; [10]
_revertHL:  add     hl, de          ; [11]
            dec     e               ; [4] root -= 1 (low bit always set).
_loopTail:  dec     a               ; [4]
            jp      nz, _mainLoop   ; [10]
            srl     d               ; [8]
            rr      e               ; [8]
            ret                     ; [10]
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; End of alSquareRootBC
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
One final note, be aware this code uses the TASM .MODULE directive to define the scope of local labels.

To reset the scope, you will need to add a blank .MODULE after the routine. Otherwise, the TI-OS routine names, which all start with an underscore, may not be in scope for the rest of your program.
Last edited by qarnos on Fri 18 Aug, 2006 9:11 am, edited 1 time in total.
User avatar
qarnos
Maxcoderz Staff
Posts: 227
Joined: Thu 01 Dec, 2005 9:04 am
Location: Melbourne, Australia

Unsigned division.

Post by qarnos »

Here is a routine to perform unsigned 16-bit / 16-bit division.

It has been a long time since I wrote this routine, so it could quite possibly be optimised a bit further.

C code: HL = int(HL / BC);

Code: Select all

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; alUDivHLBC
;;
;;  Unsigned divide of HL by BC. Returns the integer (quotient) and remainder
;;  of the division.
;;
;; INPUTS:
;;  HL - Numerator
;;  BC - Denominator
;;
;; OUTPUTS:
;;  HL - Quotient of (HL/BC)
;;  DE - Remainder of (HL/BC)
;;  BC - Denominator (unchanged)
;;
;; DESTROYED:
;;  AF
;;
;; TIMINGS (not including CALL):
;;  Best-case (small numerators):       518 T-states
;;  Worst-case (large numerators):      1684 T-states
;; 
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
alUDivHLBC:                                     .MODULE alUDivHLBC
        ld      de, 0           ; [10] Zero the initial remainder (DE) and
        ld      a, 17           ; [7] set up loop counter (A).
_shlHL: dec     a               ; [4] This initial loop shifts HL left until
        add     hl, hl          ; [11] the high bit pops into the carry.
        jp      nc, _shlHL      ; [10]
        jp      _chkQ           ; [10]
_divLp: ex      de, hl          ; [4]
        adc     hl, hl          ; [15]
_chkQ:  ex      de, hl          ; [4]
        adc     hl, hl          ; [15] Shift bit into remainder
        dec     a               ; [4]
        jp      z, _endDv       ; [10]
        sbc     hl, bc          ; [15]
        ccf                     ; [4] 
        jp      c, _divLp       ; [10]
_addBC: add     hl, bc          ; [11]
        or      a               ; [4]
        jp      _divLp          ; [10]
_endDv: sbc     hl, bc          ; [15]
        jp      nc, _endW1      ; [10]
        add     hl, bc          ; [11]
_endW1: ex      de, hl          ; [4]
        ccf                     ; [4]
        adc     hl, hl          ; [15]
        ret                     ; [10]
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; End of alUDivHLBC
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
User avatar
qarnos
Maxcoderz Staff
Posts: 227
Joined: Thu 01 Dec, 2005 9:04 am
Location: Melbourne, Australia

Post by qarnos »

Ok, I've already decided to break my one-routine per post rule :D

Here are a few routines which form part of some debugging code I am putting together.

These routines deal with displaying hexidecimal numbers and the flags register.

A quick overview:

* dbgNibbleToHex - Converts a 4-bit value in the accumulator to the ASCII code of the hexidecimal representation.
* dbgPrintFlags - Displays the contents of F in an easy to read format.
* dbgPrintHexByte - Displays a byte in hexidecimal format.
* dbgPrintHexWord - Displays a word in hexidecimal format.

Code: Select all

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; dbgPrintHexByte
;; dbgPrintHexWord
;;
;;  Uses _VPutMap to display a byte or word value in hexidecimal.
;;
;; INPUTS:
;;  HL  - Address of byte/word to display
;;
;; DESTROYED:
;;  AF, DE
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
dbgPrintHexWord:
            inc     hl                  ; MSB first.
            call    dbgPrintHexByte
            dec     hl
dbgPrintHexByte:
            xor     a
            rld
            push    af
            call    dbgNibbleToHex
            BCALL   _VPutMap
            pop     af
            rld
            push    af
            call    dbgNibbleToHex
            BCALL   _VPutMap
            pop     af
            rld
            ret
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; End of dbgPrintByte / dbgPrintWord
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
            
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; dbgPrintFlags:
;;            
;;  Uses _VPutMap to display the contents of the flags register.
;;  The output is formatted SZ5H3PNC with a dash (-) in place of the flag
;;  symbol if the corresponding flag is not set.
;;
;; INPUTS:
;;  F   - Well...
;;
;; DESTROYED:
;;  AF, BC, DE
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
dbgPrintFlags:
            push    af
            pop     bc
            ld      b, 'S' - 45         ; 45 is the ASCII code for a dash.
            call    _dbgPutFlg
            ld      b, 'Z' - 45
            call    _dbgPutFlg
            ld      b, '5' - 45
            call    _dbgPutFlg
            ld      b, 'H' - 45
            call    _dbgPutFlg
            ld      b, '3' - 45
            call    _dbgPutFlg
            ld      b, 'P' - 45
            call    _dbgPutFlg
            ld      b, 'N' - 45
            call    _dbgPutFlg
            ld      b, 'C' - 45
_dbgPutFlg: sla     c
            sbc     a, a                ; A = 255 if [sla c] carried.
            and     b                   ; AND with (character - 45)
            add     a, 45               ; then add 45.
            BCALL   _VPutMap
            ret
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; End of dbgPrintFlags
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
            
        
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; dbgNibbleToHex:
;;
;;  Converts a 4-bit value to the ASCII representation of that number in
;;  hexadecimal format.
;;
;; INPUTS:
;;  A   - Number to transform (low 4 bits).
;;
;; OUTPUTS:
;;  A   - ASCII hex code.
;;
;; DESTROYED:
;;  F
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
dbgNibbleToHex:
        and     $0f             ; mask the low 4 bits to be sure.
        add     a, 48           ; ASCII: 48-57 = numerals [0..9] 
        cp      58              ; if A was > 9 then add 7 to get
        ret     c               ; the ASCII codes for letters A-F.
        add     a, 7
        ret
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; End of dbgNibbleToHex
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
User avatar
qarnos
Maxcoderz Staff
Posts: 227
Joined: Thu 01 Dec, 2005 9:04 am
Location: Melbourne, Australia

Debugging code

Post by qarnos »

Expanding on my last post, I have completed writing a simple ASSERT library.

For those unfarmiliar with C/C++, ASSERT is a macro which is used during debugging to test the validity of an assumption made by the programmer. If the ASSERT fails, then the program is aborted and the debugger will identify the exact place where things went awry.

This library implements a version of the ASSERT macro. This can be placed at any point in your program and takes just one argument - the condition to test. This can be any of the conditions used in a JP instruction (c, nc, z, nz, etc). If the test FAILS the program is terminated with a register dump.

The square root routine I provided a few posts ago is a good example of when to use ASSERT. Take a look at this fragment:

Code: Select all

            sll     e               ; [8] root = root * 2 + 1
            rl      d               ; [8] should never carry
            sbc     hl, de          ; [15] remainder -= root
Here, I have made the assumption that RL D will never set the carry flag, which is crucial for the following SBC instruction. I could now "enforce" that assumption:

Code: Select all

            sll     e               ; [8] root = root * 2 + 1
            rl      d               ; [8] should never carry
            ASSERT(nc)
            sbc     hl, de          ; [15] remainder -= root
The macro effectively says, "I am assuming that the carry flag will be reset at this point. If it is not, then that is really bad and I would like to know about it, so end the program".

If the ASSERT does fail, I would get a screen looking something like this:

Image

One important thing to note, is that the ASSERT macros only have an effect when DEBUG is defined. If it is undefined, the ASSERTs expand to nothing!

The two files you will need are debug.asm and debug.inc.

NOTE: There files were written for the TI-83+. They may need modification for other calcs.

Since the are no instructions, apart from the source code comments, I will give a crappy example here.

Code: Select all

#define DEBUG               ; do this *before* including debug.inc

#include "ion.inc"
#include "debug.inc"


; etc...

ionBegin:

        DEBUG_BEGIN_PROGRAM ; DO THIS FIRST!!!!!!!!!!!!!!!

        add   a, c
        ASSERT(nc)          ; make sure a + c did not carry
        
        
; etc...


putPixel:                   ; co-ordinates in B and C


#ifdef DEBUG

        ld      a, 95
        cp      b
        ASSERT(nc)         ; make sure B <= 95
        
        ld      a, 63
        cp      c          ; make sure C <= 63
        ASSERT(nc)

#endif

; etc...




#include "debug.asm"        ; include debug code
One final warning - ION.INC doesn't include all the TIOS routines. Some which I needed were missing so I put them at the top of debug.asm. If you are using a different header, they might already be define so you will need to remove them.

Also, the equates I put in were the 83+ versions only.
Last edited by qarnos on Wed 30 Apr, 2008 10:11 am, edited 1 time in total.
Andy_J
Calc Master
Posts: 1110
Joined: Mon 20 Dec, 2004 10:01 pm
Location: In the state of Roo Fearing
Contact:

Post by Andy_J »

That's actually pretty damn slick... I might have to try it out if I ever get off my lazy bum and start programming again.
ImageImage
Image
User avatar
qarnos
Maxcoderz Staff
Posts: 227
Joined: Thu 01 Dec, 2005 9:04 am
Location: Melbourne, Australia

Fractions to fixed point values.

Post by qarnos »

Here is a handy piece of code which can convert a proper fractional value (such as 3/7, 5/9, etc) into a fixed point representation of that value.

This can be used, for instance, to convert the remainder of a division into something more useful.

The number of bits calculated can be defined to suit whatever purpose you need it for.

The longest possible run time (when calculating a full 16 bit result) is around 1400 T-States.

NOTE: This code uses the undocument Z80 instruction SL1. Unfortunately, most assemblers get this wrong and use the mnemonic SLL instead (due to confusion between a lowercase L and the numeral 1). This code gives in and uses the SLL mnemonic. If using TASM, remember to specify the -x5 switch to enable all the undocumented instructions.

If you get any "unkown instruction" errors, try changing SLL to SL1

Code: Select all

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; FractionToFix16:
;;
;;  Converts a proper fraction in the form HL / DE into a 0.x fixed point
;;  representation where x is defined by the caller.
;;
;;  Note that this routine will only provide correct results for proper
;;  fractions (denominator > numerator). Anything else will return with
;;  each bit of the result set.
;;
;;  The value passed in BC determines the number of bits of accuracy
;;  that are calculated and should equal (1 << (16 - nBits)).
;;
;;  For 16 bits:    $0001
;;  For 8 bits:     $0100
;;
;; INPUTS:
;;  HL - Numerator
;;  DE - Denominator
;;  BC - Accuracy (1 << (16 - nBits))
;;
;; OUTPUTS:
;;  BC - HL / DE
;;
;; DESTROYED:
;;  F, HL
;;
;; TIMINGS (not including CALL):
;;  Best case:  10 + (68 * nBits) T-States
;;  Worst case: 10 + (87 * nBits) T-States
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
FractionToFix16:                                .MODULE FractionToFix16
            add     hl, hl                      ; [11]
            jr      c, _overFlow                ; [12/7]
            sbc     hl, de                      ; [15]
            jp      nc, _shift1                 ; [12/7]
            add     hl, de                      ; [11]
            sla     c                           ; [8]
            rl      b                           ; [8]
            jp      nc, FractionToFix16         ; [10]
            ret                                 ; [10]
_overFlow:  or      a                           ; [4]
            sbc     hl, de                      ; [15]
_shift1:    sll     c                           ; [8]
            rl      b                           ; [8]
            jp      nc, FractionToFix16         ; [10]
            ret                                 ; [10]
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; End of FractionToFix16
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
[quote][/quote]
User avatar
qarnos
Maxcoderz Staff
Posts: 227
Joined: Thu 01 Dec, 2005 9:04 am
Location: Melbourne, Australia

Post by qarnos »

This is just an 8-bit version of the above routine.

The same warnings about SLL apply.

Code: Select all

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; FractionToFix8:
;;
;;  An 8-bit version of FractionToFix16
;;
;; INPUTS:
;;  A - Numerator
;;  B - Denominator
;;  C - Accuracy
;;
;; OUTPUTS:
;;  C - 0.x fixed point representation of A / B
;;
;; DESTROYED:
;;  AF
;;
;; TIMINGS (not including CALL):
;;  Best case:  10 + (38 * nBits) T-States
;;  Worst case: 10 + (45 * nBits) T-States
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
FractionToFix8:                             .MODULE FractionToFix8
            add     a, a                    ; [4]
            jr      c, _overFlow            ; [12/7]
            sub     b                       ; [4]
            jr      nc, _shift1             ; [12/7]
            add     a, b                    ; [4] 
            sla     c                       ; [8]
            jp      nc, FractionToFix8      ; [10]
            ret                             ; [10]
_overFlow:  sub     b                       ; [4]
_shift1:    sll     c                       ; [8]
            jp      nc, FractionToFix8      ; [10]
            ret                             ; [10]
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; End of FractionToFix8
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
User avatar
qarnos
Maxcoderz Staff
Posts: 227
Joined: Thu 01 Dec, 2005 9:04 am
Location: Melbourne, Australia

Fast 8-bit multiplication: unsigned x unsigned

Post by qarnos »

I have been debating whether or not to post these routines, since I am unhappy with the way the signed variants work, but I decided to post them now and I can always update them if neccessary.

The next few posts will detail some very fast 8-bit x 8-bit multiplication routines. These routines all work using the table of squares method. For those unfamiliar with this approach, here is a quick overview:

We want to calculate A * B. First, we define a function, f, which is defined as f(x) = x * x / 4

Now behold! f(A + B) - f(A - B) = A * B

So, we can precalculate f(x) for the range we will required and store this data in a lookup table (LUT).

For two unsigned 8-bit numbers, the range of values we need to know the value of f(x) for is limited to -255 (0 - 255) to 510 (255 + 255). Since f(510) = 65025, we will need 2 bytes for each entry in the LUT, requiring 1024 bytes of LUT data. This may be excessive for some applications - it depends on whether you can afford the RAM.

Furthermore, this table will need to be aligned to a 256 byte boundary to achieve maximum speed.

So, first, I will give you the table:

EDIT: I also have some code to generate this table at runtime. If you would like this code, PM me. Note that the code doesn't handle the memory allocation - it just generates the table where you tell it to.

Code: Select all

; This macro allows you to align data to any byte boundary
#define ALIGN(x)       .org    (($ + (x - 1)) - (($ + (x - 1)) % x))


ALIGN(256)

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; g_alUMulTable
;;
;; This table is used by various routines to perform multiplications using
;; the formula:
;;
;;    f(a+b) - f(a-b) = a * b
;;
;;    where:
;;
;;    f(x) = x * x / 4
;;
;; This table contains f(x) for all numbers in the range [0..511] inclusive
;; in 4 256-byte chunks in the following (strange, but optimal) order:
;;    0   - 255    low byte of f(256..511)
;;    256 - 511    high byte of f(256..511)
;;    512 - 767    low byte of f(0..255)
;;    768 - 1023   high byte of f(0..255)
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
g_alUMulTable    .equ    $ >> 8
    .db $00,$80,$01,$82,$04,$86,$09,$8c,$10,$94,$19,$9e,$24,$aa,$31,$b8
    .db $40,$c8,$51,$da,$64,$ee,$79,$04,$90,$1c,$a9,$36,$c4,$52,$e1,$70
    .db $00,$90,$21,$b2,$44,$d6,$69,$fc,$90,$24,$b9,$4e,$e4,$7a,$11,$a8
    .db $40,$d8,$71,$0a,$a4,$3e,$d9,$74,$10,$ac,$49,$e6,$84,$22,$c1,$60
    .db $00,$a0,$41,$e2,$84,$26,$c9,$6c,$10,$b4,$59,$fe,$a4,$4a,$f1,$98
    .db $40,$e8,$91,$3a,$e4,$8e,$39,$e4,$90,$3c,$e9,$96,$44,$f2,$a1,$50
    .db $00,$b0,$61,$12,$c4,$76,$29,$dc,$90,$44,$f9,$ae,$64,$1a,$d1,$88
    .db $40,$f8,$b1,$6a,$24,$de,$99,$54,$10,$cc,$89,$46,$04,$c2,$81,$40
    .db $00,$c0,$81,$42,$04,$c6,$89,$4c,$10,$d4,$99,$5e,$24,$ea,$b1,$78
    .db $40,$08,$d1,$9a,$64,$2e,$f9,$c4,$90,$5c,$29,$f6,$c4,$92,$61,$30
    .db $00,$d0,$a1,$72,$44,$16,$e9,$bc,$90,$64,$39,$0e,$e4,$ba,$91,$68
    .db $40,$18,$f1,$ca,$a4,$7e,$59,$34,$10,$ec,$c9,$a6,$84,$62,$41,$20
    .db $00,$e0,$c1,$a2,$84,$66,$49,$2c,$10,$f4,$d9,$be,$a4,$8a,$71,$58
    .db $40,$28,$11,$fa,$e4,$ce,$b9,$a4,$90,$7c,$69,$56,$44,$32,$21,$10
    .db $00,$f0,$e1,$d2,$c4,$b6,$a9,$9c,$90,$84,$79,$6e,$64,$5a,$51,$48
    .db $40,$38,$31,$2a,$24,$1e,$19,$14,$10,$0c,$09,$06,$04,$02,$01,$00

    .db $40,$40,$41,$41,$42,$42,$43,$43,$44,$44,$45,$45,$46,$46,$47,$47
    .db $48,$48,$49,$49,$4a,$4a,$4b,$4c,$4c,$4d,$4d,$4e,$4e,$4f,$4f,$50
    .db $51,$51,$52,$52,$53,$53,$54,$54,$55,$56,$56,$57,$57,$58,$59,$59
    .db $5a,$5a,$5b,$5c,$5c,$5d,$5d,$5e,$5f,$5f,$60,$60,$61,$62,$62,$63
    .db $64,$64,$65,$65,$66,$67,$67,$68,$69,$69,$6a,$6a,$6b,$6c,$6c,$6d
    .db $6e,$6e,$6f,$70,$70,$71,$72,$72,$73,$74,$74,$75,$76,$76,$77,$78
    .db $79,$79,$7a,$7b,$7b,$7c,$7d,$7d,$7e,$7f,$7f,$80,$81,$82,$82,$83
    .db $84,$84,$85,$86,$87,$87,$88,$89,$8a,$8a,$8b,$8c,$8d,$8d,$8e,$8f
    .db $90,$90,$91,$92,$93,$93,$94,$95,$96,$96,$97,$98,$99,$99,$9a,$9b
    .db $9c,$9d,$9d,$9e,$9f,$a0,$a0,$a1,$a2,$a3,$a4,$a4,$a5,$a6,$a7,$a8
    .db $a9,$a9,$aa,$ab,$ac,$ad,$ad,$ae,$af,$b0,$b1,$b2,$b2,$b3,$b4,$b5
    .db $b6,$b7,$b7,$b8,$b9,$ba,$bb,$bc,$bd,$bd,$be,$bf,$c0,$c1,$c2,$c3
    .db $c4,$c4,$c5,$c6,$c7,$c8,$c9,$ca,$cb,$cb,$cc,$cd,$ce,$cf,$d0,$d1
    .db $d2,$d3,$d4,$d4,$d5,$d6,$d7,$d8,$d9,$da,$db,$dc,$dd,$de,$df,$e0
    .db $e1,$e1,$e2,$e3,$e4,$e5,$e6,$e7,$e8,$e9,$ea,$eb,$ec,$ed,$ee,$ef
    .db $f0,$f1,$f2,$f3,$f4,$f5,$f6,$f7,$f8,$f9,$fa,$fb,$fc,$fd,$fe,$ff

    .db $00,$00,$01,$02,$04,$06,$09,$0c,$10,$14,$19,$1e,$24,$2a,$31,$38
    .db $40,$48,$51,$5a,$64,$6e,$79,$84,$90,$9c,$a9,$b6,$c4,$d2,$e1,$f0
    .db $00,$10,$21,$32,$44,$56,$69,$7c,$90,$a4,$b9,$ce,$e4,$fa,$11,$28
    .db $40,$58,$71,$8a,$a4,$be,$d9,$f4,$10,$2c,$49,$66,$84,$a2,$c1,$e0
    .db $00,$20,$41,$62,$84,$a6,$c9,$ec,$10,$34,$59,$7e,$a4,$ca,$f1,$18
    .db $40,$68,$91,$ba,$e4,$0e,$39,$64,$90,$bc,$e9,$16,$44,$72,$a1,$d0
    .db $00,$30,$61,$92,$c4,$f6,$29,$5c,$90,$c4,$f9,$2e,$64,$9a,$d1,$08
    .db $40,$78,$b1,$ea,$24,$5e,$99,$d4,$10,$4c,$89,$c6,$04,$42,$81,$c0
    .db $00,$40,$81,$c2,$04,$46,$89,$cc,$10,$54,$99,$de,$24,$6a,$b1,$f8
    .db $40,$88,$d1,$1a,$64,$ae,$f9,$44,$90,$dc,$29,$76,$c4,$12,$61,$b0
    .db $00,$50,$a1,$f2,$44,$96,$e9,$3c,$90,$e4,$39,$8e,$e4,$3a,$91,$e8
    .db $40,$98,$f1,$4a,$a4,$fe,$59,$b4,$10,$6c,$c9,$26,$84,$e2,$41,$a0
    .db $00,$60,$c1,$22,$84,$e6,$49,$ac,$10,$74,$d9,$3e,$a4,$0a,$71,$d8
    .db $40,$a8,$11,$7a,$e4,$4e,$b9,$24,$90,$fc,$69,$d6,$44,$b2,$21,$90
    .db $00,$70,$e1,$52,$c4,$36,$a9,$1c,$90,$04,$79,$ee,$64,$da,$51,$c8
    .db $40,$b8,$31,$aa,$24,$9e,$19,$94,$10,$8c,$09,$86,$04,$82,$01,$80

    .db $00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00
    .db $00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00
    .db $01,$01,$01,$01,$01,$01,$01,$01,$01,$01,$01,$01,$01,$01,$02,$02
    .db $02,$02,$02,$02,$02,$02,$02,$02,$03,$03,$03,$03,$03,$03,$03,$03
    .db $04,$04,$04,$04,$04,$04,$04,$04,$05,$05,$05,$05,$05,$05,$05,$06
    .db $06,$06,$06,$06,$06,$07,$07,$07,$07,$07,$07,$08,$08,$08,$08,$08
    .db $09,$09,$09,$09,$09,$09,$0a,$0a,$0a,$0a,$0a,$0b,$0b,$0b,$0b,$0c
    .db $0c,$0c,$0c,$0c,$0d,$0d,$0d,$0d,$0e,$0e,$0e,$0e,$0f,$0f,$0f,$0f
    .db $10,$10,$10,$10,$11,$11,$11,$11,$12,$12,$12,$12,$13,$13,$13,$13
    .db $14,$14,$14,$15,$15,$15,$15,$16,$16,$16,$17,$17,$17,$18,$18,$18
    .db $19,$19,$19,$19,$1a,$1a,$1a,$1b,$1b,$1b,$1c,$1c,$1c,$1d,$1d,$1d
    .db $1e,$1e,$1e,$1f,$1f,$1f,$20,$20,$21,$21,$21,$22,$22,$22,$23,$23
    .db $24,$24,$24,$25,$25,$25,$26,$26,$27,$27,$27,$28,$28,$29,$29,$29
    .db $2a,$2a,$2b,$2b,$2b,$2c,$2c,$2d,$2d,$2d,$2e,$2e,$2f,$2f,$30,$30
    .db $31,$31,$31,$32,$32,$33,$33,$34,$34,$35,$35,$35,$36,$36,$37,$37
    .db $38,$38,$39,$39,$3a,$3a,$3b,$3b,$3c,$3c,$3d,$3d,$3e,$3e,$3f,$3f
And now the code that uses this table to perform an unsigned x unsigned multiplication:

Code: Select all

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; alUMulBC:
;;
;; Unsigned multiply of B and C, storing the 16-bit result in BC
;;
;; INPUTS:
;;  B   - Multiplicand
;;  C   - Multiplier
;;
;; OUTPUTS:
;;  BC  - B * C
;;
;; DESTROYED:
;;  AF, HL
;;
;; TIMINGS (not including CALL):
;;  126 T-States
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
alUMulBC:
        ld      a, b                    ; [4] calculate B + C.
        add     a, c                    ; [4]
        ld      l, a                    ; [4] if carry flag is set we use
        sbc     a, a                    ; [4] g_alUMulTable otherwise we use
        sbc     a, 254 - g_alUMulTable  ; [7] g_alUMulTable + 2.
        ld      h, a                    ; [4]
        ld      a, b                    ; [4] calculate B - C.
        sub     c                       ; [4]
        jp      nc, $ + $0005           ; [10] negate result if < 0
        neg                             ; [8]
        ld      c, (hl)                 ; [7] retrieve g_alUMulTable[A+B].
        inc     h                       ; [4]
        ld      b, (hl)                 ; [7] 
        ld      l, a                    ; [4] Select next table
        ld      h, g_alUMulTable + 2    ; [7]
        ld      a, c                    ; [4] subtract Table[B-C] 
        sub     (hl)                    ; [7] from Table[B+C]
        ld      c, a                    ; [4]
        ld      a, b                    ; [4]
        inc     h                       ; [4]
        sbc     a, (hl)                 ; [7]
        ld      b, a                    ; [4]
        ret                             ; [10]
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; End of alUMulBC
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
Large, but fast!
Last edited by qarnos on Sat 26 Aug, 2006 10:11 am, edited 2 times in total.
User avatar
qarnos
Maxcoderz Staff
Posts: 227
Joined: Thu 01 Dec, 2005 9:04 am
Location: Melbourne, Australia

Fast 8-bit multiplication: signed x signed

Post by qarnos »

I am not entirely happy with this code, but this will use the above lookup table to perform signed * signed multiplication.

Code: Select all

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; alSMulBC:
;;
;; Signed multiply of B and C, storing the 16-bit result in BC
;;
;; INPUTS:
;;  B   - Multiplicand (-128..127)
;;  C   - Multiplier   (-128..127)
;;
;; OUTPUTS:
;;  BC  - B * C
;;
;; DESTROYED:
;;  AF, HL
;;
;; TIMINGS (not including CALL):
;;  173 T-states
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
alSMulBC:
        ld      a, b                    ; [4] calculate B + C.
        add     a, c                    ; [4]
        ld      l, a                    ; [4]
        sbc     a, a                    ; [4] A = 255 if CA, 0 if NC
        xor     b                       ; [4]
        xor     c                       ; [4] if B and C have equal signs
        add     a, a                    ; [4] we negate if B + C carries.
        sbc     a, a                    ; [4] if B and C have opposing signs we
        add     a, l                    ; [4]
        ld      l, a                    ; [4]
        sbc     a, a                    ; [4]
        xor     l                       ; [4] negate if B + C did not carry.
        ld      l, a                    ; [4]

        ld      a, b                    ; [4] calculate B - C.
        sub     c                       ; [4]
        ld      h, a                    ; [4]
        sbc     a, a                    ; [4] A = 255 if CA, 0 if NC
        xor     b                       ; [4]
        xor     c                       ; [4] if B and C have equal signs
        add     a, a                    ; [4] we negate if B + C carries.
        sbc     a, a                    ; [4] if B and C have opposing signs we
        add     a, h                    ; [4]
        ld      h, a                    ; [4]
        sbc     a, a                    ; [4]
        xor     h                       ; [4] negate id B + C did not carry.

        ld      h, g_alUMulTable + 3    ; [7]
        ld      b, (hl)                 ; [7] retrieve g_alUMulTable[A+B].
        dec     h                       ; [4]
        ld      c, (hl)                 ; [7] 
        ld      l, a                    ; [4] Select next table
        ld      a, c                    ; [4] subtract Table[B-C] 
        sub     (hl)                    ; [7] from Table[B+C]
        ld      c, a                    ; [4]
        ld      a, b                    ; [4]
        inc     h                       ; [4]
        sbc     a, (hl)                 ; [7]
        ld      b, a                    ; [4]
        ret                             ; [10]
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; End of alSMulBC
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
User avatar
qarnos
Maxcoderz Staff
Posts: 227
Joined: Thu 01 Dec, 2005 9:04 am
Location: Melbourne, Australia

Fast 8-bit multiplication: signed x unsigned

Post by qarnos »

Finally, signed * unsigned multiplication.

This routine, in my experience, is mainly useful for mutliplications dealing with fixed point multiplications. For instance, if you want to scale a signed number by a certain amount, you can use this routine and then shift the result as required.

This routine is also dependant on the unsigned rountine - it will JP to it if the signed argument is positive.

EDIT: I should also point out - with these 3 multiplication routines, it is possible to construct any kind of x-bit by x-bit multiplication by following the standard long-multiplication approach. For instance, multiplying two 16-bit values will required 4 multiplications and a few sign extensions and additions (this can be reduced to 3 multiplications if you don't mind losing a little bit of accuracy).

Code: Select all

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; alFMulBC:
;;
;;  Fixed-point multiply of B and C, storing the 16-bit result in BC
;;
;; INPUTS:
;;  B   - Multiplicand (-128..127)
;;  C   - Multiplier   (0..255)
;;
;; OUTPUTS:
;;  BC  - B * C
;;
;; DESTROYED:
;;  AF, HL
;;
;; TIMINGS (not including CALL):
;;  144 T-states (B >= 0)
;;  145 T-states (B < 0)
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
        
alFMulBC: 
        ld      a, b                    ; [4] compute A = abs(B).
        add     a, a                    ; [4]
        jp      nc, alUMulBC            ; [132/10] jump if B is positive

        ld      a, b                    ; [4] calculate B + C.
        add     a, c                    ; [4]
        jp      c, $ + $0005            ; [10] jump if A + B is positive
        neg                             ; [8]
        ld      h, g_alUMulTable + 2    ; [4]
        ld      l, a                    ; [4]
        ld      a, c                    ; [4] calculate C - B.
        ld      c, (hl)                 ; [7]
        inc     h                       ; [4]
        sub     b                       ; [4]
        ld      b, (hl)                 ; [7]
        ccf                             ; [4]
        ld      l, a                    ; [4] if carry flag is set we use
        sbc     a, a                    ; [4] g_alUMulTable + 2 otherwise we
        sbc     a, 254 - g_alUMulTable  ; [7] use g_alUMulTable.
        ld      h, a                    ; [4]
        ld      a, c                    ; [4]
        sub     (hl)                    ; [7]
        ld      c, a                    ; [4]
        inc     h                       ; [4]
        ld      a, b                    ; [4]
        sbc     a, (hl)                 ; [7]
        ld      b, a                    ; [4]
        ret
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; End of alFMulBC
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
User avatar
qarnos
Maxcoderz Staff
Posts: 227
Joined: Thu 01 Dec, 2005 9:04 am
Location: Melbourne, Australia

Line drawing.

Post by qarnos »

Here is a my line drawing routine.

It makes use of a couple of the undocumented load ixh/ixl instructions, so make sure your assembler supports them (some assemblers might name them XH and XL instead of IXH and IXL).

This routine will only draw black lines (turning pixels on). It does this by executing an unrolled loop of SET instructions, so the plotting is very fast. This code also requires the FractionToFix8 routine I posted earlier (and, optionally, the debugging library).

I have benchmarked this code against several other line drawers and found it is:

11 times faster than TI-OS ILine (no prizes there, though!)
5.5 times faster than MirageOS fastlineb
2 times faster than this one I found on ticalc.org.

Obviously, it lacks the flexibility of the TI-OS and MirageOS routines, although it does allow you to specify which buffer to draw on, but if you want a balls-to-the-wall line drawer and dont mind the size (around 250 bytes), I would think this routine is pretty close to optimal.

Code: Select all

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; alDrawLineB:
;;
;;  Draws a line on the specified screen buffer from (D, E) to (H, L).
;;  All pixels along the line will be turned on.
;;
;; INPUTS:
;;  BC  - Address of screen buffer to draw to.
;;  DE  - x1, y1
;;  HL  - x2, y2
;;
;; DESTROYED:
;;  AF, BC, DE, HL, IX
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
alDrawLineB:

            ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
            ;; If DEBUG is defined, check to see that the co-ordinates are
            ;; valid and ASSERT if they are not.
            ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
#ifdef DEBUG
            ld      a, 95
            cp      d
            ASSERT(nc)
            cp      h
            ASSERT(nc)
            ld      a, 63
            cp      e
            ASSERT(nc)
            cp      l
            ASSERT(nc)
#endif
            
            ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
            ;; Compare x1 and x2 and swap if neccessary so that we are always
            ;; drawing left to right.
            ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
            ld      a, h                ; [4] first do x2 - x1
            sub     d                   ; [4]
            jr      nc, $ + $0005       ; [12/7] if x1 > x2 then swap the co-ords
            ex      de, hl              ; [4] and negate the accumulator so we
            neg                         ; [8] end up with the absolute line width.
            
            ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
            ;; Calculate absolute height. The real height, which could have a
            ;; negative value, is pushed to the stack, mainly so we can test
            ;; the sign later on.
            ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
            ld      h, a                ; [4] replace x2 with line width.
            ld      a, e                ; [4]
            sub     l                   ; [4] y1 - y2
            push    af                  ; [11] save raw line height.
            jr      nc, $ + $0004       ; [7/12]
            neg                         ; [8]
            ld      l, a                ; [4] replace y2 with line height.
            
           
            ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
            ;; Calculate the y1 offset (y1 * 12) and add to SCREENRAM address.
            ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
            push    hl                  ; [11] save width and height.
            ld      a, e                ; [4]
            add     a, a                ; [4] y1 * 2
            add     a, e                ; [4] y1 * 3
            ld      h, 0                ; [7]
            ld      l, a                ; [4]
            add     hl, hl              ; [11] y1 * 6
            add     hl, hl              ; [11] y1 * 12
            add     hl, bc              ; [11] y1 * 12 + SCREENRAM

            ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
            ;; Now calulcate and add the x1 byte offset.
            ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
            ld      a, d                ; [4]
            and     $f8                 ; [7] mask out low 3 bits so we can
            rra                         ; [4] RRA 3 times with no chance of
            rra                         ; [4] setting the carry flag.
            rra                         ; [4]
            ld      b, 0                ; [7] and now set BC to the x pixel
            ld      c, a                ; [4] offset.
            add     hl, bc              ; [11]


            ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
            ;; Pop the width and height and determine which way we will draw
            ;; the line (ie: using _nhline or _nvline).
            ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
            pop     bc                  ; [10] restore width/height
            ld      a, b                ; [4] cp width - height
            cp      c                   ; [4]
            jr      c, _setupNV         ; [12/7]
            ld      a, c                ; [4]
            ld      ix, _nhline         ; [14]
            jp      _plotLine           ; [10]
_setupNV:   ld      a, b                ; [4]
            ld      b, c                ; [4]
            ld      ix, _nvline         ; [14]

            ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
            ;; Call FractionToFix8 to obtain the slope of the line.
            ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
_plotLine:  ld      c, $01              ; [4]
            call    FractionToFix8      ; [290/498]
                        
            ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
            ;; The two plotting subroutines are each made out of 8 segments of
            ;; 9 bytes of code which set the bits of the line. To determine
            ;; the initial entry point we multiply the bit offset by 9 and add
            ;; that to the base address for the routine we will execute.
            ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
            ld      a, d                ; [4]
            and     $07                 ; [7]
            ld      e, a                ; [4]
            add     a, a                ; [4] *= 2
            add     a, a                ; [4] *= 4
            add     a, a                ; [4] *= 8
            add     a, e                ; [4] *= 9
            
            ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
            ;; Now add the calculated offset to the address in IX
            ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
            add     a, ixl              ; [8]
            ld      ixl, a              ; [8]
            ld      a, ixh              ; [8]
            adc     a, 0                ; [7]
            ld      ixh, a              ; [8]
            
            ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
            ;; Now check the sign bit we pushed earlier. This will determine
            ;; if the line is being drawn top-to-bottom or vice-versa. From
            ;; here the code will branch to the entry point of either _nvline
            ;; or _nhline.
            ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
            pop     af                  ; [10]]
            ld      a, $80              ; [7]
            inc     b                   ; [4]
            jr      nc, $ + $0007       ; [12/7]
            ld      de, 12              ; [10]
            jp      (ix)                ; [8]
            ld      de, -12             ; [10]
            jp      (ix)                ; [8]

            ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
            ;; This is an unrolled loop which draws a line where (w <= h)
            ;; (ie: the line is more verticle than horizontal).
            ;;
            ;; INPUTS:
            ;;  HL   - Address of first byte of line in screen RAM.
            ;;  A    - Gradient accumulator.
            ;;  B    - Line counter
            ;;  C    - X gradient
            ;;  DE   - Line skip offset (+/-12)
            ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
            inc     hl              ; [6]    (1)
_nvline:    set     7, (hl)         ; [15]   (2) set pixel bit.
            add     hl, de          ; [11]   (1) advance to next line.
            add     a, c            ; [4]    (1) add gradient.
            jr      c, $ + $000b    ; [7/12] (2) jump to next bit on carry.
            djnz    $ - $0006       ; [13/8] (2) loop if more bits to set.
            ret                     ; [10]   (1)
            set     6, (hl)         ; [15]   (2)
            add     hl, de          ; [11]   (1)
            add     a, c            ; [4]    (1)
            jr      c, $ + $000b    ; [7/12] (2)
            djnz    $ - $0006       ; [13/8] (2)
            ret                     ; [10]   (1)
            set     5, (hl)         ; [15]   (2)
            add     hl, de          ; [11]   (1)
            add     a, c            ; [4]    (1)
            jr      c, $ + $000b    ; [7/12] (2)
            djnz    $ - $0006       ; [13/8] (2)
            ret                     ; [10]   (1)
            set     4, (hl)         ; [15]   (2)
            add     hl, de          ; [11]   (1)
            add     a, c            ; [4]    (1)
            jr      c, $ + $000b    ; [7/12] (2)
            djnz    $ - $0006       ; [13/8] (2)
            ret                     ; [10]   (1)
            set     3, (hl)         ; [15]   (2)
            add     hl, de          ; [11]   (1)
            add     a, c            ; [4]    (1)
            jr      c, $ + $000b    ; [7/12] (2)
            djnz    $ - $0006       ; [13/8] (2)
            ret                     ; [10]   (1)
            set     2, (hl)         ; [15]   (2)
            add     hl, de          ; [11]   (1)
            add     a, c            ; [4]    (1)
            jr      c, $ + $000b    ; [7/12] (2)
            djnz    $ - $0006       ; [13/8] (2)
            ret                     ; [10]   (1)
            set     1, (hl)         ; [15]   (2)
            add     hl, de          ; [11]   (1)
            add     a, c            ; [4]    (1)
            jr      c, $ + $000b    ; [7/12] (2)
            djnz    $ - $0006       ; [13/8] (2)
            ret                     ; [10]   (1)
            set     0, (hl)         ; [15]   (2)
            add     hl, de          ; [11]   (1)
            add     a, c            ; [4]    (1)
            jr      c, $ + $0005    ; [7/12] (2)
            djnz    $ - $0006       ; [13/8] (2)
            ret                     ; [10]   (1)
            djnz    _nvline - 1     ; [13/8] (2)
            ret
            
            
            ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
            ;; This is an unrolled loop which draws a line where (w > h)
            ;; (ie: the line is more horizontal than verticle).
            ;;
            ;; INPUTS:
            ;;  HL   - Address of first byte of line in screen RAM.
            ;;  A    - Gradient accumulator.
            ;;  B    - Column counter
            ;;  C    - Y gradient
            ;;  DE   - Line skip offset (+/-12)
            ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
            inc     hl              ; [6]    (1)
_nhline:    set     7, (hl)         ; [15]   (2) set pixel bit.
            add     a, c            ; [4]    (1) add delta.
            jr      nc, $ + $0003   ; [12/7] (2) jump on gradient overflow.
            add     hl, de          ; [11]   (1) advance to next line.
            djnz    $ + $0003       ; [13/8] (2) loop.
            ret                     ; [10]   (1)
            set     6, (hl)         ; [15]   (2)
            add     a, c            ; [4]    (1)
            jr      nc, $ + $0003   ; [12/7] (2)
            add     hl, de          ; [11]   (1)
            djnz    $ + $0003       ; [13/8] (2)
            ret                     ; [10]   (1)
            set     5, (hl)         ; [15]   (2)
            add     a, c            ; [4]    (1)
            jr      nc, $ + $0003   ; [12/7] (2)
            add     hl, de          ; [11]   (1)
            djnz    $ + $0003       ; [13/8] (2)
            ret                     ; [10]   (1)
            set     4, (hl)         ; [15]   (2)
            add     a, c            ; [4]    (1)
            jr      nc, $ + $0003   ; [12/7] (2)
            add     hl, de          ; [11]   (1)
            djnz    $ + $0003       ; [13/8] (2)
            ret                     ; [10]   (1)
            set     3, (hl)         ; [15]   (2)
            add     a, c            ; [4]    (1)
            jr      nc, $ + $0003   ; [12/7] (2)
            add     hl, de          ; [11]   (1)
            djnz    $ + $0003       ; [13/8] (2)
            ret                     ; [10]   (1)
            set     2, (hl)         ; [15]   (2)
            add     a, c            ; [4]    (1)
            jr      nc, $ + $0003   ; [12/7] (2)
            add     hl, de          ; [11]   (1)
            djnz    $ + $0003       ; [13/8] (2)
            ret                     ; [10]   (1)
            set     1, (hl)         ; [15]   (2)
            add     a, c            ; [4]    (1)
            jr      nc, $ + $0003   ; [12/7] (2)
            add     hl, de          ; [11]   (1)
            djnz    $ + $0003       ; [13/8] (2)
            ret                     ; [10]   (1)
            set     0, (hl)         ; [15]   (2)
            add     a, c            ; [4]    (1)
            jr      nc, $ + $0003   ; [12/7] (2)
            add     hl, de          ; [11]   (1)
            djnz    _nhline - 1     ; [13/8] (2)
            ret                     ; [10]   (1)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; End of alDrawLineB
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
CoBB
MCF Legend
Posts: 1601
Joined: Mon 20 Dec, 2004 8:45 am
Location: Budapest, Absurdistan
Contact:

Re: Line drawing.

Post by CoBB »

qarnos wrote:I have benchmarked this code against several other line drawers and found it is:

11 times faster than TI-OS ILine (no prizes there, though!)
5.5 times faster than MirageOS fastlineb
2 times faster than this one I found on ticalc.org.
You should definitely compare its speed with Jim e's:

https://www.ticalc.org/archives/files/f ... 34642.html

And here's mine, which has a decent speed (about 1.5 times faster than Badja's) and an even more decent size:

http://www.ticalc.org/archives/files/fi ... 23429.html
User avatar
qarnos
Maxcoderz Staff
Posts: 227
Joined: Thu 01 Dec, 2005 9:04 am
Location: Melbourne, Australia

Re: Line drawing.

Post by qarnos »

CoBB wrote:
qarnos wrote:I have benchmarked this code against several other line drawers and found it is:

11 times faster than TI-OS ILine (no prizes there, though!)
5.5 times faster than MirageOS fastlineb
2 times faster than this one I found on ticalc.org.
You should definitely compare its speed with Jim e's:

https://www.ticalc.org/archives/files/f ... 34642.html

And here's mine, which has a decent speed (about 1.5 times faster than Badja's) and an even more decent size:

http://www.ticalc.org/archives/files/fi ... 23429.html
Thanks for this. I had heard about Jim E's routine but had never been able to find it!

I will check it out.

UPDATE:

I benchmarked the interrupt-safe version of Jim E's routine and found, in my tests at least, that my code was 1.06 times faster - a negligable difference.

I was unable to test the interrupts disabled routine due to the fact that my benchmarking code needs the interrupts for timing. For the same reason, I was unable to test your code.

EDIT (AGAIN)

I thought I should publish my benchmarking method, just in case it is flawed or if someone wants try to reproduce my results.

Basically, I read the entire 64kb of the Z80-visible RAM and interpret each 4 byte group as a set of line end-points:

Code: Select all

_lineLoop:  ld      a, (bc)
            inc     bc
            and     95
            ld      d, a
            ld      a, (bc)
            inc     bc
            and     63
            ld      e, a
            ld      a, (bc)
            inc     bc
            and     95
            ld      h, a
            ld      a, (bc)
            inc     bc
            and     63
            ld      l, a
            push    bc
            ld      bc, PLOTSSCREEN
            call    alDrawLineB
            pop     bc
            ld      a, b
            or      c
            jr      nz, _lineLoop
I have an interrupt handler running which keeps track of how many interrupts occurred during the line drawing code.

It yeilds consistant results, so I am satisfied.
Last edited by qarnos on Sun 27 Aug, 2006 11:59 am, edited 2 times in total.
User avatar
Jim e
Calc King
Posts: 2457
Joined: Sun 26 Dec, 2004 5:27 am
Location: SXIOPO = Infinite lives for both players
Contact:

Post by Jim e »

I'd say it be faster than mine on long lines. Mine uses Bresenham line algorithm so its has to do some compares. Other than that the code looks strikingly similar. Actually mine started off using slope but I thought the over head from the div routine would take to long.

Edit: Couldn't that Fraction to fix 8 routine be optimised like this.

Code: Select all

FractionToFix8:                             .MODULE FractionToFix8
            add     a, a                    ; [4]
            jr      c, _overFlow            ; [12/7]
            cp      b                       ; [4]
            jr      nc, _overFlow           ; [12/7]
            sla     c                       ; [8]
            jp      nc, FractionToFix8      ; [10]
            ret                             ; [10]

_overFlow:  sub     b                       ; [4]
            sll     c                       ; [8]
            jp      nc, FractionToFix8      ; [10]
            ret                             ; [10]
Image
Post Reply