About a year ago I got the routine down to 70. I didn't make use of sp, just the trick of incrementing the high byte of the buffer pointer to get the next byte.
Then doynax posted his optimised grayscale routine (in April of this year) and ingeniously pointed out that:
((A ^ B) & C) ^ B = (A & C) | (B & ~C)
Very surprised no one bothered to look at this before!
His routine used a custom buffer setup as well as being layer interlaced. You can find the thread here: http://kvince83.tengun.net/maxboard/vie ... php?t=1668
My routine still uses the conventional buffer setup. I had another look and got it down to 59. I'm sure it can be optimised further.
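The mask identity quoted above is easy to check by brute force. Here is a minimal Python sketch, assuming 8-bit values; the function names and the list of sample masks are made up for illustration and are not from doynax's routine.

```python
def mask_xor(a, b, c):
    """XOR form: ((A ^ B) & C) ^ B, three operations, no inverted mask."""
    return ((a ^ b) & c) ^ b


def mask_classic(a, b, c):
    """Classic form: (A & C) | (B & ~C), with ~C limited to 8 bits."""
    return (a & c) | (b & ~c & 0xFF)


# Hypothetical sample masks; the identity is bitwise, so any mask works.
MASKS = (0x00, 0xFF, 0xAA, 0x55, 0xF0, 0x0F)


def identity_holds():
    """Compare both forms for every byte pair and every sample mask."""
    return all(
        mask_xor(a, b, c) == mask_classic(a, b, c)
        for a in range(256)
        for b in range(256)
        for c in MASKS
    )


if __name__ == "__main__":
    print(identity_holds())
```

The XOR form matters on the Z80 because it never needs the inverted mask, so one register holding C is enough.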
Using Greyscale for APPs
- Jim e
- Calc King
- Posts: 2457
- Joined: Sun 26 Dec, 2004 5:27 am
- Location: SXIOPO = Infinite lives for both players
- Contact:
Figured you would come in with something about the masks. I never bothered looking for another way to apply masks. If it's correct then it's definitely going to speed things up. I can imagine a trick to get it down to 51 clocks, I believe. At least 10,000 clocks could be killed. However, I'd like to point out that it's pointless to go below 64; after that you're just ruining compatibility for the busted LCDs.
I almost want to update RGP now.
- Jim e
Well, purely stretching for speed, I'd go sigma's way, the "best damn grey routine ever" method: unrolling the whole thing into one of the safe RAM buffers. I estimate it running at ~40000 clocks and being about 515 bytes.
The basis of the code is that Buffer1 is aligned with Buffer2. As long as that's true it will work, and it is true for the current implementations of Gpp and RGP. This could actually be practical to use if it weren't too fast. A slight alteration will let c and b be used as masks so the noise can be reduced. Just loop the thing and we get 64 tstates, which is completely reasonable and fast enough.
I think at this point it's overkill. It's quicker than fastcopy, but it's not worth it if it kills compatibility. It could be faster still if the buffer logic changed, but no one wants that.
Code:
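loop:
    out ($10),a     ; send the column command held in a
;---------------
;repeat 64 times, 51 tstates per pass
    ld a,(de)       ; byte from the buffer at de
    xor (hl)        ; ((A ^ B) & C) ^ B with B = (hl), C = c
    and c
    xor (hl)
    out ($11),a     ; write the mixed byte to the LCD
    add hl,sp       ; step hl to the next row
    ld e,l          ; buffers are aligned, so de shares the low byte
;2 or 3 inc d need to be inserted in there somewhere.
;---------------
    dec h
    dec h
    dec h
    dec d
    dec d
    dec d           ;only needed if 3 was used before.
    inc b
    ld a,b
    cp $2c
    jp nz,loop      ; was jp z, which would exit after the first column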
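The 51 tstates figure and the ~40000 estimate can be tallied from the standard Z80 instruction timings. This is a rough Python sketch. The 12-column count is an assumption: it follows from `cp $2c` only if b starts at $20. The per-column overhead after the unrolled body is left out.

```python
# Standard documented Z80 T-state counts for the repeated body.
BODY = [
    ("ld a,(de)", 7),
    ("xor (hl)", 7),
    ("and c", 4),
    ("xor (hl)", 7),
    ("out ($11),a", 11),
    ("add hl,sp", 11),
    ("ld e,l", 4),
]
INC_D = 4      # cost of each extra "inc d" the comment asks for
ROWS = 64      # the body is repeated 64 times per column
COLUMNS = 12   # assumption: b runs from $20 up to, not including, $2c


def body_tstates(extra_inc_d=0):
    """T-states for one pass of the body, plus optional inc d's."""
    return sum(cost for _, cost in BODY) + extra_inc_d * INC_D


def screen_tstates(extra_inc_d=0):
    """T-states for every byte of the screen, loop overhead excluded."""
    return body_tstates(extra_inc_d) * ROWS * COLUMNS


if __name__ == "__main__":
    print(body_tstates(), body_tstates(2), body_tstates(3))
    print(screen_tstates())
```

With no extra `inc d` the body sums to 51, and 2 or 3 of them bring it to 59 or 63. The whole-screen figure for the 51 tstate body is 39168, in line with the ~40000 estimate.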