Well, purely stretching for speed, I'd go the way of sigma's "The best damn grey routine ever" method: unroll the whole thing into one of the safe RAM buffers. I estimate it running at ~40000 tstates for the whole screen and taking about 515 bytes.
Code:
loop:
out ($10),a
;---------------
;repeat 64 times
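ld a,(de) ;51 tstates for these seven instructions
xor (hl)
and c ;c is the mask
xor (hl) ;a = bits of (de) where c is set, bits of (hl) elsewhere
out ($11),a ;write the merged byte to the LCD
add hl,sp ;step to the next row (sp presumably holds the row width)
ld e,l ;buffers are aligned, so e just follows l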
;2 or 3 inc d need to be inserted in there somewhere.
;---------------
dec h
dec h
dec h
dec d
dec d
dec d ;only needed if 3 was used before.
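inc b ;next column
ld a,b
cp $2c ;done once b passes the last column ($2b)
jp nz,loop ;otherwise go again (jp z would fall out after the first column)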
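For anyone who wants to check the estimates, here is a rough tally in Python rather than asm. It assumes nominal Z80 timings with no extra port wait states, 12 columns ($20 to $2b), the worst case of 3 `inc d` per column, and a `jp nz` at the end of the loop. The names are made up for this sketch.

```python
# (mnemonic, tstates, bytes) using nominal Z80 timings
BODY = [
    ("ld a,(de)", 7, 1),
    ("xor (hl)", 7, 1),
    ("and c", 4, 1),
    ("xor (hl)", 7, 1),
    ("out ($11),a", 11, 2),
    ("add hl,sp", 11, 1),
    ("ld e,l", 4, 1),
]

# Per-column overhead outside the unrolled part
OVERHEAD = (
    [("out ($10),a", 11, 2)]
    + [("dec h", 4, 1)] * 3
    + [("dec d", 4, 1)] * 3
    + [("inc b", 4, 1), ("ld a,b", 4, 1), ("cp $2c", 7, 2), ("jp nz,loop", 10, 3)]
)

ROWS = 64      # repeats of the body per column
COLUMNS = 12   # assumption: column commands $20..$2b
INC_D = 3      # assumption: worst case, each inc d is 4 tstates / 1 byte

body_tstates = sum(t for _, t, _ in BODY)
body_bytes = sum(b for _, _, b in BODY)
overhead_tstates = sum(t for _, t, _ in OVERHEAD)
overhead_bytes = sum(b for _, _, b in OVERHEAD)

# Size of the unrolled part alone, then with the loop overhead included
unrolled_bytes = ROWS * body_bytes + INC_D
total_bytes = unrolled_bytes + overhead_bytes

# Whole-screen cost: every column runs the unrolled part plus the overhead
column_tstates = ROWS * body_tstates + INC_D * 4 + overhead_tstates
frame_tstates = COLUMNS * column_tstates

print(body_tstates, unrolled_bytes, total_bytes, frame_tstates)
```

Under those assumptions the unrolled part alone comes out at the 515 bytes quoted, and the whole screen lands right around 40000 tstates.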
The code relies on Buffer1 being aligned with Buffer2; as long as that's true it will work, and in the current implementations of Gpp and RGP it is. This could actually be practical to use if it weren't too fast for the LCD. A slight alteration would let both c and b be used as masks, so the noise could be reduced. Just loop the thing instead of unrolling it and we get 64 tstates per byte (the 51 above plus loop overhead), which is completely reasonable and fast enough.
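In case the xor/and/xor trick isn't obvious, here is a small Python model of it. It assumes the byte read through de comes from Buffer1 and the byte read through (hl) from Buffer2, which the post doesn't actually spell out; `merge` and `reference` are names invented for the sketch.

```python
def merge(buf1_byte, buf2_byte, mask):
    """Model of: ld a,(de) / xor (hl) / and c / xor (hl)."""
    a = buf1_byte      # ld a,(de)
    a ^= buf2_byte     # xor (hl): bits where the two buffers differ
    a &= mask          # and c: keep the differences only where the mask is set
    a ^= buf2_byte     # xor (hl): flip those bits back onto the buf2 byte
    return a & 0xFF


def reference(buf1_byte, buf2_byte, mask):
    """The obvious way: buf1 bits where mask is 1, buf2 bits where it is 0."""
    return ((buf1_byte & mask) | (buf2_byte & ~mask)) & 0xFF


# Spot check against the obvious formulation for a few masks
for mask in (0x00, 0x55, 0xAA, 0xFF):
    for b1 in range(256):
        for b2 in range(256):
            assert merge(b1, b2, mask) == reference(b1, b2, mask)

print(hex(merge(0x0F, 0xF0, 0x55)))
```

The point of doing it this way is that the merge needs only one mask register and no temporary, which is what keeps the body down to 51 tstates.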
I think we're at the point where it's overkill. It's quicker than fastcopy, but it's not worth it if it kills compatibility. It could be faster still if the buffer logic changed, but no one wants that.