Using Greyscale for APPs
Page 6 of 6

Author:  tr1p1ea [ Mon 31 Jul, 2006 5:26 am ]
Post subject: 

About a year ago I got the routine down to 70 cycles per byte. I didn't make use of sp, just the trick of incrementing the high byte of the buffer pointer to get the next byte.

Then doynax posted his optimised grayscale routine (in April of this year) and ingeniously pointed out that:

((A ^ B) & C) ^ B = (A & C) | (B & ~C)

Very surprised no one bothered to look at this before! :).
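
For anyone who wants to convince themselves, here is a quick sketch that checks the identity. It is in Python rather than Z80 only so it is easy to run, and the function names are made up for illustration. Bitwise operations treat every bit independently, so the eight one-bit combinations are enough to cover it; the byte-sized spot check is just a sanity check.

```python
from itertools import product


def mask_xor(a, b, c):
    """XOR form: ((A ^ B) & C) ^ B."""
    return ((a ^ b) & c) ^ b


def mask_and_or(a, b, c):
    """Conventional form: (A & C) | (B & ~C), trimmed to 8 bits."""
    return ((a & c) | (b & ~c)) & 0xFF


# Every combination of single bits (the full truth table).
one_bit_ok = all(
    mask_xor(a, b, c) == mask_and_or(a, b, c)
    for a, b, c in product((0, 1), repeat=3)
)

# Spot check on a handful of whole bytes.
SAMPLE = (0x00, 0x0F, 0x55, 0xA5, 0xFF)
bytes_ok = all(
    mask_xor(a, b, c) == mask_and_or(a, b, c)
    for a, b, c in product(SAMPLE, repeat=3)
)

print(one_bit_ok, bytes_ok)
```

The point for a Z80 routine is that the XOR form only ever needs the mask itself, never its complement, so nothing extra has to be computed or kept in a register.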

His routine used a custom buffer setup as well as being layer interlaced. You can find the thread here: ... php?t=1668

My routine still uses the conventional buffer setup. I had another look and got it down to 59. I'm sure it can be optimised further still.

Author:  Jim e [ Mon 31 Jul, 2006 8:09 am ]
Post subject: 

Figured you would come in with something about the masks. I never bothered looking for another way to apply masks. If it's correct then it's definitely going to speed things up. I can imagine a trick to get it down to 51 clocks, I believe. At least 10,000 clocks could be killed. However, I'd like to point out it's pointless to go below 64; after that you're just ruining compatibility for the busted LCDs.

I almost want to update RGP now.

Author:  tr1p1ea [ Mon 31 Jul, 2006 8:50 am ]
Post subject: 

Still, it's nice to know that there is room to play with ... plus you could make it APP-compatible more easily.

It would be cool if you could outline your 51 cycle idea as well :).

Author:  Jim e [ Mon 31 Jul, 2006 10:52 am ]
Post subject: 

Well, purely stretching for speed, I'd go the way of sigma's "The best damn grey routine ever" method, that being unrolling the whole thing into one of the safe RAM buffers. I estimate it running at ~40000 clocks and being about 515 bytes.

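loop:
   out ($10),a    ; select the next LCD column
;repeat 64 times (51 tstates per repeat)
   ld a,(de)      ; 7   byte from the buffer at de
   xor (hl)       ; 7
   and c          ; 4   c = mask
   xor (hl)       ; 7   a = ((de) & c) | ((hl) & ~c)
   out ($11),a    ; 11  write the byte to the LCD
   add hl,sp      ; 11  hl += sp, on to the next row
   ld e,l         ; 4   de follows hl (the buffers share low bytes)
;2 or 3  inc d need to be inserted in there somewhere.
   dec h          ; rewind the high bytes after the column
   dec h
   dec h
   dec d
   dec d
   dec d   ;only needed if 3 was used before.
   inc b          ; next column command
   ld a,b
   cp $2c         ; one past the last column
   jp nz,loop     ; keep going until every column is sent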
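
To show where the 51 comes from, here is a rough tally of the repeated body in Python. The T-states and sizes are the standard Z80 figures for each instruction. The 768-byte buffer (96x64 pixels at 1 bit each) and the 13 T-state loop overhead (what a taken djnz costs) are my assumptions, the post does not say how the loop would be closed.

```python
# (mnemonic, T-states, size in bytes) for one repeat of the unrolled body
BODY = [
    ("ld a,(de)",   7,  1),
    ("xor (hl)",    7,  1),
    ("and c",       4,  1),
    ("xor (hl)",    7,  1),
    ("out ($11),a", 11, 2),
    ("add hl,sp",   11, 1),
    ("ld e,l",      4,  1),
]

ROWS = 64             # repeats per column, from the listing
BUFFER_BYTES = 768    # assumption: 12 columns x 64 rows
LOOP_OVERHEAD = 13    # assumption: e.g. a taken djnz

tstates_per_byte = sum(t for _, t, _ in BODY)
size_per_repeat = sum(size for _, _, size in BODY)

unrolled_size = size_per_repeat * ROWS             # repeated part only
frame_tstates = tstates_per_byte * BUFFER_BYTES    # ignores per-column overhead
looped_tstates = tstates_per_byte + LOOP_OVERHEAD  # body plus loop instruction

print(tstates_per_byte, unrolled_size, frame_tstates, looped_tstates)
```

Under those assumptions the repeated part alone is 512 bytes and a full frame is a little over 39,000 T-states before per-column overhead, which is in the same ballpark as the ~40000 and ~515 byte estimates.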

The basis of the code is that Buffer1 is aligned with Buffer2. As long as that's true it will work, and it is true for current implementations of Gpp and RGP. This could actually be practical to use if it weren't too fast. A slight alteration will let c and b be used as masks so the noise could be reduced. Just loop the fucking thing and we get 64 tstates, which is completely reasonable and fast enough.
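
A small model of the alignment trick, again in Python and only a sketch. It assumes "aligned" means the two buffers start at addresses with the same low byte, and that sp holds a row stride of 12. The buffer addresses and the function name are hypothetical. It walks one column the way add hl,sp / ld e,l would, bumps d whenever h changes, and checks that de always lands on the matching byte of the second buffer.

```python
def walk_column(buf1, buf2, stride=12, rows=64):
    """Model one column: hl walks buf1, de mirrors it inside buf2.

    Returns (number of inc d needed, whether de always matched).
    """
    assert (buf1 & 0xFF) == (buf2 & 0xFF), "buffers must share a low byte"
    hl = buf1
    d = buf2 >> 8
    bumps = 0
    matched = True
    for row in range(rows):
        de = (d << 8) | (hl & 0xFF)           # ld e,l reuses hl's low byte
        matched = matched and de == buf2 + row * stride
        nxt = hl + stride                      # add hl,sp
        # d only matters while there is another byte left to read
        if row < rows - 1 and (nxt >> 8) != (hl >> 8):
            d += 1                             # the inserted inc d
            bumps += 1
        hl = nxt
    return bumps, matched


# Hypothetical buffer pairs that differ only in the high byte.
print(walk_column(0x9340, 0x8640))
print(walk_column(0x9300, 0x8600))

# Which bump counts are possible over every starting low byte?
possible = {walk_column(0x9300 + low, 0x8600 + low)[0] for low in range(256)}
print(possible)
```

With these assumptions the number of bumps comes out as either 2 or 3 depending on where the column starts, which fits the "2 or 3 inc d" remark in the listing.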

I think at that point it's overkill. It's quicker than fastcopy, but it's not worth it if it kills compatibility. It could be faster still if the buffer logic changes, but no one wants that.

All times are UTC