There are 2 major issues with grayscale. The first is flicker, the other is timing. Both of these dictate the quality of your image and also the amount of CPU time you have left over for the remainder of your program.
Most people know the basic idea behind 4lvl grayscale, that is, repeatedly displaying the contents of your dark buffer for longer than your light buffer. Black is on 100% of the time, dark gray 66%, light gray 33% and white 0%. Most people have tried the basic approach of just switching buffers, and the result is often a flickery mess.
There have been several successful cases over the years where interlacing techniques have greatly reduced the amount of flicker in grayscale. It started off on a byte level, but has been taken to the bit level as well, which is what you see in the most modern grayscale routines like GPP/RGP.
The basic idea is to use a series of masks to interlace 2 bytes from your buffers into one byte ready for output. The masks are designed to do exactly what the principle behind 4lvl grayscale calls for: display the dark buffer for longer than the light buffer. With this in mind, your mask sequence could be:
110110110110110110110110
or broken down to:
11011011 01101101 10110110
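To see how these three bytes relate to the 66%/33% split, here is a small Python model. It is an illustration, not calculator code. It slices the repeating `110` pattern into the three mask bytes and counts, for each bit position, how many of the masks have that bit set. The names `PATTERN`, `MASKS` and `times_set` are made up for this sketch.

```python
# Slice the repeating 24-bit "110" pattern into three 8-bit masks and
# count how many of the three masks set each bit position.
PATTERN = "110" * 8  # 24 bits: 110110110...

MASKS = [int(PATTERN[i:i + 8], 2) for i in range(0, len(PATTERN), 8)]


def times_set(bit: int) -> int:
    """How many of the three masks have the given bit (0 = LSB) set."""
    return sum((mask >> bit) & 1 for mask in MASKS)


if __name__ == "__main__":
    for mask in MASKS:
        print(f"{mask:08b}")                      # 11011011, 01101101, 10110110
    print([times_set(bit) for bit in range(8)])   # [2, 2, 2, 2, 2, 2, 2, 2]
```

Every bit position is set in exactly two of the three masks. That is where the 2/3 (dark) vs 1/3 (light) split comes from once the masks are cycled.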
To prepare a byte you would do:
LCD = (dark AND mask) OR (light AND inverted mask)
AND the mask with the byte from the dark layer
invert the mask
AND the inverted mask with the byte from the light layer
OR the two results together
output the result to the LCD
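The steps above map directly onto a couple of bitwise operations. Below is a minimal Python sketch. The function name `interlace` is hypothetical and chosen only for illustration. The complement is masked back to 8 bits because Python integers are unbounded.

```python
def interlace(dark: int, light: int, mask: int) -> int:
    """Build one output byte from a dark-layer byte and a light-layer byte.

    Bits where `mask` is 1 come from the dark layer; bits where it is 0
    come from the light layer (light AND inverted mask).
    """
    inverted = ~mask & 0xFF  # complement, truncated to 8 bits
    return (dark & mask) | (light & inverted)


if __name__ == "__main__":
    out = interlace(0b11110000, 0b00001111, 0b11011011)
    print(f"{out:08b}")  # 11010100
```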
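The reason you invert the mask is that 11011011 selects 6 of the 8 bits (75%) from the dark layer. Inverting (complementing) it to 00100100 selects the remaining 2 bits (25%) from the light layer, so every bit of the output comes from exactly one layer. You might notice that 75%/25% is different from the 66%/33% mentioned earlier, but that only describes a single byte. The other two masks each have 5 of 8 bits set, and every bit position is set in exactly two of the three masks. Once the masks are rotated, each pixel therefore shows its dark bit 2/3 of the time and its light bit 1/3 of the time.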
Then, after each frame, you circularly rotate through the 3 masks. You do this because using the same mask for 2 (or more) successive frames shows certain bits from your gray buffers for too long. With one fixed mask, a pixel would only ever show its dark bit or only its light bit. That throws off the timing/ruins the formula, and the result will not be very grayscaley!
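One rough way to check the rotation is to simulate three frames for a single pixel. The sketch below is illustrative Python only; `duty_cycle` and its `rotate` flag are made up for this example. It counts how often the pixel is lit for each dark/light bit combination, with and without rotating the masks.

```python
from fractions import Fraction

MASKS = [0b11011011, 0b01101101, 0b10110110]


def interlace(dark: int, light: int, mask: int) -> int:
    """Mask bits select the dark layer, cleared bits select the light layer."""
    return (dark & mask) | (light & ~mask & 0xFF)


def duty_cycle(dark_bit: int, light_bit: int, bit: int, rotate: bool = True) -> Fraction:
    """Fraction of 3 frames in which the pixel at bit position `bit` is lit."""
    dark = dark_bit << bit
    light = light_bit << bit
    lit = 0
    for frame in range(3):
        # Rotate to the next mask every frame, or keep reusing the first one.
        mask = MASKS[frame % 3] if rotate else MASKS[0]
        lit += (interlace(dark, light, mask) >> bit) & 1
    return Fraction(lit, 3)


if __name__ == "__main__":
    shades = {"black": (1, 1), "dark gray": (1, 0), "light gray": (0, 1), "white": (0, 0)}
    for name, (d, l) in shades.items():
        print(name, [str(duty_cycle(d, l, bit)) for bit in range(8)])
```

With `rotate=False`, a dark-gray pixel comes out either fully on or fully off depending on its bit position. That is the "not very grayscaley" result described above.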
Timing is the other important aspect of grayscale, both for reducing flicker and for improving speed. To achieve the best, most consistent image, you must sync your grayscale routine to the refresh rate of the calc's LCD. Jim e measured this to be around ~62Hz, which means that for best results your routine should update the LCD 62 times every second. (62fps might be easier to understand if you think of the 'f' as meaning 'grayscale frames' and not 'game/program frames'.) The goal for everyone has been to perform the byte preparation (the masking and such) in less than the time required between successive writes to the calc's LCD. In clock cycles this has been measured at around 64-65ccs, which is about 10.8 microseconds at 6MHz. Achieving this means that your grayscale routine won't add any overhead beyond what is required to output a frame to the LCD (any optimising past this stage would just be for kicks).
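The numbers in this paragraph are easy to sanity-check. This Python snippet assumes a 6MHz CPU clock, which is consistent with the 51.66% figure in the next paragraph, and Jim e's ~62Hz refresh rate. The constant and function names are just for illustration.

```python
CPU_HZ = 6_000_000  # assumed 6MHz clock
REFRESH_HZ = 62     # measured LCD refresh rate (approximate)


def cycles_to_us(cycles: float, hz: int = CPU_HZ) -> float:
    """Convert a cycle count to microseconds at the given clock rate."""
    return cycles * 1_000_000 / hz


# Cycles available per grayscale frame when updating the LCD 62 times a second.
FRAME_BUDGET = CPU_HZ / REFRESH_HZ

if __name__ == "__main__":
    print(f"65cc write gap: {cycles_to_us(65):.2f} us")     # 10.83 us
    print(f"per-frame budget: {FRAME_BUDGET:,.0f} cycles")   # 96,774 cycles
```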
A typical fastcopy takes around 50,000ccs (probably a little less). This is as fast as you can possibly get it: going any faster would violate the required delays, make the LCD controller unhappy and thus bork your display. This means that to output at the optimal ~62Hz, you are using 50000 * 62 = 3,100,000ccs per second, or 51.66% of the 6,000,000 available at 6MHz, just for the grayscale routine alone.
It is all well and good to poll until the LCD is ready. This might be a preferable technique if you don't intend to use your grayscale routine for a game (it would be fine for title screens etc.). Technically you can sync to 62Hz with a routine that takes anywhere up to about 96,000ccs (6,000,000 / 62 ≈ 96,774ccs per frame), but that would eat almost 100% of your CPU! This is the sole reason people have been crazy about optimising grayscale routines: they want as much processing power as possible left over for their actual project.
Of course, things like ccs are somewhat relative to a few factors, such as battery level. Also, not all calcs use the same LCD controllers, and even those that do are often configured differently (hence the need for ALCDFix, possibly one of the most useful TI programs ever made). This is why ccs aren't always reliable for timing, and why it's not always a good idea to rely on them. That said, the bonus is that timing by ccs theoretically can't cause problems with a correctly configured LCD (after using ALCDFix if need be). Instructions running slower at lower battery power will only lengthen the delays between writes a bit, which keeps the LCD controller happy anyway.
Err anyway, the best way to time your routine is of course to use an interrupt that fires at around 62Hz. There are many timer combinations you could come up with to achieve this.
For historical purposes: there was a rather long struggle to achieve 4lvl grayscale in under that magical LCD delay of around 64-65ccs between writes. GPP had it at 77ccs and RGP had it at 73ccs. GPP used different timing that didn't properly sync to the LCD (~62Hz), so it could randomly use anywhere from 60%-80% or more of your CPU time. RGP used better timing to get it down to around 59% CPU time, I think. I used a cheap 'inc h' trick to get it down to 70ccs, but that's still ~55% CPU time. It more or less stayed there for a year or so, until someone named doynax posted this magnificent realisation:
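Here is a rough Python model of where those percentages come from. It assumes a 6MHz clock, 768 LCD writes per frame (96x64 pixels / 8) and exactly 62 frames per second. It ignores loop and column-setup overhead, so it slightly underestimates the real figures. It also says nothing about GPP's unsynced timing. The names are illustrative only.

```python
CPU_HZ = 6_000_000      # assumed 6MHz clock
REFRESH_HZ = 62         # target grayscale frame rate
WRITES = 96 * 64 // 8   # 768 bytes per LCD frame


def cpu_share(cc_per_write: int) -> float:
    """Fraction of CPU time spent writing WRITES bytes REFRESH_HZ times a second."""
    return cc_per_write * WRITES * REFRESH_HZ / CPU_HZ


ROUTINES = [("GPP", 77), ("RGP", 73), ("'inc h' trick", 70), ("xor trick, padded", 65)]

if __name__ == "__main__":
    for name, cc in ROUTINES:
        # Prints roughly 61.1%, 57.9%, 55.6% and 51.6% respectively.
        print(f"{name:>18}: {cc}cc/write -> {cpu_share(cc):.1%}")
```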
```
(A ^ B) & C ^ B = (A & C) | (B & ~C)
```
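Why it works: wherever a bit of C is 1, (A ^ B) ^ B gives back A. Wherever it is 0, the AND clears the bit and the final XOR puts B there. The identity is purely bitwise, so checking the eight single-bit cases covers any width. In Python, as in C, `&` binds tighter than `^`, so the left side parses as `((A ^ B) & C) ^ B`. The function names below are illustrative only.

```python
from itertools import product


def xor_select(a: int, b: int, c: int) -> int:
    """(A ^ B) & C ^ B: three ALU ops, no inverted mask or temp needed."""
    return (a ^ b) & c ^ b


def and_or_select(a: int, b: int, c: int, width: int = 8) -> int:
    """(A & C) | (B & ~C), with ~C truncated to `width` bits."""
    return (a & c) | (b & ~c & ((1 << width) - 1))


if __name__ == "__main__":
    # Bitwise ops act on each bit independently, so the 8 single-bit
    # cases cover every possible byte.
    ok = all(xor_select(a, b, c) == and_or_select(a, b, c, width=1)
             for a, b, c in product((0, 1), repeat=3))
    print("identity holds:", ok)
```

On the Z80 this becomes `ld a,(hl)` / `xor (hl)` / `and d` / `xor (hl)` in the routines further down, with A from the first buffer, B from the second and C the mask.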
He had a custom buffer implementation that ran at an average of 63.3ccs per write, which was extraordinary. The non-standard buffer kind of overcomplicated things and borked the overall routine. Still, he had realised that the logic itself could be refined, something that had never occurred to anyone else. With this, it was a simple mod to get my old routine down to 59ccs, which was too fast anyway, so I just padded it with an inc hl to round it out to a nice safe 65ccs (there is a 64cc version, but 65ccs is safer). Something like this:
```asm
    inc hl
    add hl,sp
    ld a,(hl)
    inc h
    inc h
    inc h       ; + 768 = byte in other buffer
    xor (hl)
    and d       ; mask
    xor (hl)
    out ($11),a ; 65ccs
```
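The snippet's timing can be confirmed by adding up the standard Z80 T-state counts for each instruction. The figures are Zilog's published timings and ignore any platform-specific wait states. The tally also shows that dropping the padding `inc hl` gives the 59cc version mentioned above. Without that `inc hl`, SP would need to be one higher (-756 instead of -757) to land on the same byte. A minimal Python tally:

```python
# Standard Z80 T-states for one write step of the snippet above.
STEP = [
    ("inc hl", 6),        # padding, folded into the pointer arithmetic
    ("add hl,sp", 11),    # SP holds a constant offset, not a stack pointer
    ("ld a,(hl)", 7),     # byte A from the first buffer
    ("inc h", 4),
    ("inc h", 4),
    ("inc h", 4),         # +768 -> same byte in the second buffer
    ("xor (hl)", 7),
    ("and d", 4),         # mask
    ("xor (hl)", 7),
    ("out ($11),a", 11),  # LCD data port
]

TOTAL = sum(cycles for _, cycles in STEP)
WITHOUT_PADDING = TOTAL - 6  # the same step without the 'inc hl'

if __name__ == "__main__":
    print(TOTAL, WITHOUT_PADDING)  # 65 59
```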
Jim was made aware of the logic change and also used the 'inc h' trick in his latest version of RGP, though I'm not sure if it's publicly available. Although it should theoretically be OK to run at 64ccs, experiments have shown that it's too close to the limit and often throws calcs out of whack, so 65ccs seems a safe bet. Either way it's close enough (it should be about 52% CPU time).
For fun, there was a small discussion about how low you could get the interlacing (I think I managed 37). It could be a fun exercise.
That should be a basic outline of bit-interlaced 4lvl grayscale and a little bit of history. I hope I didn't ruin any fun by posting some code. I also hope this stuff makes sense and that I haven't made any fatal errors (it's past midnight here).
Here's a full 4lvl routine for reference (I dug it up, but it should work OK). It could be improved upon marginally, but the difference wouldn't be noticeable enough for me to bother. Yes, it uses some SMC, even in parts where it doesn't have to. It could still be made smaller (16 loops of 4 writes), or even made APP-compatible easily enough, I guess:
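```asm
;--------------------------------------------------------------
; gsRoutine - tr1p1ea
;--------------------------------------------------------------
; 4lvl bit-interlaced grayscale routine
;
; Outputs one interlaced frame (768 bytes) to the LCD.
; Assumes: the second buffer directly follows gActiveBuf1
; (gActiveBuf1+768), the LCD auto-increments the row after each
; data write, and interrupts are disabled while it runs (SP is
; borrowed as a constant, so an interrupt would push to a bogus
; address). Destroys af, bc, de, hl.
;
gsRoutine:
    ld (_smc_sp + 1),sp     ; save SP into the 'ld sp,$0000' at the end (SMC)
    ld a,$80
    out ($10),a             ; LCD command: set row 0
    ld a,(_smc_gr_3 + 1)    ; rotate the three masks stored as operands
    ld b,a                  ; of the ld d/e/c instructions below (SMC)
    ld a,(_smc_gr_2 + 1)
    ld (_smc_gr_3 + 1),a    ; gr_3 <- gr_2
    ld a,(_smc_gr_1 + 1)
    ld (_smc_gr_2 + 1),a    ; gr_2 <- gr_1
    ld a,b
    ld (_smc_gr_1 + 1),a    ; gr_1 <- old gr_3
_smc_gr_1:
    ld d,%11011011
_smc_gr_2:
    ld e,%01101101
_smc_gr_3:
    ld c,%10110110
    ld hl,gActiveBuf1+756   ; primed so the first step lands on gActiveBuf1+0
    ld a,$20                ; LCD command: column 0
gsCopy_1:
    ld (_smc_gr_4 + 1),a    ; remember the current column command (SMC)
    ld sp,-757              ; inc hl + add hl,sp = -756: back 768, down one row (+12)
    ld b,7                  ; 7 loops x 9 writes + 1 = 64 rows
    out ($10),a             ; LCD command: set column
gsCopy_2:
    inc hl
    add hl,sp               ; hl -> next row in the first buffer
    ld a,(hl)               ; A = first-buffer byte
    inc h
    inc h
    inc h                   ; + 768 = same byte in the second buffer
    xor (hl)
    and d                   ; (A ^ B) & mask ^ B
    xor (hl)                ; = (A & mask) | (B & ~mask)
    out ($11),a             ; LCD data write, 65ccs per step
    inc hl
    add hl,sp
    ld a,(hl)
    inc h
    inc h
    inc h
    xor (hl)
    and e
    xor (hl)
    out ($11),a
    inc hl
    add hl,sp
    ld a,(hl)
    inc h
    inc h
    inc h
    xor (hl)
    and c
    xor (hl)
    out ($11),a
    inc hl
    add hl,sp
    ld a,(hl)
    inc h
    inc h
    inc h
    xor (hl)
    and d
    xor (hl)
    out ($11),a
    inc hl
    add hl,sp
    ld a,(hl)
    inc h
    inc h
    inc h
    xor (hl)
    and e
    xor (hl)
    out ($11),a
    inc hl
    add hl,sp
    ld a,(hl)
    inc h
    inc h
    inc h
    xor (hl)
    and c
    xor (hl)
    out ($11),a
    inc hl
    add hl,sp
    ld a,(hl)
    inc h
    inc h
    inc h
    xor (hl)
    and d
    xor (hl)
    out ($11),a
    inc hl
    add hl,sp
    ld a,(hl)
    inc h
    inc h
    inc h
    xor (hl)
    and e
    xor (hl)
    out ($11),a
    inc hl
    add hl,sp
    ld a,(hl)
    inc h
    inc h
    inc h
    xor (hl)
    and c
    xor (hl)
    out ($11),a
    djnz gsCopy_2
    inc hl                  ; 64th row of this column (mask e)
    add hl,sp
    ld a,(hl)
    inc h
    inc h
    inc h
    xor (hl)
    and e
    xor (hl)
    out ($11),a
    ld sp,-767
    add hl,sp               ; hl -> primed for row 0 of the next column
_smc_gr_4:
    ld a,$00                ; reload the current column command (SMC)
    inc a
    cp $2C                  ; stop after columns $20-$2B (12 columns)
    jp nz,gsCopy_1
_smc_sp:
    ld sp,$0000             ; restore the SP saved on entry (SMC)
    ret
```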
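The least obvious part of the routine is the pointer arithmetic that uses SP as an offset. Here is a small Python model of just that address walk; it is not a Z80 emulator. It assumes, as the routine does, that the second buffer sits directly after the first. The base address `BUF1` is a hypothetical value picked for the example. The model checks that the 768 writes visit the buffers column by column, top to bottom, with rows 12 bytes apart. It also records which mask register each row uses (d, e, c repeating, with `e` for the 64th row).

```python
BUF1 = 0x9000     # hypothetical base address of gActiveBuf1
MASK16 = 0xFFFF   # Z80 register pairs are 16-bit


def walk(buf1: int = BUF1):
    """Yield (column_cmd, first_addr, second_addr, mask_reg) per LCD write."""

    def step(hl: int, sp: int):
        hl = (hl + 1 + sp) & MASK16    # inc hl / add hl,sp
        first = hl                     # ld a,(hl)
        hl = (hl + 3 * 256) & MASK16   # inc h x3
        return hl, first

    hl = buf1 + 756                    # ld hl,gActiveBuf1+756
    for col_cmd in range(0x20, 0x2C):  # columns $20..$2B
        sp = -757 & MASK16             # ld sp,-757
        regs = "dec" * 21 + "e"        # 7 loops of (d,e,c) x3, then one 'e'
        for reg in regs:
            hl, first = step(hl, sp)
            yield col_cmd, first, hl, reg
        hl = (hl + (-767 & MASK16)) & MASK16  # ld sp,-767 / add hl,sp


if __name__ == "__main__":
    writes = list(walk())
    print(len(writes), [hex(w[1]) for w in writes[:3]])
```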