I'll most likely release the source, or at least the source to the necessary routines.
There are a lot of shortcuts, I guess, but as far as the matrix code goes, each element is 6 bits and uses specialized math routines and tables for speed.
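
To give a rough idea of what table-driven math on small values can look like, here is a minimal C sketch. It is not the actual routine: it assumes the 6-bit elements are signed (-32..31) and swaps in quarter-square multiplication, a common table trick on 8-bit CPUs. The names qsq, init_qsq and mul6 are made up for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical table: qsq[n] = floor(n*n / 4) for n = 0..64. */
static uint16_t qsq[65];

static void init_qsq(void) {
    for (int n = 0; n <= 64; n++)
        qsq[n] = (uint16_t)((n * n) / 4);
}

/* Multiply two signed 6-bit values (assumed range -32..31) with two
   table lookups and one subtraction, using the identity
   a*b = floor((a+b)^2 / 4) - floor((a-b)^2 / 4).
   The identity is exact because a+b and a-b always have the same parity. */
static int16_t mul6(int8_t a, int8_t b) {
    int sum  = abs(a + b);   /* 0..64 */
    int diff = abs(a - b);   /* 0..63 */
    return (int16_t)(qsq[sum] - qsq[diff]);
}
```

The table only needs 65 entries because the operands are so narrow, which is the main reason small elements pair well with lookup tables.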
I also do other things, like multiplying by a reciprocal instead of dividing for perspective.
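
The reciprocal trick can be sketched roughly like this in C. This is only an illustration under assumptions, not the engine's code: the 8.8 fixed-point format, the depth range Z_MAX, the focal length FOCAL, and the names recip, init_recip and project are all hypothetical.

```c
#include <stdint.h>

#define Z_MAX 256   /* hypothetical number of depth values in the table */
#define FOCAL 64    /* hypothetical focal length / projection distance */

/* recip[z] holds FOCAL / z in 8.8 fixed point, built once up front.
   recip[0] stays 0; callers are expected to clip z to 1..Z_MAX-1. */
static uint16_t recip[Z_MAX];

static void init_recip(void) {
    for (int z = 1; z < Z_MAX; z++)
        recip[z] = (uint16_t)((FOCAL << 8) / z);
}

/* Perspective-project one coordinate: coord * FOCAL / z, done as a
   table lookup, one multiply and a shift instead of a divide. */
static int32_t project(int16_t coord, uint16_t z) {
    int32_t scaled = (int32_t)coord * recip[z];
    /* Shift the magnitude so the result does not depend on how the
       compiler right-shifts negative numbers. */
    return scaled < 0 ? -((-scaled) >> 8) : (scaled >> 8);
}
```

For example, with these assumed constants, project(100, 128) gives 50 and project(-100, 128) gives -50. The division only happens while building the table, so the per-vertex cost is a multiply and a shift.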
Lately I've been working on 3D movement, tracking, and collision routines (screenies in the other thread:
viewtopic.php?p=67219#p67219)