Developing a rotozoom-effect in LORES

This little demo shows a rotozoomer in LORES written in 6502 assembly language. The frame rate is about 13.7 fps. The following DSK images are available:

  • New: A DSK image with a Double LORES version of the rotozoomer (requires an enhanced Apple IIe with 128 kB RAM; runs at approximately 4 fps).
  • New: A DSK image with a special X-Mas demo and MockingBoard sound (requires at least a 48 kB Apple ][+; sound requires a 128 kB //e or better with a MockingBoard in any slot. The IIgs is supported, but the Alternate Display Mode must be set to On in the CDA menu before booting the demo!)
  • A DSK image with a speed-optimised version for 6502-CPU machines, from a stock Apple ][ with 48 kB RAM upwards.
  • A DSK image for 6502-CPU machines like a stock Apple ][ or ][+ with 48 kB/64 kB of RAM.
  • A DSK image for 65C02-CPU machines like the enhanced Apple //e or better.

The 65C02 version is slightly faster and also evaluates the vertical blanking signal in order to reduce animation jitter.

Please note: the demo may not run properly with accelerator cards installed!

Here is a short video of the rotozoom:

Technical background

A rotozoomer is a well-known old-school demo effect. However, I was interested in implementing a version for the Apple ][ LORES mode in order to see how fast (in terms of fps) and how compact I could get the code.

A good description of the underlying maths of a rotozoomer can be found here

In my approach, the trigonometric calculations (a linear transformation) are done once per frame for only three texture coordinates, (0,0), (1,0) and (0,1), and the corresponding deltas are calculated before the frame is drawn. While iterating over the complete LORES screen, these deltas in x- and y-direction are then simply summed up.
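
The idea can be sketched in a few lines of Python. This is only an illustrative floating-point model, not the demo's code: the function names, the 16 x 16 nested-list texture and the choice that u selects the texture row and v the texture column are assumptions made for this sketch.

```python
import math

def frame_deltas(angle, scale):
    """Transform (0,0), (1,0), (0,1) once per frame and derive the deltas."""
    c, s = math.cos(angle) * scale, math.sin(angle) * scale

    def transform(x, y):
        # rotation plus uniform scaling; returns (u, v) = (row, column)
        return (s * x + c * y, c * x - s * y)

    u0, v0 = transform(0, 0)
    ux, vx = transform(1, 0)
    uy, vy = transform(0, 1)
    dx = (ux - u0, vx - v0)   # texture step per screen column
    dy = (uy - u0, vy - v0)   # texture step per screen row
    return (u0, v0), dx, dy

def render(texture, angle, scale, width=40, height=24):
    """Fill the screen using nothing but additions inside the loops."""
    (row_u, row_v), (dxu, dxv), (dyu, dyv) = frame_deltas(angle, scale)
    screen = []
    for _ in range(height):
        u, v = row_u, row_v
        line = []
        for _ in range(width):
            line.append(texture[math.floor(u) % 16][math.floor(v) % 16])
            u += dxu          # step along the screen's x-axis
            v += dxv
        screen.append(line)
        row_u += dyu          # step to the next screen line
        row_v += dyv
    return screen
```

All sine and cosine work happens in `frame_deltas`; the two nested loops only add and look up.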

Preparing a new frame needs 24 multiplications (3 * 8) for the three rotated points in order to determine the required deltas. This results in 7911 cycles/frame (around 8 ms) for the whole trigonometric calculation!
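
The arithmetic behind these figures can be checked quickly. The clock rate below is an assumption (the commonly quoted effective Apple II clock of roughly 1.0205 MHz), and relating the cycle count to the 13.7 fps figure assumes both numbers refer to the same version of the demo.

```python
CPU_HZ = 1_020_484        # assumed effective Apple II clock in Hz
TRIG_CYCLES = 7911        # cycles per frame for the trigonometric part
POINTS = 3                # (0,0), (1,0), (0,1)
MULS_PER_POINT = 8

multiplications = POINTS * MULS_PER_POINT      # 24
trig_ms = TRIG_CYCLES / CPU_HZ * 1000          # a little under 8 ms
frame_ms = 1000 / 13.7                         # duration of one frame at 13.7 fps
trig_share = trig_ms / frame_ms                # fraction of the frame time

print(multiplications, round(trig_ms, 2), round(trig_share * 100, 1))
```

Under these assumptions the frame preparation takes roughly a tenth of the frame time, so most of the time is spent in the drawing loop.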

The results of a profiler run (champ-profiler) are shown here:

The profiler output shows the number of cycles spent in each subroutine and how often each subroutine is called. The run covers 49 recorded frames (not 50, as displayed in the upper left corner of the output screen). The following routines are called:

  • initFRAME: does the trigonometric calculations, i.e. a linear transformation in which the three points are rotated and scaled in order to calculate the step size of the deltas in x- and y-direction. For each point the subroutine calcUV is called (three times per frame).
  • drwROTO: fills the LORES RAM with the appropriate pixel information.
  • dblBUF: manages the switching between LORES screen 1 and 2 in order to reduce flickering, e.g. drawing on screen 1 while screen 2 is displayed. In the 65C02 version of this demo the double buffer control waits for the vertical blanking signal before switching screen memory, which reduces unwanted flickering even further.
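
The double-buffering logic of a routine like dblBUF can be modelled as follows. This is a hypothetical Python sketch, not a transcription of the demo's routine: the class, its method names and the optional `wait_for_vbl` callback are made up for illustration; the base addresses are those of the Apple II text/LORES pages 1 and 2.

```python
PAGE_BASE = {1: 0x0400, 2: 0x0800}   # LORES/text page 1 and page 2

class DoubleBuffer:
    """Draw into the hidden page, then swap it with the visible one."""

    def __init__(self):
        self.display_page = 1
        self.draw_page = 2

    def draw_base(self):
        # base address the drawing routine should write to
        return PAGE_BASE[self.draw_page]

    def flip(self, wait_for_vbl=None):
        # optionally wait for vertical blanking before switching pages
        if wait_for_vbl is not None:
            wait_for_vbl()
        self.display_page, self.draw_page = self.draw_page, self.display_page

buf = DoubleBuffer()
first_target = buf.draw_base()    # page 2 is drawn while page 1 is shown
buf.flip()
second_target = buf.draw_base()   # now page 1 is drawn while page 2 is shown
```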

Numerical accuracy

Most of the calculations are done using 8.8 fixed-point arithmetic, which yields sufficient accuracy for the effect. However, the zooming effect uses a scaling factor, which makes a division necessary. For this operation the accuracy has been increased to 16.16 fixed point, which yields the best results for both very close and very distant zoom ranges.

Actually, I used two different fixed-point resolutions for the x- and y-coordinate (16.16 and 8.24) in order to speed up the pixel drawing (see below).
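
To illustrate why the division benefits from the extra fractional bits, here is a small Python model of fixed-point arithmetic. The helper names are hypothetical and the sketch only handles non-negative values; it is not how the demo implements its multiply and divide routines.

```python
def to_fixed(value, frac_bits):
    """Convert a real number to fixed point with frac_bits fractional bits."""
    return int(round(value * (1 << frac_bits)))

def from_fixed(raw, frac_bits):
    return raw / (1 << frac_bits)

def mul_8_8(a, b):
    # the raw product has 16 fractional bits; shift back down to 8
    return (a * b) >> 8

def div_fixed(a, b, frac_bits):
    # pre-shift the dividend so the quotient keeps frac_bits fractional bits
    return (a << frac_bits) // b

product = mul_8_8(to_fixed(1.5, 8), to_fixed(2.25, 8))                       # 3.375 in 8.8
third_8_8 = from_fixed(div_fixed(to_fixed(1, 8), to_fixed(3, 8), 8), 8)
third_16_16 = from_fixed(div_fixed(to_fixed(1, 16), to_fixed(3, 16), 16), 16)
```

In this example the 8.8 quotient for 1/3 is off by more than 0.001, while the 16.16 quotient is accurate to better than 1/65536, which matters when the scaling factor becomes very small or very large.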

Drawing the LORES screen

The following core code is used to fill the 16 x 16 pixel texture into the LORES-RAM:

	LDY	#0		; init x-coordinate counter
LOOPXR	LDA	DXU		; calculating Y-coordinate in Texture
	AND	#%11110000	; select the higher nibble containing 
	STA	P1		; the Y-coordinate!
	LDA	DXV		; calculating X-coordinate in Texture
	AND	#%00001111	; performs modulo 16 operation - strip 
	ORA	P1		; away upper bits & add Y-offset
	TAX			; transfer index in X-reg
	LDA	TEXTURE,X	; load texture pixel
	STA	(BASELINE),Y	; plot pixel
	CPY	#40		; XPOS = 40 ? end of LORES line

Some remarks on the code:

  • The Y-register holds the counter for the x-position (0..39) of the current LORES line being plotted.
  • DXU and DXV hold the increments in u- and v-direction (u and v are in the texture coordinate system!) that have to be added to the current texture coordinates when moving along the x-axis of the LORES screen. These deltas depend on the current rotation angle and scaling of the texture.
  • UKO and VKO are the current positions in the texture coordinate system that are plotted to the current screen coordinates X and Y on the LORES screen. Both variables are used in 8.8 fixed-point format. When stepping through the X- and Y-coordinates of the screen, UKO and VKO are increased by DXU and DXV accordingly. The two 16-bit additions can be spotted easily in the code.
  • Extracting the texture pixel to plot from UKO and VKO is a bit tricky. The texture is a 16 pixel by 16 pixel array in memory (256 bytes). UKO contains the information about the y-coordinate in the texture and VKO the information about the x-offset. For fast access to the texture we need to build an index from them, so that the required value can be loaded directly into the accumulator via an indexed LDA.

    • The variable UKO has been scaled in the preparation steps in such a way that its HI-nibble represents the y-coordinate in the texture. The HI-nibble is isolated by the AND #%11110000 opcode and stored in the temporary variable P1.
    • The x-coordinate in the texture is isolated from the LO-nibble of VKO by AND #%00001111.
    • Both nibbles are combined with ORA P1, which results in the correct index into the texture array! There may be faster implementations than this, but I think this is a pretty fast solution.

  • Plotting the pixel is done by the STA (BASELINE),Y opcode. The zero page variable BASELINE is set in the outer Y-coordinate loop and points to the base address of the current line on screen 1 or screen 2, respectively. The address is taken from a small lookup table, which is a common method, especially when dealing with HIRES graphics.
  • Remember that the Y-register holds the current pixel position in the line. Looping over a register is faster than using, for example, a one-byte counter in the zero page.
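
The index construction described in these remarks can be modelled in Python as follows. The byte layout is an assumption made for this sketch: UKO and VKO are treated as 16-bit values whose high byte is masked, the texture is a flat list of 256 bytes, and the helper names are hypothetical.

```python
def texture_index(u_byte, v_byte):
    """Combine the texture row (HI-nibble of u) and column (LO-nibble of v)."""
    p1 = u_byte & 0b11110000            # AND #%11110000, STA P1
    return (v_byte & 0b00001111) | p1   # AND #%00001111, ORA P1

def step16(value, delta):
    # 16-bit addition with wrap-around, as two chained ADCs would produce
    return (value + delta) & 0xFFFF

def plot_line(texture, uko, vko, dxu, dxv, width=40):
    """Return the 40 texture bytes for one LORES line."""
    line = []
    for _ in range(width):              # models the Y-register counting 0..39
        index = texture_index(uko >> 8, vko >> 8)
        line.append(texture[index])     # LDA TEXTURE,X
        uko = step16(uko, dxu)          # UKO += DXU
        vko = step16(vko, dxv)          # VKO += DXV
    return line
```

Because the index is built purely by masking, the texture wraps around automatically in both directions without any comparison or branch.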


As written above, the texture used for this demo is 16 by 16 pixels, resulting in a 256-byte array that is indexed as described above.

The texture consists of single bytes, each of which holds two LORES color nibbles ($00 to $FF). I chose to set both nibbles of every texture byte to the same color and give up the possibility of split blocks with differently coloured nibbles. Hence the LORES resolution of this demo is 40 x 24 pixels (instead of 40 x 48 pixels). I preferred the idea of having square pixels.
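
As a small illustration, a texture byte with identical nibbles can be built from a 4-bit LORES color like this (the function names are hypothetical). In a LORES byte the low nibble is the upper block and the high nibble the lower block of a character cell.

```python
def lores_byte(color):
    """Duplicate a 4-bit LORES color into both nibbles of one byte."""
    color &= 0x0F
    return (color << 4) | color

def split_blocks(byte):
    """Return (upper block color, lower block color) of a LORES byte."""
    return byte & 0x0F, byte >> 4

row = [lores_byte(c) for c in (0x0, 0xC, 0x9, 0x1, 0x3, 0x6)]
```

With both nibbles equal, each of the 24 text rows shows a single color per column, which gives the 40 x 24 resolution mentioned above.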

Here is the code for the Apple Logo texture:

TEXTURE	HEX	CC110000000000000000CC0000000033
	HEX	000000000000000000CCCC0000000000
	HEX	0000000000000000CCCC000000000000
	HEX	00000000CCCCCC00CC0000CCCCCC0000
	HEX	00999999999999999999999999000000
	HEX	00999999999999999999999999000000
	HEX	00111111111111111111111111000000
	HEX	00001111111111111111111111110000
	HEX	00000033333333333333333333333300
	HEX	00000000333333333333333333330000
	HEX	00000000006666666666666666000000
	HEX	00000000000066660000666600000000
	HEX	660000000000000000000000000099DD

One could think about larger textures, but as soon as the 256-byte limit of a single-byte index is crossed, indexing into the texture requires more maths, which slows down the algorithm.
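
The following sketch shows where the extra work would come from. The 32 x 32 case is a hypothetical example, not something the demo implements: its index no longer fits into one byte, so it has to be split into a page and an offset before the pixel can be loaded.

```python
def index_16x16(row, col):
    # fits into a single byte: usable directly as an X-register index
    return ((row & 0x0F) << 4) | (col & 0x0F)

def index_32x32(row, col):
    # 10-bit index: needs a (page, offset) pair, i.e. 16-bit address arithmetic
    index = ((row & 0x1F) << 5) | (col & 0x1F)
    return index >> 8, index & 0xFF

largest_small = index_16x16(15, 15)    # still a one-byte value
largest_big = index_32x32(31, 31)      # page 3, offset 255
```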