ORGFX - a Wishbone compatible Graphics Accelerator for the
OpenRISC processor
Per Lenander
Mälardalen University
Robotics program
Västerås, Sweden
per.lenander.swe@gmail.com
Anton Fosselius
Mälardalen University
Robotics program
Västerås, Sweden
anton.fosselius@gmail.com
August 20, 2012
Abstract
Modern embedded systems such as cellphones or medical instrumentation use increasingly complex graphical interfaces. Currently there are no widely used open hardware solutions to accelerate embedded graphical
applications. This thesis presents the ORSoC graphics accelerator (ORGFX), an open hardware graphics accelerator that can be used with programmable hardware. A standalone software implementation is provided
to support rapid development of accelerated applications.
The accelerator is able to render 2D, 3D and vector graphics. The example implementation of the
ORGFX is integrated with the OpenRISC Reference Platform System on Chip version 2 (ORPSoCv2). The
final implementation runs on a Xilinx FPGA at 50 MHz, and provides accelerated graphics output from
an HDMI port. An extensive software driver and a set of utilities to ease development for the graphics
accelerator are provided along with the hardware. The software implementation of the accelerator uses the
same API as the hardware drivers, making it possible to quickly develop applications for the accelerator
without access to a physical platform.
The final implementation trades performance against platform independence and generality. The component can be integrated with any CPU or memory chip and works alongside a custom display core that
renders the output to an external screen. The software drivers can be run bare metal or modified to run on
an operating system.
All of the hardware and software developed in this project is provided as open source under the GNU
Lesser General Public License (LGPL), and can be downloaded from www.opencores.com. The authors
hope that future releases will be integrated as a standard component into the OpenRISC Reference Platform
System on Chip.
Keywords: Embedded Computer Graphics, OpenRISC, FPGA, Vector Graphics
Contents
1 Introduction
1.1 Background
1.2 Thesis structure
2 Related works
3 Concepts
3.1 Introduction to graphics
3.1.1 Rasterized graphics
3.1.2 Vector graphics
3.1.3 Framebuffer
3.1.4 Textures
3.1.5 Sprites
3.1.6 Fonts
3.1.7 Glyph
3.1.8 Triangulation
3.2 Hardware terminology
3.2.1 FPGA technology
3.2.2 Hardware description languages
3.2.3 IP Cores
3.2.4 System-on-Chip
3.2.5 Hard and soft CPUs
3.2.6 OpenRISC
3.2.7 Wishbone bus
3.2.8 ORPSoCv2
3.3 Vector Fonts
3.3.1 TrueType fonts
3.3.2 PostScript fonts
3.3.3 OpenType fonts
3.3.4 FreeType
3.4 Linux and free Software
3.4.1 GPL
3.4.2 LGPL
3.4.3 Linux
3.4.4 Drivers
3.4.5 DirectFB
3.4.6 X-Server
3.4.7 KMS
3.4.8 Direct Rendering Infrastructure
3.4.9 Direct Rendering Manager
4 Requirements
5 Design
5.1 Display control
5.1.1 Render target
5.1.2 Device coordinate system
5.1.3 Texture coordinate system
5.2 Control interface
5.3 2D engine features
5.3.1 Color depth modes and variable resolution
5.3.2 Rectangles
5.3.3 Lines
5.3.4 Triangles
5.3.5 Clipping
5.3.6 Coloring
5.3.7 Color keying
5.3.8 Alpha blending
5.4 3D engine features
5.4.1 Transformations
5.4.2 Interpolation
5.4.3 Z-buffer culling
5.5 Vector engine features
5.5.1 Path theory
5.5.2 Shape implementation
5.5.3 Alternative approaches
5.6 Software
5.6.1 Bus interface
5.6.2 Surfaces
5.6.3 Meshes
5.6.4 Fonts
6 HDL implementation
6.1 Development board
6.1.1 Video Ram
6.1.2 Display core
6.1.3 HDMI converter
6.2 Architecture
6.2.1 OpenRISC CPU
6.2.2 System-on-Chip
6.2.3 Wishbone interfaces
6.2.4 Pipeline
6.2.5 Transformation processor
6.2.6 Rasterizer
6.2.7 Interpolation
6.2.8 Clipping
6.2.9 Fragment processor: coloring
6.2.10 Fragment processor: vector rendering
6.2.11 Blender
6.2.12 Renderer
7 Software integration
7.1 Bare metal driver
7.1.1 Basic functionality
7.1.2 Extended API
7.1.3 Advanced API – Tilesets and bitmap fonts
7.1.4 Advanced API – Vector fonts
7.1.5 Advanced API – 3D
7.2 Utilities
7.2.1 Sprite maker utility
7.2.2 Bitmap font maker utility
7.2.3 Mesh maker utility
7.2.4 Vector font maker utility
8 Testing and validation
8.1 Algorithmic validation
8.2 Hardware validation
8.3 Software validation
8.4 System validation
9 Results
9.1 Performance
9.2 Benchmarking
10 Future work
10.1 Textures
10.2 Bandwidth issues
10.3 8/24/32 bpp
10.4 Alpha from memory
10.5 Precision issues
10.6 Platform specific optimizations
10.7 Other bus implementations
10.8 Linux driver
11 Conclusions
A Appendix A, ORGFX Specification
B Appendix B, Enhanced VGA/LCD Specification
1 Introduction
There is a growing demand for graphical user interfaces in modern embedded systems. The end user wants
an easy-to-use graphical interface in everything from cellphones to machine interfaces. A large part of the
embedded market uses open source software, and some have begun to work with open hardware. While there
are several processors for embedded systems available today with accelerated graphics, few offer open source
graphics drivers. To make it possible to build a more open system, this thesis shows how a pipelined
fixed-point graphics accelerator can be implemented. While building a graphics accelerator from scratch is a
non-trivial task that requires some time and knowledge, it is far from impossible. With the introduction of the
Field Programmable Gate Array (FPGA)¹, it became possible to create hardware by writing a few lines
of code in a Hardware Description Language (HDL)². As FPGA chips are getting cheaper and faster, they
are more commonly used in embedded systems. The logical next step for an embedded system with an FPGA
is to integrate not only open source software but also open hardware into the design. This thesis presents the
ORSoC Graphics Accelerator – ORGFX – an open hardware component with open source drivers capable
of rendering 2D, vector and simple 3D graphics. See appendix A for a technical specification of the final
result of this thesis.
1.1 Background
Open source software is steadily gaining ground as more and more companies start to see the benefits that
it brings. Something that is still relatively unknown is open hardware.
If a large set of Intellectual Property cores (IP) that provide common functionality (such as memory
interfaces, Ethernet connectors, USB connectors and so on) were open source, it would greatly increase
the speed at which a System-on-Chip (SoC) can be developed. The Swedish company ORSoC attempts
to increase the amount of open hardware available on the market by maintaining and developing the open
source community OpenCores.org and the open source processor OpenRISC. Recently ORSoC has released
a development board with an Altera Cyclone IV FPGA and some standard connectors. The board is intended
to demonstrate the capabilities of the OpenRISC processor.
The OpenRISC processor can run the Linux operating system, and even supports some basic framebuffer
rendering. However, it does not currently support any graphics accelerators. Due to the low clock speed of
the FPGA (and thus of the OpenRISC processor), doing graphics in software is very slow. Instead, it is a
good idea to use specialized accelerated FPGA components to obtain high resolution graphics.
The goal of this thesis was to build an open source graphics acceleration component that can be connected
to a CPU, such as the OpenRISC processor. Beyond the basic capabilities of copying images from video
memory to the frame buffer, the core should be able to perform triangular and rectangular color fill and
line drawing operations. Additionally, the core provides acceleration of vector graphics. By setting up a few
parameters, the CPU can move a lot of serial computation from the CPU to dedicated hardware, running
one or more pipelines. This allows the CPU to spend more time on other calculations.
1.2 Thesis structure
After a brief review of related works and documentation of similar systems in section 2, some basic concepts
and terms used throughout the thesis are explained in section 3. The body of the thesis is separated
into several sections: first the requirements on the system are presented in section 4, then the theory and
overall design of the solution are described in section 5, and details about the hardware implementation
are explained in section 6.
Section 7 presents the software drivers and various utilities developed for the hardware implementation.
The environment used for testing and validation is explained in section 8, and the results of the tests are
presented in section 9. Finally, future work is presented in section 10 and the authors' conclusions are
discussed in section 11.
2 Related works
Several open hardware display cores have been developed and are available on OpenCores, but very few
open source graphics accelerators exist. The VGA/LCD Controller core (see appendix B), which is used in the
example implementation of ORGFX, only supports displaying a section of memory on a monitor. The core
has no accelerated drawing operations, and the accelerator cores currently found on OpenCores can only
handle simpler 2D operations like line and rectangle drawing.
Modern consumer-level graphics cards have a large number of highly configurable Graphics Processing
Units (GPU, as opposed to CPU). These processors are designed to perform a large number of similar
calculations in parallel, such as coloring and shading a set of pixels. Traditionally these processors are
specialized in performing graphics calculations, but there are many other applications that could benefit
from data-centric parallel computation.
1 More info about FPGAs is found in section 3.2.1
2 More info about HDLs is found in section 3.2.2
One way to provide acceleration for a given calculation is to write a small program (called a shader) for the
GPU using one of the common graphics APIs, OpenGL or DirectX. In this way advanced calculations adapted
for parallel computing can be performed by the graphics card, leaving the main processor(s) free. Graphics
card developer Nvidia has released many articles on how to use their hardware to accelerate calculations
of different kinds. In their book GPU Gems 3 [7], they describe a way to accelerate rendering of vector
graphics on the GPU using programmable shaders.
Some non-graphics uses of GPU hardware are physics calculations³, medical simulations such as
Folding@Home⁴, encryption and decryption software ([7], chapter 36) and much more. All the benefits of using
parallel hardware for GPU computations do of course apply to the more generic FPGA technology.
On the embedded market, ARM Holdings has its own GPU architecture called Mali⁵, while Imagination
Technologies has its slightly more powerful PowerVR⁶. Nvidia has also made an effort to reach the embedded
market with its Tegra platform⁷. All three vendors support Open Graphics Library for Embedded
Systems (OpenGL ES⁸), Open Computing Language (OpenCL⁹) and Open Vector Graphics (OpenVG¹⁰).
The disadvantage of all these implementations and many other similar ones is that both the hardware and
the software drivers are proprietary; there is no way for a user to make changes to the hardware or software
itself, only to configure it using the provided interfaces.
The Open Graphics Project (OGP¹¹) has been developing an FPGA-based open hardware graphics card
for a few years. The goal of the project is to build an open source graphics card for desktop
computers with full OpenGL support. The card is connected to the computer
through the PCI port. A full software emulation has been implemented, but the hardware development has
stalled. OGP released a development board in 2010 named Open Graphics Development 1 (OGD1).
Another FPGA-based project of interest is the proprietary TurboVG. TurboVG is a vector graphics
accelerator that implements hardware acceleration for OpenVG [3]. While there is not much information
available about the system yet, it is a recently developed product with features very similar to those presented
in this thesis.
3 Concepts
This section contains a quick reference and background to some core concepts used frequently in this thesis.
It outlines the basic knowledge that the reader should have on these subjects to fully understand the rest of
the thesis.
3.1 Introduction to graphics
This section introduces the graphics-related terms used in this thesis.
3.1.1 Rasterized graphics
All displays used in the computer industry work at discrete resolutions. This means that if you look closely
enough you will be able to see the pixels, the smallest image elements in the screen¹². If a screen works at
640x480 pixels at 16 bits per pixel (bpp), there are 640 pixels horizontally by 480 vertically, and the
color of each pixel is described by 16 bits of data.
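The memory cost of a given resolution and color depth follows directly from these numbers. The short sketch below works through the 640x480 at 16 bpp example; the function name `framebuffer_bytes` is chosen here for illustration and is not part of the ORGFX drivers.

```python
def framebuffer_bytes(width, height, bits_per_pixel):
    """Return the number of bytes needed to store one full frame."""
    total_bits = width * height * bits_per_pixel
    # Round up to whole bytes in case the bit count is not a multiple of 8.
    return (total_bits + 7) // 8


size = framebuffer_bytes(640, 480, 16)
print(size)          # 614400 bytes
print(size // 1024)  # 600 KiB
```

Doubling the color depth or the resolution in either direction doubles the amount of memory that has to be written for every full-screen update.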
A common way to store images is as bitmaps, which are simply pixel buffers. Though these image buffers
can be scaled and rotated, the result is usually very jagged. While this can be improved by using different
filters to smooth the scaled image, the result is either jagged or blurry, especially at lower resolutions (see
figure 1).
Images consisting of a discrete number of pixels are referred to as rasterized graphics. See the next
section for a different way to store images that overcomes the drawbacks of rasterized graphics.
3.1.2 Vector graphics
The concept of vector graphics is that instead of storing every single pixel of an image, a mathematical
formula that describes the various shapes in the image is stored. Since displays are still made of pixel arrays,
the vector images still have to be rasterized when they are actually drawn to the screen. However, since the
image is described by vectors, it can be scaled and rotated before rasterization without any loss of detail.
3 Physics for Nvidia hardware: http://www.geforce.com/hardware/technology/physx
4 http://folding.stanford.edu/
5 http://www.arm.com/products/multimedia/mali-graphics-hardware/index.php
6 http://www.imgtec.com/powervr/powervr-graphics.asp
7 http://www.nvidia.com/object/tegra.html
8 https://www.khronos.org/opengles/
9 http://www.khronos.org/opencl/
10 http://www.khronos.org/openvg/
11 http://wiki.opengraphics.org/
12 As a side note: modern TV sets usually make use of hardware smoothing algorithms to somewhat hide this fact. This is the reason that hooking up your computer to a TV instead of a monitor can produce a blurry image.
Figure 1: A rasterized image in original size, scaled and finally rotated. No filtering was used when scaling and
rotating.
Figure 2: A vector image. Notice that the image has infinite detail even when scaled, and no pixelation artefacts
are visible.
One common use of vector graphics is TrueType Fonts, fonts that can be rendered to smooth text at any
resolution. For an example, see figure 2.
Another common use of vector graphics is Adobe Flash, which also demonstrates how vector graphics
can be used for animation through simple transformations of one image or shape, instead of storing multiple
images. By saving the state of the vector control points for a few keyframes, the computer can generate the
intermediate frames. The process is known as tweening, short for Inbetweening, and allows for very smooth
animations using small amounts of data.
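Tweening can be illustrated with plain linear interpolation of control points between two keyframes. This is a minimal sketch under that assumption; real animation tools may use other easing curves, and the name `tween` and the sample coordinates are made up for the example.

```python
def tween(start_points, end_points, t):
    """Linearly interpolate control points between two keyframes.

    t runs from 0.0 (first keyframe) to 1.0 (second keyframe).
    """
    if len(start_points) != len(end_points):
        raise ValueError("keyframes must have the same number of control points")
    return [
        (x0 + (x1 - x0) * t, y0 + (y1 - y0) * t)
        for (x0, y0), (x1, y1) in zip(start_points, end_points)
    ]


# Two keyframes of the same triangle, moved to the right and upwards.
key_a = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]
key_b = [(20.0, 4.0), (30.0, 4.0), (25.0, 12.0)]

halfway = tween(key_a, key_b, 0.5)
print(halfway)  # [(10.0, 2.0), (20.0, 2.0), (15.0, 10.0)]
```

Only the two keyframes are stored; every intermediate frame is generated from them and then rasterized.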
3.1.3 Framebuffer
A framebuffer is a buffer that stores the content that will be shown on the screen. The central processing
unit (CPU) or the graphics processing unit (GPU) writes data to the framebuffer. The display hardware that
drives the screen then reads from the framebuffer and outputs its content to the screen. A common problem
with framebuffers is that the CPU/GPU cannot safely write to the framebuffer while the display hardware reads
from it; otherwise the display hardware may render the image to the screen before the CPU/GPU is finished. To
avoid flickering and delays, double buffering is often implemented. With double buffering, there are two or
more framebuffers, where one of the buffers is read by the display hardware and another is written to by
the CPU/GPU. The buffers are then swapped when the CPU/GPU has finished drawing a frame.
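The swap mechanism can be sketched in a few lines of software. The class below is only an illustration of the principle with two buffers; the class and method names are hypothetical and do not correspond to the ORGFX driver API.

```python
class DoubleBuffer:
    """Two framebuffers: one is displayed while the other is drawn to."""

    def __init__(self, width, height):
        self.buffers = [[0] * (width * height) for _ in range(2)]
        self.front = 0  # index of the buffer read by the display hardware

    @property
    def back(self):
        # The buffer that is not currently displayed.
        return 1 - self.front

    def draw_pixel(self, index, color):
        # Drawing always goes to the back buffer, never the visible one.
        self.buffers[self.back][index] = color

    def scan_out(self):
        # What the display hardware would read for the current frame.
        return self.buffers[self.front]

    def swap(self):
        # Called once the frame is complete: the back buffer becomes visible.
        self.front = self.back


fb = DoubleBuffer(2, 2)
fb.draw_pixel(0, 0xFFFF)
visible_before = fb.scan_out()[0]  # still 0, the new pixel is not shown yet
fb.swap()
visible_after = fb.scan_out()[0]   # 0xFFFF, the finished frame is now shown
```

Because the display only ever reads a finished buffer, a partially drawn frame is never visible.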
3.1.4 Textures
A texture is an array of data that is used to store image data. Usually a texture is two dimensional, but both
one dimensional and three dimensional textures can be useful in graphics calculations. A single element of
a 2D texture is referred to as a pixel or texel, while a single element of a 3D texture is called a voxel.
Textures are commonly used to store not only color data, but also normal maps or bump maps (used for
two different shading techniques that are outside the scope of this thesis).
When this thesis report mentions textures, the term always refers to 2D bitmaps containing color data
(raster images).
3.1.5 Sprites
The term sprites is used in this thesis for images stored in device memory. A sprite can refer to an image or
a particular part of an image. An image that is a collection of sprites is often referred to as a sprite sheet.
The term is often used when referring to animated 2D characters in video games.
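As an illustration of how a sprite is located inside a sprite sheet, the sketch below assumes the simplest possible layout: equally sized sprites packed row by row in a regular grid. Both the layout and the function name `sprite_rect` are assumptions made for this example.

```python
def sprite_rect(index, sprite_w, sprite_h, sheet_w):
    """Return (x, y, width, height) of sprite number `index` in the sheet."""
    columns = sheet_w // sprite_w
    if columns == 0:
        raise ValueError("sheet is narrower than a single sprite")
    col = index % columns   # position within the row
    row = index // columns  # which row the sprite is on
    return (col * sprite_w, row * sprite_h, sprite_w, sprite_h)


# A 64 pixel wide sheet of 16x16 sprites holds four sprites per row,
# so sprite 5 is the second sprite on the second row.
print(sprite_rect(5, 16, 16, 64))  # (16, 16, 16, 16)
```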
3.1.6 Fonts
A font is a collection of signs, letters or symbols that can be drawn to form a word or expression. There are
two common types of fonts: bitmap fonts, where every "shape" is stored as an image, and the more common
vector fonts. Vector fonts are represented by mathematical formulas instead of rasterized images. For more
information, see section 3.3.
3.1.7 Glyph
A font is a collection of glyphs, where each glyph represents one character, symbol or shape (for example,
the letter ’D’ or the symbol ’@’). In most fonts, each letter or symbol is represented by a glyph. In bitmap
fonts those glyphs are stored as small images, while in vector fonts they are stored as a collection of outlines.
3.1.8 Triangulation
In computer graphics the term tessellation describes filling a shape with smaller shapes.
Filling a geometric shape (such as a vector outline) with triangles is called triangulation.
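The simplest form of triangulation is a triangle fan, which connects the first vertex of a polygon to every other edge. The sketch below is only valid for convex polygons and is shown to illustrate the concept; it is not presented as the method used by ORGFX, and the name `fan_triangulate` is chosen for this example.

```python
def fan_triangulate(polygon):
    """Split a convex polygon (list of (x, y) vertices) into triangles."""
    if len(polygon) < 3:
        raise ValueError("a polygon needs at least three vertices")
    anchor = polygon[0]
    # Every triangle shares the first vertex; n vertices give n - 2 triangles.
    return [
        (anchor, polygon[i], polygon[i + 1])
        for i in range(1, len(polygon) - 1)
    ]


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
triangles = fan_triangulate(square)
print(len(triangles))  # 2
```

Concave outlines and outlines with holes, such as most glyphs, need a more elaborate algorithm.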
3.2 Hardware terminology
This section gives a brief introduction to the hardware terms used in this thesis.
3.2.1 FPGA technology
The Field Programmable Gate Array (FPGA) consists of logic units that can be connected together to form
a complex circuit. A Hardware Description Language (HDL) is used to describe how those logic units are
connected. FPGA technology has become increasingly popular since it was invented in the
eighties. Some of the reasons for its popularity are that it can solve legacy and component shortage issues
and can reduce the time-to-market when developing new products. The two largest FPGA developers today
are Xilinx and Altera.
3.2.2 Hardware description languages
The two most common hardware description languages (HDLs) used today are Verilog HDL (Verilog) and
Very High Speed Integrated Circuit HDL (VHDL). The industry standard in Europe is VHDL, which is
based on Ada, while Verilog with its C-style syntax is the preferred language outside of Europe. Stephen
Bailey presents a comprehensive summary of the differences between the two languages in a white paper
from 2003 [1]. This project uses Verilog, since all the surrounding components have Verilog implementations
while only a few of them have VHDL implementations.
The major difference from ordinary programming languages is that statements written in an HDL are
executed in parallel rather than sequentially. This allows for higher data throughput on an FPGA than on
a CPU, even if the FPGA runs at a lower frequency.
3.2.3 IP Cores
An HDL component is called a core or an IP. IP stands for semiconductor intellectual property core, and
such a component is most commonly called an IP core or IP block. Three types of IP cores exist: hard cores,
firm cores and soft cores. The hard and firm cores are outside the scope of this thesis. A soft IP core is a
component created in a hardware description language that can be synthesized to run on an FPGA.
3.2.4 System-on-Chip
An FPGA-based System-on-Chip, or SoC for short, is an embedded system built up from several IP cores
tied together with "FPGA-glue". An SoC usually has some sort of CPU that controls the system. Both
Altera and Xilinx provide tools to create SoCs from packaged IP cores without writing a single line of code.
3.2.5 Hard and soft CPUs
In System-on-Chip solutions, there are two different possibilities when it comes to choosing a central processor
for the system: hard and soft CPUs. A hard CPU is an integrated part of the FPGA chip that cannot be
changed (usually an ARM-based CPU). Hard CPUs are becoming more common in newer FPGA chips.
A soft CPU on the other hand is described by HDL, and can thus be implemented on any FPGA that
has enough gates. The performance and logic usage of a soft CPU can be impacted by other factors, such as
the availability of internal memory blocks or hardware multipliers. While soft CPUs are more versatile (any
number of them can be added to an FPGA design), hard CPU implementations can provide much higher
performance.
3.2.6 OpenRISC
OpenRISC¹³ is an open source 32-bit Harvard architecture soft Reduced Instruction Set Computer (RISC)
CPU IP core. The OpenRISC project was started by Damjan Lampret, the founder of opencores.com. The
implementation used in this thesis is an OR1200¹⁴ from opencores.com.
The processor is supported by the newlib and uClibc C libraries, and by the Linux operating system
since kernel version 3.1.
The OpenRISC implementation comes with a full compiler suite, both for building bare metal programs
that run directly on the CPU (or32-elf tool chain) and for building Linux applications (or32-linux tool
chain). In addition to this, there is a fully compatible simulator that emulates the entire CPU.
3.2.7 Wishbone bus
Wishbone is an example of "FPGA-glue" that is used to tie several cores together with a unified model of
communication. The Wishbone bus¹⁵ is an open bus protocol used in many open source designs, including
the OpenRISC processor. It can handle variable data and address widths, the most common being 32 bits
for both, and supports both reading and writing. Later revisions of the bus support reading and writing in
bursts, allowing higher bandwidth usage.
3.2.8 ORPSoCv2
The OpenRISC Reference Platform System-on-Chip Version 2 (ORPSoCv2) is an SoC
that integrates an OpenRISC CPU, a VGA/LCD driver and several other useful components. This SoC is
used in the implementation phase of this thesis in order to test and validate the ORGFX core.
3.3 Vector Fonts
A vector font is a font that is described by outlines instead of discrete images. These outlines are represented
by a set of points that are connected with lines or Bézier curves. A more detailed explanation of Bézier
curves is given in section 5.5.
The main advantage of vector fonts over bitmap fonts is that they can be scaled, rotated or otherwise
transformed without any loss of detail. What makes vector fonts difficult to render is the interaction of
several outlines. One outline contained within another can signify that the shape has a hole in it. Thus, the
entire glyph must be considered as a whole instead of handling the outlines one by one.
In addition to this, many font formats use clever tricks such as implicit points to reduce the file size. An
example of an implicit point can be found between point one and two in figure 3 where the point without a
number is an implicit point.
3.3.1
TrueType fonts
The TrueType font format is one of the most common font formats. The format stores points that can form
lines or quadratic Bézier curves. The points can be either on-line points or off-line points. If two on-line
points are stored after each other, a line will be drawn between them. If an on-line point is followed by an
off-line point, the next point is checked. If the next point is an on-line point, a Bézier curve will be drawn
between the first and last point with the middle point as a control point. However, if both the second and
third points are off-line points, there exists an implicit on-line point between them that has to be calculated with
the midpoint formula:
\[ x = \frac{x_0 + x_1}{2}, \qquad y = \frac{y_0 + y_1}{2} \]
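The point rules above can be illustrated with a minimal Python sketch. It is not taken from the ORGFX software: the function name and the `(x, y, on_line)` tuple layout are assumptions made for the example, and the contour is assumed to be closed.

```python
def expand_implicit_points(points):
    """Insert the implicit on-line midpoints into one closed contour.

    `points` is a list of (x, y, on_line) tuples (hypothetical layout).
    The result has the same format, but never two adjacent off-line points.
    """
    expanded = []
    count = len(points)
    for i, (x0, y0, on0) in enumerate(points):
        expanded.append((x0, y0, on0))
        x1, y1, on1 = points[(i + 1) % count]  # wrap around: contour is closed
        if not on0 and not on1:
            # Two consecutive off-line points: apply the midpoint formula.
            expanded.append(((x0 + x1) / 2, (y0 + y1) / 2, True))
    return expanded


contour = [(0, 0, True), (4, 0, False), (4, 4, False), (0, 4, True)]
result = expand_implicit_points(contour)
```

After this step every curve segment has the form on-line, off-line, on-line, so each one can be handled as an ordinary quadratic Bézier curve.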
The TTF format does not store any information about whether a shape is filled or whether it is a hole in
another shape. Therefore all the shapes have to be analysed before deciding which pixels to fill. The
order in which the points in a shape are defined indicates what type of shape it is. A clockwise defined
shape indicates a filled shape and a counter clockwise defined shape indicates a hole. Figure 3 displays in
what order the two shapes in the letter "D" are defined. The outer shape is defined in clockwise order and is
therefore filled, while the second shape is defined in counter clockwise order and is therefore not filled. The
TTF definition calls the rule that decides which pixels to draw the "winding rule". The
winding rule states that a point is filled unless a line from the point towards infinity crosses an
equal number of clockwise and counter clockwise defined outlines 16.
13 http://opencores.org/openrisc,or1200
14 http://opencores.org/svnget,or1k?file=/trunk/or1200/doc/openrisc1200_spec.pdf
15 http://opencores.org/opencores,wishbone
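A small sketch of the winding rule is given here as an illustration only. It assumes that the outlines have already been flattened into polygons (lists of `(x, y)` points) and that the Y-axis points upwards as in font units; the function names are hypothetical.

```python
def winding_number(px, py, outlines):
    """Sum the signed crossings of a horizontal ray from (px, py) towards +x."""
    winding = 0
    for outline in outlines:
        count = len(outline)
        for i in range(count):
            x0, y0 = outline[i]
            x1, y1 = outline[(i + 1) % count]
            if (y0 <= py) != (y1 <= py):  # the edge straddles the ray height
                # x coordinate where the edge crosses the line y = py
                cross_x = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
                if cross_x > px:
                    winding += 1 if y1 > y0 else -1
    return winding


def is_filled(px, py, outlines):
    # Filled unless clockwise and counter clockwise crossings cancel out.
    return winding_number(px, py, outlines) != 0


outer = [(0, 0), (0, 10), (10, 10), (10, 0)]  # clockwise (Y up): filled shape
inner = [(3, 3), (7, 3), (7, 7), (3, 7)]      # counter clockwise: hole
```

With these two outlines, a point between the outer and inner shape is filled, while a point inside the inner shape crosses one outline of each direction and is left empty.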
3.3.2
PostScript fonts
PostScript fonts are similar to TTF fonts but they use cubic Bézier curves instead of quadratic Bézier
curves. The current implementation of the ORGFX does not support cubic Bézier curves and therefore
there is no hardware acceleration for PostScript fonts.
3.3.3
OpenType fonts
OpenType is an extension of TrueType fonts, and an OpenType font can be stored in two modes, either as a
TTF font or as a PostScript font. OpenType fonts are not supported by ORGFX because, as with PostScript
fonts, they demand support for cubic Bézier curves.
3.3.4
FreeType
FreeType is an open source library for handling fonts. FreeType has support for TTF, PS, OT and a wide
range of other more or less common font formats. In this thesis FreeType is used to read the TTF files and
extract the points inside the glyphs.
3.4
Linux and free Software
According to the Free Software Foundation, an application is defined as free if:
The users have the freedom to run, copy, distribute, study, change and improve the software 17.
When software is released to the public it often comes with a free software license that explains what you are
allowed to do with the software. There exist several free software licenses, some more restrictive than
others. The most common open source license today is the GNU General Public License (GPL).
3.4.1
GPL
The GNU General Public License (GPL) is a free software license that allows for editing and redistribution,
as long as the original author gets credit for his/her work and the changes to the code and all new code that
is integrated with the original code is released to the community.
3.4.2
LGPL
The GNU Lesser General Public License (LGPL) is a less strict version of the GPL. It allows a project to
include LGPL code without having to release the project as LGPL. However, modified open source code
must still be released to the community. Some companies prefer the LGPL because it integrates more easily
with proprietary code.
3.4.3
Linux
Linux is a free Unix-like operating system used almost everywhere from dishwashers to supercomputers. The operating system gains more and more popularity each year and has lately had a
breakthrough on the cellphone market with Google's Android. Linux has long been popular among developers and scientists, but has only lately found its way to the common user.
3.4.4
Drivers
A driver is an application that tells the operating system how to handle a physical device. In Linux there
are two types of drivers: kernel space drivers and user space drivers. A kernel space driver is compiled as
an extension or ”module” to the Linux kernel. The driver is commonly loaded during boot, but can also be
loaded on the fly during runtime. User space drivers are simpler to implement but lack the ability to utilize
interrupts and other kernel features.
3.4.5
DirectFB
DirectFB is a hardware abstraction layer that allows for hardware acceleration on embedded Linux systems. DirectFB can be run on top of the standard Linux framebuffer driver and adds hardware acceleration.
Each feature in DirectFB has a software implementation to allow for full compatibility without hardware
acceleration. DirectFB is popular among embedded systems with limited hardware.
16 More details on the TTF font format can be found in the specification at https://developer.apple.com/fonts/TTRefMan/
17 http://www.gnu.org/philosophy/free-sw.html
Figure 3: A visualisation of the glyph ’D’ from a TTF font. Dots are explicit on-line points, crosses are off-line
points and circles are implicit on-line points.
3.4.6
X-Server
The X-server is the standard graphics manager for Linux and Unix. It provides a unified graphics API,
allowing the same source code to be compiled on different computers with different hardware. It has been
under active development since 1984.
3.4.7
KMS
Kernel Mode Setting (KMS) is used to set the screen resolution and color depth in kernel space. This has
the benefit that the screen mode can be set during early boot. This allows fancy graphics during boot and
a smoother integration of virtual terminals. The KMS can be accessed at the same time as the DRM.
3.4.8
Direct Rendering Infrastructure
Direct Rendering Infrastructure (DRI) is a framework used by the X-Server to interface directly with graphics
hardware. One of the benefits of DRI is that it allows for a faster OpenGL implementation in X.
3.4.9
Direct Rendering Manager
The Direct Rendering Manager (DRM) implements the communication interface to the hardware. It is the
DRM that reads and writes registers on the graphics accelerator. The DRI uses the DRM to get access to
the hardware.
4
Requirements
The goal of the ORGFX project was to develop a generic open source graphics accelerator that can be used
in modern embedded systems. The accelerator should be able to provide 2D and vector graphics rendering
without adding substantial load on the host CPU.
It should be possible to integrate the accelerator with a standard open source CPU through a standard bus
interface. The target platform was the OpenRISC processor and the Wishbone bus, though the device was
implemented to be generic enough to adapt to other bus interfaces.
To meet the rendering requirements a basic feature set was constructed:
• 2D engine: Color fill (rectangle), draw line, render texture (memory copy).
• Vector engine: Quadratic Bézier curves. Filled Quadratic Bézier shapes.
Filling areas with color and copying memory are basic features expected of any graphics
accelerator. In addition to the above, a few simple blending and clipping operations need to be supported
(alpha blending for half-transparent draws and colorkeying for rendering images with transparency).
The vector engine features were requested by ORSoC, and can potentially be used to render vector
graphics and vector fonts.
To make the ORGFX easy to use, a stable and efficient software layer is needed. The software layer
should enable both detailed control of the hardware and more complex functions that reduce the number of API
calls needed to perform common tasks. To make it possible to use images and fonts, the API should either support direct
loading of such files or be accompanied by conversion tools.
The graphics accelerator has no hard real-time requirements or framerate requirements.
5
Design
The main focus of the ORGFX core was to provide a graphics interface for the OpenRISC processor, but the
component itself is platform independent and can be connected to anything with a Wishbone bus interface.
With minor adaptations it is possible to adapt the core to another bus interface. During the development
of the ORGFX a 3D engine was added.
This section outlines some design concepts common to all features, and describes the theory behind each
feature in the feature set. The focus is to explain the purpose of each feature and present one or several
algorithms that can be used to implement the feature. Each feature is considered individually, but parallels
to other similar features are drawn when possible.
Section 6 proceeds to explain how the architecture of ORGFX is implemented to realize all these features.
5.1
Display control
The ORGFX core is not a display component. It works by interfacing with some form of video memory and
writing graphics primitives to it. The core has no interface that can produce VGA, HDMI or any other form
of display signals.
Figure 4: The right-handed Device coordinate system. The Z-axis points into the surface, so the higher the
Z-value the farther a point is from the ”camera”.
To display the information written by the graphics accelerator, a certified core from Opencores called the
VGA/LCD controller18 was used (see appendix B for specification). No changes were made to the original
design of the VGA/LCD core.
A side note: Hardware acceleration of multiple layers is a common technique used in early game consoles
to minimize overdraw. While there is no technical limitation in the graphics accelerator to provide several
different hardware layers, the design of the display controller prevents this. By using a display controller
able to handle multiple layers, the ORGFX core could provide this functionality without any modifications.
5.1.1
Render target
The areas in memory that the ORGFX core can access and render pixels to are known as render targets.
In the special case that the render target is the same buffer that will be drawn to screen, it is sometimes
denoted as the framebuffer. The ORGFX core supports switching back and forth between different render
targets. These memory areas can be of any resolution (limited by the hardware to a maximum size of 65536
by 65536 pixels).
Double buffering can be achieved by simply alternating between two framebuffers, rendering to one while
showing the other on screen.
5.1.2
Device coordinate system
ORGFX uses a right-handed device coordinate system that is based on screen coordinates for simplicity.
Each unit is one pixel, and the origin is placed in the top left corner of the surface. The X-axis increases
when moving to the right, and the Y-axis increases when moving down. Finally, the Z-axis points into the
surface. In other words, the far end of the depth buffer is at the largest possible positive Z-value, and the
closest to the viewer is at the largest possible negative Z-value. This is visualized in figure 4.
Internally, coordinates are handled as fixed point numbers, with the default precision being a 16 bit
integer part and a 16 bit fractional part. The fixed point architecture makes the implementation of the device
much simpler and smaller than if floating point numbers were used, at the cost of precision.
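The 16.16 fixed point format can be illustrated with a short Python sketch. The helper names are assumptions made for this example and do not correspond to functions in the ORGFX driver.

```python
FRACTION_BITS = 16  # default precision: 16 bit integer part, 16 bit fraction


def to_fixed(value):
    """Convert a number to 16.16 fixed point (rounded to the nearest step)."""
    return int(round(value * (1 << FRACTION_BITS)))


def from_fixed(fixed):
    """Convert a 16.16 fixed point value back to a float."""
    return fixed / (1 << FRACTION_BITS)


def fixed_mul(a, b):
    """Multiply two 16.16 values.

    The raw product carries 32 fractional bits, so it is shifted back by 16.
    """
    return (a * b) >> FRACTION_BITS


one_and_a_half = to_fixed(1.5)   # 0x00018000
product = fixed_mul(one_and_a_half, to_fixed(2.0))
```

The smallest representable step is 2^-16, which is the precision that is traded away compared to floating point numbers.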
5.1.3
Texture coordinate system
Another important coordinate system is the texture coordinate system. Textures in ORGFX are just 2D
images, so there is no depth coordinate. The X and Y axes are renamed U and V (a standard graphics
convention for textures), but have the same direction (the top-left corner is the origin, one U unit is one pixel
in the image). Unlike the regular coordinate system, texture coordinates are only stored as 16 bit integers,
without a fractional part.
18 http://opencores.org/project,vga_lcd
Algorithm 1 Rasterization algorithm for rectangles
for y = p0.y to p1.y do
    for x = p0.x to p1.x do
        Put pixel (x, y)
    end for
end for
5.2
Control interface
The ORGFX device has a set of registers that hold the device state and influence how the core operates. To
keep a consistent device state even when the device is busy doing operations, register writes are stored in a
circular First In First Out (FIFO) queue. Since all drawing operations most likely take at least a few clock
cycles, it is important to prevent the device state from changing during an operation. By storing writes in
the FIFO and only allowing the FIFO to be read from when the device is not busy, the device can be kept
in a stable state. To prevent this FIFO from overflowing during very long drawing operations, the software
should only write to the FIFO if the FIFO is not full.
The ORGFX contains a large number of registers that can be written to. Of all the registers on the
ORGFX only the control register can start drawing operations. The ORGFX is put in a busy state when
drawing.
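The behaviour of the register FIFO can be modelled in a few lines of Python. This is a behavioural sketch only, not the hardware implementation; the class name, the queue depth and the register addresses used below are hypothetical.

```python
from collections import deque


class RegisterFifo:
    """Software model of a device that queues register writes while busy."""

    def __init__(self, depth):
        self.depth = depth
        self.queue = deque()
        self.registers = {}
        self.busy = False

    def is_full(self):
        return len(self.queue) >= self.depth

    def write(self, address, value):
        """Called by software. The write is refused when the FIFO is full."""
        if self.is_full():
            return False
        self.queue.append((address, value))
        return True

    def step(self):
        """Apply one queued write, but only while the device is not busy."""
        if not self.busy and self.queue:
            address, value = self.queue.popleft()
            self.registers[address] = value


fifo = RegisterFifo(depth=2)
fifo.busy = True                 # a drawing operation is in progress
accepted = [fifo.write(0x10, 1), fifo.write(0x14, 2), fifo.write(0x18, 3)]
fifo.step()                      # nothing is applied while busy
state_while_busy = dict(fifo.registers)
fifo.busy = False
fifo.step()                      # the oldest write is applied first
```

The third write is refused because the queue is full, which is the situation the software has to check for before writing.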
5.3
2D engine features
The 2D engine can draw various graphic primitives and perform memory copy operations from either the
texture memory or the framebuffer. The basic feature set contains support for:
• 16 bit color depth mode
• Variable resolution
• Acceleration of rectangle, line and triangle raster operations
• Acceleration of memory copy operations
• Saving textures to video memory
• Clipping/Scissoring
• Alpha blending and colorkeying
All rendering operations will apply to the current render target, which can either be a texture in memory
or the visible screen. The graphics accelerator does not differentiate between rendering to a texture
and rendering to the screen.
5.3.1
Color depth modes and variable resolution
Color depth modes and variable resolution can cause several problems. The color depth of a surface ties in
closely with how the display controller interprets the data in memory. The way that pixels align to memory
addresses can further complicate supporting different color depth modes. Finally, having too large resolution
and color depth on the framebuffer can lead to bandwidth issues.
Internally, the device only knows of the current render target and its size (additional render targets have
to be stored in software). A render target is represented as a base memory address and the width and height
in pixels. The size of the render target is needed so that operations do not write outside of the current
render target, and additionally so that the correct stride is applied (since surfaces are stored serially, one
”row” of a surface will have a different memory offset depending on the width). There is no real limit to the
size a render target can have, other than the size of the registers holding the width and height values.
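How the base address, width and height are combined can be sketched as follows. The example assumes two bytes per pixel (16 bit color depth) and a byte-addressed memory; the function name and the base address are made up for the illustration.

```python
BYTES_PER_PIXEL = 2  # assumption: 16 bit color depth


def pixel_address(base, width, height, x, y):
    """Return the memory address of pixel (x, y), or None if it is outside."""
    if not (0 <= x < width and 0 <= y < height):
        return None  # never write outside of the current render target
    stride = width * BYTES_PER_PIXEL  # bytes from one row to the next
    return base + y * stride + x * BYTES_PER_PIXEL


first_pixel_second_row = pixel_address(0x1000, 640, 480, 0, 1)
```

The same coordinate therefore maps to a different address for every render target width, which is why the width has to be known to the device.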
The color depth affects how the pixel data is packed in memory. Using 16 bits for each pixel gives
a decent color range and is kind on memory bandwidth. 24 bit color mode does not tile well in a 16/32 bit
memory. To allow for 24 bit color depth, additional logic for alignment and memory management is needed.
One way to implement 24 bit color mode is to use 32 bit mode and ignore the last 8 bits. This method is
not supported by the display driver and is therefore not used.
5.3.2
Rectangles
Filling rectangles of pixels in video memory is accomplished by iterating over each pixel and writing it to
memory. The rasterization algorithm is presented in Algorithm 1 and illustrated in figure 5.
Figure 5: Rasterization of a rectangle. All pixels in the rectangle spanned by p0 and p1 have to be traversed.
Figure 6: Image of a circle with eight octants and how octant 2 to 8 can be transformed into the first octant.
5.3.3
Lines
The ORGFX core implements a line drawing module capable of drawing a line between two arbitrary points.
The current implementation is based on Bresenham's line algorithm [2]. This particular algorithm was chosen
for its iterative nature, which makes it easy to implement on an FPGA. Algorithm 2 describes the flow of
the algorithm. This algorithm only works for the first octant. The input is therefore transformed to the first
octant, the line is calculated, and the result is finally transformed back to the original octant. One example of this is when the
Y axis increases faster than the X axis (second octant): the X and Y axes are then switched, the line is calculated and
the axes are finally switched back. The table below and figure 6 show how the different octants are transformed. See
figure 7 for an example of a line drawn using the algorithm.
Octant            1   2   3   4   5   6   7   8
Switch X and Y        X   X           X   X
Negate X                  X   X   X   X
Negate Y                          X   X   X   X
An alternative line drawing algorithm is presented by Rokne in [8], usually known as Xiaolin Wu’s line
algorithm. It provides speed improvements of a factor 4 to the rasterization over Bresenham, and also allows
anti-aliased lines. However, due to the structure of the pipeline 19, the ORGFX core would not become
significantly faster unless parallel pipelines were added.
Algorithm 2 Rasterization algorithm for lines (Bresenham)
∆x ← p1.x − p0.x
∆y ← p1.y − p0.y
ε ← ∆x − 2 ∗ ∆y
y ← p0.y
for x = p0.x to p1.x do
    Put pixel (x, y)
    if ε < 0 then
        y ← y + 1
        ε ← ε + 2 ∗ ∆x − 2 ∗ ∆y
    else
        ε ← ε − 2 ∗ ∆y
    end if
end for
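A software sketch of Algorithm 2 combined with the octant transformation is given here for illustration. It is written in Python rather than HDL and does not mirror the structure of the hardware module; the function names are assumptions.

```python
def bresenham_first_octant(dx, dy):
    """Yield pixel offsets from (0, 0) to (dx, dy), assuming 0 <= dy <= dx."""
    error = dx - 2 * dy
    y = 0
    for x in range(dx + 1):
        yield x, y
        if error < 0:
            y += 1
            error += 2 * dx - 2 * dy
        else:
            error -= 2 * dy


def draw_line(x0, y0, x1, y1):
    """Return the pixels of a line between two arbitrary points."""
    dx, dy = x1 - x0, y1 - y0
    negate_x, negate_y = dx < 0, dy < 0
    dx, dy = abs(dx), abs(dy)
    swap = dy > dx                  # steep line: switch X and Y
    if swap:
        dx, dy = dy, dx
    pixels = []
    for u, v in bresenham_first_octant(dx, dy):
        if swap:                    # transform back to the original octant
            u, v = v, u
        pixels.append((x0 - u if negate_x else x0 + u,
                       y0 - v if negate_y else y0 + v))
    return pixels


shallow = draw_line(0, 0, 4, 2)  # [(0, 0), (1, 0), (2, 1), (3, 1), (4, 2)]
steep = draw_line(0, 0, 2, 4)    # [(0, 0), (0, 1), (1, 2), (1, 3), (2, 4)]
```

Only additions, subtractions and comparisons are used inside the loop, which is what makes the algorithm attractive for an FPGA.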
Figure 7: Example rasterization of a line using Bresenham.
5.3.4
Triangles
Another feature of the ORGFX graphics accelerator is to render triangles. Two different algorithms were
considered.
A triangle can be described by three lines connected by three points. The equations of the lines can be
calculated from the three points. Once the line equations are known, it is possible to iterate over the pixel
spans between the lines.
While this algorithm 19 always iterates over the least number of pixels possible, it is not without problems.
Because the algorithm uses the slope of the lines, there will be problems when the slope is very small
(subpixel differences). In an early prototype of the algorithm, rendering artefacts appeared, and due to this
the algorithm was discarded.
An alternative approach is presented in [6] and expanded on in [10] (pages 5-7). Whether a
given pixel is inside a triangle can be determined by evaluating the pixel's position relative to the triangle's three edges.
For each edge, an edge function is calculated:
\[ \mathit{edge}_0 = -(p_{2y} - p_{1y})(x - p_{1x}) + (p_{2x} - p_{1x})(y - p_{1y}) \]
The functions edge1 and edge2 are formed in the same way for the other two edges. The sign of the result denotes on which side of the edge a point is located. If
all edge functions are positive, the pixel is fully inside the triangle (see figure 8).
For the full algorithm, see Algorithm 3.
The main disadvantage of the algorithm is one of speed: it has to iterate over every pixel in a rectangle,
where only some of the pixels (at most half) are actually rendered. The problem is illustrated in figure
9a. A simple speed-up can be added to the algorithm to lessen the problem somewhat (figure 9b). If
the algorithm is iterating over the body of a triangle and hits a "no-draw" pixel, no
more pixels will be drawn on this row, and the rest of the row can be skipped.
As can be observed, this approach adds a lot of overhead from the ideal case presented in the first
algorithm. With the second algorithm, barycentric coordinates (see section 5.4.2) can be calculated from
the edge functions and the triangle area.
19 http://joshbeam.com/articles/triangle_rasterization/
Figure 8: Visual representation of the triangle edge functions. The sign of the function for each pixel indicates
if the pixel is inside the triangle or not. Picture from [10].
Figure 9: A demonstration of how the speed up technique for drawing triangles leads to iterating over fewer
pixels. a) All pixels in the rectangle have to be traversed. b) Using the speed up technique, many pixels can be
skipped (filled).
Algorithm 3 Rasterization algorithm for triangles
xmin ← min(p0x, p1x, p2x)
ymin ← min(p0y, p1y, p2y)
xmax ← max(p0x, p1x, p2x)
ymax ← max(p0y, p1y, p2y)
for y = ymin to ymax do
    for x = xmin to xmax do
        edge0 ← −(p2y − p1y)(x − p1x) + (p2x − p1x)(y − p1y)
        edge1 ← −(p0y − p2y)(x − p2x) + (p0x − p2x)(y − p2y)
        edge2 ← −(p1y − p0y)(x − p0x) + (p1x − p0x)(y − p0y)
        if edge0 > 0 and edge1 > 0 and edge2 > 0 then
            Put pixel (x, y)
        end if
    end for
end for
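Algorithm 3 can be tried out in software with the following Python sketch. It assumes integer pixel coordinates, forms the edge functions for the second and third edge in the same way as the one written out above, and leaves out the row skipping speed-up. The function names are hypothetical.

```python
def edge(ax, ay, bx, by, x, y):
    """Edge function for the edge from a to b, evaluated at pixel (x, y)."""
    return -(by - ay) * (x - ax) + (bx - ax) * (y - ay)


def rasterize_triangle(p0, p1, p2):
    """Return all pixels strictly inside the triangle (bounding box scan)."""
    xs = (p0[0], p1[0], p2[0])
    ys = (p0[1], p1[1], p2[1])
    pixels = []
    for y in range(min(ys), max(ys) + 1):
        for x in range(min(xs), max(xs) + 1):
            edge0 = edge(*p1, *p2, x, y)
            edge1 = edge(*p2, *p0, x, y)
            edge2 = edge(*p0, *p1, x, y)
            if edge0 > 0 and edge1 > 0 and edge2 > 0:
                pixels.append((x, y))
    return pixels


pixels = rasterize_triangle((0, 0), (8, 0), (0, 8))
```

Note that the test on positive values only accepts triangles of one orientation. If the points are given in the opposite order, all three edge functions are negative inside the triangle and nothing is drawn by this sketch.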
Figure 10: 1. Texture, 2. Source, 3. Render target, 4. Clip, 5. Destination
The very idea to implement triangles might seem out of the scope of the 2D engine at first, but it will
be shown that by implementing triangles using barycentric coordinates, much of the groundwork for the 3D
engine and the vector engine is already finished. See section 5.4.2 for further deliberation.
5.3.5
Clipping
All pixels generated by the various raster operations are checked against a clipping rectangle (see number 4
in figure 10). If a pixel falls outside the clipping rectangle it will not be rendered, and it is discarded from
the pipeline. This technique is sometimes known as scissoring, and can be enabled or disabled with a flag.
Any pixels that fall outside of the active render target (see number 3 in figure 10) should always be
discarded, regardless of whether clipping is enabled or not. This is to prevent drawing operations to one buffer from
spilling over into another buffer.
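The two tests can be summarized in a short sketch. The representation of the clipping rectangle as `(x0, y0, x1, y1)` with an exclusive lower right corner is an assumption made for this example, as is the function name.

```python
def pixel_passes(x, y, target_size, clip_rect=None):
    """Return True if the pixel should continue down the pipeline.

    `clip_rect` is None when clipping is disabled.
    """
    width, height = target_size
    if not (0 <= x < width and 0 <= y < height):
        return False  # outside the render target: always discarded
    if clip_rect is not None:
        x0, y0, x1, y1 = clip_rect
        return x0 <= x < x1 and y0 <= y < y1
    return True


target = (640, 480)
clip = (100, 100, 200, 200)
```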
5.3.6
Coloring
Once a pixel coordinate has been generated (by a rectangle, line or triangle draw operation), the next step
is to decide what color the pixel should have. There are several possible ways to do this:
• Use a flat color for the entire shape.
• Generate a color based on a gradient. This is expanded on in section 5.4.2.
• Fetch a color from texture memory.
The last technique is sometimes referred to as bit block transfer, or blitting. When this technique is
applied to an entire rectangle, it can be used to copy an image from one place to another in memory. In
practice, one would store images somewhere in memory, then fetch them and draw them to the render target
as needed. This covers the Acceleration of memory copy operations feature.
A sprite can be loaded into the texture memory by the CPU. When the sprite is in the texture memory,
the ORGFX can draw it to the active render target by copying it pixel by pixel in hardware. This approach
requires less CPU time compared to drawing the sprite pixel by pixel every frame. This covers the Saving
textures to video memory feature.
A comparison of the three rendering modes is shown in figure 11.
Figure 11: Three different color rendering modes: a) Flat. b) Interpolated gradient. c) Textured.
Figure 12: The same image rendered without colorkeying and with colorkeying. Both images are rendered
against a white background.
5.3.7
Color keying
The term color keying refers to rendering images with transparent patches to screen, as many 2D video
games do. This technique consists of picking a specific color in the image to be the color key. Whenever
a pixel of this color is encountered it is considered to be fully transparent and is then discarded. For an
example, see figure 12.
This method only applies to operations using the textured coloring method described in the previous
section.
5.3.8
Alpha blending
A more complex form of transparency can be achieved through alpha blending[9]. By providing an alpha
value between zero and one, the active pixel can be drawn as fully transparent, fully opaque or something
in between. In practice, this is achieved by sampling the background color from the target pixel and mixing
this with the pixel to be drawn:
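\[ \mathit{alpha} = \mathit{alpha}_{global} \cdot \mathit{alpha}_{pixel} \]
\[ \mathit{color}_{out} = \mathit{color}_{in} \cdot \mathit{alpha} + \mathit{color}_{target} \cdot (1 - \mathit{alpha}) \]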
where alpha is a value between 0 (transparent) and 1 (opaque). If alpha blending is disabled the pixel is
passed on unmodified. The alpha value can be interpolated over a triangle to create gradients (see section
5.4.2). If this function is turned off (interpolation is disabled on triangle draws) then alphapixel is set to 1.
The global alpha parameter is a separate value that can set the overall alpha of an entire drawing primitive
and is applied to all pixels if blending is enabled. The interpolated alpha only applies to triangle and curve
renders.
For an example of the result of an alpha blending operation, see figure 13.
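The blending formula, together with the color keying from the previous section, can be sketched as below. Colors are represented as tuples of floating point channel values between 0 and 1 to keep the example short; this representation and the function names are assumptions and say nothing about how the 16 bit pixel format is handled in hardware.

```python
def blend_pixel(color_in, color_target, alpha_global=1.0, alpha_pixel=1.0):
    """Mix the pixel to be drawn with the color already in the target."""
    alpha = alpha_global * alpha_pixel
    return tuple(c_in * alpha + c_target * (1.0 - alpha)
                 for c_in, c_target in zip(color_in, color_target))


def draw_textured_pixel(texel, color_target, color_key, alpha_global=1.0):
    """Apply color keying first, then alpha blending."""
    if texel == color_key:
        return color_target  # fully transparent: the pixel is discarded
    return blend_pixel(texel, color_target, alpha_global)


red, blue, magenta = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0), (1.0, 0.0, 1.0)
half = blend_pixel(red, blue, alpha_global=0.5)   # (0.5, 0.0, 0.5)
```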
Figure 13: The same image rendered with different global alpha values (from left to right: alpha = 100%, alpha
= 70%, alpha = 30%). The interaction with the background text shows how the alpha settings change the
blending. The image is also colorkeyed.
5.4
3D engine features
The 3D engine in the ORGFX is designed to have support for the following features:
• Hardware vector transformations
• Interpolation
• Depth buffer culling
Those features will be discussed in detail in this section.
5.4.1
Transformations
When working with large 3D objects built from a set of points, a common operation is to apply a matrix
multiplication to all the points, creating a common transformation. The equation in its simplest form is as
follows:
\[ \mathit{point}_{out} = \mathit{Transformation} \cdot \mathit{point}_{in} \]
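This can for example represent how an object is rotated, by applying a simple rotation matrix:
\[ \begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} \cos(\alpha) & -\sin(\alpha) & 0 \\ \sin(\alpha) & \cos(\alpha) & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} \]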
This transformation rotates the input point by α around the Z-axis. By extending the 3x3 transformation
matrix to a 3x4 matrix, it is possible to not only rotate, but also translate a point in the same step. Expanded,
the generic calculation looks like this:
\[ \begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} aa & ab & ac & tx \\ ba & bb & bc & ty \\ ca & cb & cc & tz \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix} \]
The components aa through cc describe the combined scaling and rotation, while the vector (tx, ty, tz)
describes the translation. Elaborating the expression creates the following equations:
\[ \begin{aligned} x' &= aa \cdot x + ab \cdot y + ac \cdot z + tx \\ y' &= ba \cdot x + bb \cdot y + bc \cdot z + ty \\ z' &= ca \cdot x + cb \cdot y + cc \cdot z + tz \end{aligned} \]
At this point it is a good idea to step back and consider this. For each point in a 3D model (which
can easily contain thousands of points), the same set of multiplications and additions has to be performed.
This common operation will be a severe load on the CPU, so a large leap in performance can be gained by
moving it to hardware. Additionally, in hardware the parallel nature of the FPGA can be used to perform
the entire transformation in a fraction of the number of clock cycles needed by the CPU.
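The per-point work that the hardware takes over can be made concrete with a small software sketch of the 3x4 transformation. It uses floating point numbers for readability, whereas the device works with fixed point values; the function names are assumptions.

```python
import math


def transform_point(matrix, point):
    """Apply a 3x4 matrix (rows: [a, b, c, t]) to the point (x, y, z)."""
    x, y, z = point
    return tuple(row[0] * x + row[1] * y + row[2] * z + row[3]
                 for row in matrix)


def rotation_z_with_translation(alpha, tx=0.0, ty=0.0, tz=0.0):
    """Build a matrix that rotates by alpha around Z and then translates."""
    c, s = math.cos(alpha), math.sin(alpha)
    return [[c, -s, 0.0, tx],
            [s, c, 0.0, ty],
            [0.0, 0.0, 1.0, tz]]


matrix = rotation_z_with_translation(math.pi / 2, tx=10.0)
moved = transform_point(matrix, (1.0, 0.0, 0.0))  # close to (10.0, 1.0, 0.0)
```

Each point costs nine multiplications and nine additions, and all three rows are independent of each other, which is what allows them to be calculated in parallel in hardware.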
5.4.2
Interpolation
This section expands on the triangle drawing theory from section 5.3.4.
The ORGFX is designed to have hardware accelerated bilinear interpolation of triangles. This is achieved
by calculating the Barycentric coordinates[6][10] of each pixel rendered. The Barycentric coordinates are an
indication of how close to the corners of the triangle each pixel is. Each factor is in the range between 0 and
1, and the sum of all three factors is always 1.
Once the barycentric coordinates have been calculated, they can be used to interpolate many different
variables for the triangle. The one most interesting for the 3D engine is interpolated depth. The user sets
the depth value for each corner of the triangle, and the factors are used to get a smooth interpolation of the
depth value over the entire triangle.
Recall the edge calculations described in the triangle rasterization algorithm:
\[ e_0(x, y) = -(p_{2y} - p_{1y})(x - p_{1x}) + (p_{2x} - p_{1x})(y - p_{1y}) \]
Additionally, the signed area of the triangle is needed:
\[ A_\Delta = \frac{1}{2}\left((p_{1x} - p_{0x})(p_{2y} - p_{0y}) - (p_{2x} - p_{0x})(p_{1y} - p_{0y})\right) \]
As described in the papers mentioned, the Barycentric coordinate factors can be calculated with the
formulas below:
\[ \mathit{factor}_0 = \frac{e_0(x, y)}{2A_\Delta}, \qquad \mathit{factor}_1 = \frac{e_1(x, y)}{2A_\Delta}, \qquad \mathit{factor}_2 = \frac{e_2(x, y)}{2A_\Delta} \]
Since $\mathit{factor}_0 + \mathit{factor}_1 + \mathit{factor}_2 = 1$, one of the divisions can be omitted:
\[ \mathit{factor}_2 = 1 - \mathit{factor}_0 - \mathit{factor}_1 \]
Using these factors it is simple to interpolate many different variables for a given pixel in a triangle using
the formula below:
\[ z' = \mathit{factor}_0 \cdot z_0 + \mathit{factor}_1 \cdot z_1 + \mathit{factor}_2 \cdot z_2 \]
In this example the depth value at a given pixel in the triangle is interpolated from the depth at each
control point and the calculated factors. The same calculation can be used to find the interpolated texture
coordinates, alpha value and color of a pixel. For an example of the interpolation technique in action, see
figure 11b (interpolated colors) and 11c (interpolated texture coordinates).
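The interpolation can be checked with a small numerical sketch. It assumes that e1 and e2 are formed like e0, using the edges from p2 to p0 and from p0 to p1 respectively, and it uses floating point division where the hardware uses fixed point. The function names are hypothetical.

```python
def edge(a, b, x, y):
    """Edge function for the edge from a to b, evaluated at (x, y)."""
    return -(b[1] - a[1]) * (x - a[0]) + (b[0] - a[0]) * (y - a[1])


def barycentric_factors(p0, p1, p2, x, y):
    """Return (factor0, factor1, factor2) for the point (x, y)."""
    # Twice the signed area; zero for a degenerate triangle (not handled).
    double_area = ((p1[0] - p0[0]) * (p2[1] - p0[1])
                   - (p2[0] - p0[0]) * (p1[1] - p0[1]))
    factor0 = edge(p1, p2, x, y) / double_area
    factor1 = edge(p2, p0, x, y) / double_area
    factor2 = 1.0 - factor0 - factor1  # the third division is omitted
    return factor0, factor1, factor2


def interpolate(factors, v0, v1, v2):
    """Interpolate a per-corner value (depth, alpha, color channel, ...)."""
    return factors[0] * v0 + factors[1] * v1 + factors[2] * v2


p0, p1, p2 = (0, 0), (8, 0), (0, 8)
factors = barycentric_factors(p0, p1, p2, 2, 2)   # (0.5, 0.25, 0.25)
depth = interpolate(factors, 0.0, 100.0, 200.0)   # 75.0
```

At a corner the matching factor is 1 and the other two are 0, so the interpolated value is exactly the value set for that corner.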
5.4.3
Z-buffer culling
When drawing shapes with different depth in a 3D environment, the order of drawing objects suddenly
becomes important. A shape that is ”farther away” from the viewer than a shape already drawn to screen
may end up overwriting the first shape. To prevent this behaviour, a separate buffer containing depth values
is held in memory. Whenever a pixel is being drawn to the render target, the depth value of the pixel is
compared to the current depth at that point. If the depth is less than the current value, the depth is updated
and the pixel is rendered. If the depth being rendered is greater than the current value (the pixel is behind
an object in the scene) the pixel is discarded.
This feature is vital when rendering any form of complex 3D graphics scenes. For an example of why the
depth buffer is needed, see figure 14.
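The depth test can be expressed in a few lines. In this sketch the buffers are plain nested lists and a pixel with the same depth as the stored value is discarded; both are assumptions made for the example.

```python
def draw_with_depth_test(color_buffer, depth_buffer, x, y, depth, color):
    """Draw the pixel only if it is closer than what is already stored."""
    if depth < depth_buffer[y][x]:
        depth_buffer[y][x] = depth
        color_buffer[y][x] = color
        return True
    return False  # behind an object already in the scene: discarded


FAR = float("inf")  # the depth buffer is cleared to the far end
colors = [[None] * 2 for _ in range(2)]
depths = [[FAR] * 2 for _ in range(2)]

box_drawn = draw_with_depth_test(colors, depths, 0, 0, 1.0, "box")
person_drawn = draw_with_depth_test(colors, depths, 0, 0, 5.0, "person")
```

This reproduces the scenario in figure 14: the box is rendered first, and the pixel of the person behind it fails the depth test.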
5.5
Vector engine features
The Vector engine is designed to be able to perform rasterization of filled quadratic Bézier shapes. This
section will discuss what features the vector engine needs in order to raster those shapes.
5.5.1
Path theory
The main advantage of vector graphics is that objects can be rendered with infinite detail. Instead of storing
an image as an array of pixels, shapes are described using something called parametric curves.
The theory of these parametric curves was developed in 1959 by Paul de Casteljau, and later popularized
and patented by Pierre Bézier. The main use for the curves was to describe hulls of cars in CAD programs.
Their use has expanded considerably since then, and today Bézier curves are also used to describe scale-invariant fonts and vector graphics. A few common vector graphics formats include PostScript, PDF, Flash
and SVG. The most widely used format for vector fonts is TrueType fonts (TTF). Bézier curves are also used
to describe interpolated change, for example when describing animations.
Figure 14: The image above shows the following scenario. A camera is looking at two objects: a box and a
person behind the box. The box (1) is rendered first, and the person (2) is rendered second. Without Z-buffer
culling, the result will be as is seen on the left (the person appears to be in front of the box). With the correct
culling active, the parts of the person behind the box will fail the depth test, and be discarded. In the right
image, the person appears to be behind the box, even if the box was rendered first.
Because Bézier curves are described as a series of points, it is possible to perform transformations such
as rotations and scaling before the curve is rasterized, without any loss of detail.
The formulas for linear, quadratic and cubic Bézier curves are presented below:
Linear:
\[ B_{P_0,P_1}(t) = (1 - t)P_0 + tP_1, \quad t \in [0, 1] \]
Quadratic:
\[ B_{P_0,P_1,P_2}(t) = (1 - t)B_{P_0,P_1}(t) + tB_{P_1,P_2}(t), \quad t \in [0, 1] \]
Cubic:
\[ B_{P_0,P_1,P_2,P_3}(t) = (1 - t)B_{P_0,P_1,P_2}(t) + tB_{P_1,P_2,P_3}(t), \quad t \in [0, 1] \]
The same recursive pattern can be further expanded to get Bézier curves of any degree n. For some
example Bézier curves, see figure 15.
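The recursive pattern translates directly into code. The sketch below repeatedly interpolates between neighbouring points until one point remains, which is de Casteljau's algorithm; the function name is an assumption.

```python
def bezier_point(control_points, t):
    """Evaluate a Bézier curve of any degree at parameter t in [0, 1]."""
    points = list(control_points)
    while len(points) > 1:
        # One level of the recursion: blend each pair of neighbours.
        points = [((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
                  for (x0, y0), (x1, y1) in zip(points, points[1:])]
    return points[0]


quadratic = [(0, 0), (1, 2), (2, 0)]
middle = bezier_point(quadratic, 0.5)   # (1.0, 1.0)
```

Two control points give a linear curve, three a quadratic curve and four a cubic curve. The curve starts at the first control point (t = 0) and ends at the last (t = 1).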
One notable disadvantage of Bézier curves is their inability to describe a perfect circle or a circle arc.
Because of this, most systems capable of drawing vector graphics with Bézier curves have a special case for
drawing circular shapes. This feature was not considered for the ORGFX graphics accelerator because of
time constraints.
The Bézier curve formula can be extended to describe surfaces instead of curves, allowing for scale-invariant three dimensional shapes.
5.5.2 Shape implementation
ORGFX only supports one particular case of Bézier curves: filled quadratic Bézier shapes. This feature is
enough to describe all quadratic and cubic vector fonts, with the correct preparations.
Quadratic Bézier curves are parametric curves, and a quadratic parametric curve can be described as a second-degree
implicit curve (often referred to as a conic section).
Quote from a paper by C.Loop[5]:
Claim: Any rational quadratic parametric curve has an implicit form that is a projected image
of the algebraic curve
$$f(u, v) = u^2 - v$$
The mathematical proof for this claim is outside of the scope of this thesis, but can be found in the
paper. What it means is that by interpolating the coordinates u and v over a rasterized triangle, the values
Figure 15: The top image shows a quadratic Bézier curve starting at p0 and ending at p2, where the curvature
is adjusted by p1. The bottom image shows a cubic Bézier curve, starting at p0 and ending at p3, where the
curvature is adjusted by p1 and p2.
Figure 16: The canonical quadratic curve element (left), a triangle formed by the control points of a quadratic
Bézier curve (right). Image from C.Loop[5].
can be tested against the formula. If f (u, v) < 0 then the pixel is inside the curve, otherwise it is outside[5].
See figure 16 for an example of this. Note that the use of the term texture space refers to the way that Loop
et al. implement this rendering technique using texture coordinates on a programmable GPU, and is not
connected to the use of textures or texture coordinates in this thesis.
This approach to shape rendering can easily be implemented on top of the interpolation module previously
described in section 5.4.2.
For examples of filled shapes, see figure 17.
5.5.3 Alternative approaches
The first attempt at implementing Bézier curves consisted of making a parallel implementation of de Casteljau’s algorithm. It was fairly easy to find the correct coordinates of any point on the Bézier curve in linear
time. The difficult part was to find the correct step size of the interpolation variable. In fact, depending on
the arrangement of the control points, it is entirely possible that the "correct" step size is not constant over
the curve.
Tests and experiments with the algorithm show that if the step size is too small there will be significant
overdraw, which will lead to significant rendering artefacts when alpha blending is enabled. If the step size
is too big, the Bézier curve will have gaps in it. This can be somewhat reduced by either using line draws
between the calculated points (for a Bézier curve) or by filling triangles (for a Bézier shape). The problem
is that this still leads to jagged shapes.
This approach to drawing Bézier shapes was dropped in favour of using the method described by Loop[5],
due to the accuracy problems.
Figure 17: Above are two different ways to render the same Bézier shape within the bounds of the triangle
defined by the three control points.
5.6 Software
In addition to the hardware design of the device, a functional software layer is needed to properly interact
with the device. This section explains the basic design of how the software communicates with the device,
as well as the data structures used to abstract some graphics operations.
For the example implementation, the software runs on a 32-bit OpenRISC processor with no operating
system.
5.6.1 Bus interface
The example implementation of the software assumes that the device is connected to the CPU data bus, and
thus can be accessed by writing to and reading from specific memory addresses. The data bus is shared
with many other devices, so the software layer must know the base address of the device, in addition to the
address offset of the specific register to be accessed.
Below is an example of how this can be implemented in C using defines for the specific addresses and
a macro for mapping memory. After these declarations follows example usage of how to write a value to a
register and read from a register:
    #define GFX_BASEADDR 0xB8000000 /* Bus Address to GFX */
    #define GFX_STATUS   (GFX_BASEADDR + 0x04)
    #define GFX_COLOR0   (GFX_BASEADDR + 0x84)
    #define REG32(add)   *((volatile unsigned int *)(add))
    ...
    REG32(GFX_COLOR0) = 0xf800;
    status = REG32(GFX_STATUS);
For a full list of registers and their addresses, refer to the ORGFX device specifications in appendix A.
For consistency, all registers defined in software have the same name as their hardware counterpart.
5.6.2 Surfaces
Since the hardware itself only knows the address and size of the current render target and active texture,
the software must keep track of many such surface objects to be able to switch between them freely. The
bare minimum information needed for this is the base address of the surface and the width and height in
pixels.
With the following structure, these parameters can be stored together:
    struct orgfx_surface
    {
        unsigned int addr;
        unsigned int w;
        unsigned int h;
    };
By passing this structure to a bind function, the correct values can be loaded to the hardware. By
design decision, it is up to the user to manage the surface structure.
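As an illustration of what such a bind function might do, the sketch below copies the three surface parameters into device registers. It is a simplified stand-in and not the actual driver code: the register indices and the array `sketch_regs` (which replaces the memory-mapped registers so that the example can run on a host machine) are assumptions made for this example. The real register addresses are listed in appendix A.

```c
#include <stdint.h>

/* Hypothetical register indices, for illustration only. */
enum {
    SKETCH_TARGET_BASE = 0,
    SKETCH_TARGET_SIZE_X = 1,
    SKETCH_TARGET_SIZE_Y = 2,
    SKETCH_REG_COUNT = 3
};

/* Stand-in for the memory-mapped device registers. */
static volatile uint32_t sketch_regs[SKETCH_REG_COUNT];

struct orgfx_surface
{
    unsigned int addr;
    unsigned int w;
    unsigned int h;
};

/* Loads the base address and size of a surface into the (simulated)
 * render target registers. */
static void sketch_bind_rendertarget(const struct orgfx_surface *s)
{
    sketch_regs[SKETCH_TARGET_BASE] = s->addr;
    sketch_regs[SKETCH_TARGET_SIZE_X] = s->w;
    sketch_regs[SKETCH_TARGET_SIZE_Y] = s->h;
}
```

Because the structure is owned by the caller, switching between render targets amounts to calling the bind function again with a different surface.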
5.6.3 Meshes
A mesh is nothing more than a collection of triangles drawn around the same origin point. Each triangle can
be thought of as a face that contains three vertices and three texture coordinates. It is relatively common
that those coordinates are shared by other faces too, so it is possible to save a lot of space by just storing
the indices that each face uses.
    typedef struct orgfx_point2
    {
        float x, y;
    } orgfx_point2;

    typedef struct orgfx_point3
    {
        float x, y, z;
    } orgfx_point3;

    typedef struct orgfx_face
    {
        unsigned int p1, p2, p3;
        unsigned int uv1, uv2, uv3;
        unsigned int color1, color2, color3;
    } orgfx_face;

    typedef struct orgfx_mesh
    {
        unsigned int numVerts;
        orgfx_point3 *verts;
        unsigned int numUvs;
        orgfx_point2 *uvs;
        unsigned int numFaces;
        orgfx_face *faces;
    } orgfx_mesh;
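To show how indexed faces are resolved back into vertices, the following sketch looks up one corner of a face. The types are reduced versions of the structures above (texture coordinates and colors are left out), and the function `mesh_face_corner` is a hypothetical helper, not part of the ORGFX API.

```c
typedef struct { float x, y, z; } point3;
typedef struct { unsigned int p1, p2, p3; } face;

typedef struct {
    unsigned int numVerts;
    const point3 *verts;
    unsigned int numFaces;
    const face *faces;
} mesh;

/* Resolves corner 0, 1 or 2 of face f to its vertex.
 * Returns 0 on success and -1 if any index is out of range. */
static int mesh_face_corner(const mesh *m, unsigned int f,
                            unsigned int corner, point3 *out)
{
    unsigned int idx;

    if (f >= m->numFaces || corner > 2)
        return -1;
    idx = corner == 0 ? m->faces[f].p1
        : corner == 1 ? m->faces[f].p2
        : m->faces[f].p3;
    if (idx >= m->numVerts)
        return -1;
    *out = m->verts[idx];
    return 0;
}
```

A quad built from two triangles illustrates the saving: with indices, four vertices are stored instead of six, since two of them are shared by both faces.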
5.6.4 Fonts
Vector fonts can be described as a set of glyphs, each consisting of a number of Bézier shapes that form the curved exterior
and a number of triangles that fill the interior of the shape. This way of describing vector fonts is designed
with the implementation of hardware Bézier shapes in mind (see section 5.5.2).
Each Bézier shape representation needs three 2D points describing the shape, as well as a flag indicating
if the shape should be filled as inside or outside (see figure 17).
    typedef struct Bezier_write
    {
        orgfx_point2 start;
        orgfx_point2 control;
        orgfx_point2 end;
        int fillInside;
    } Bezier_write;

    typedef struct Triangle_write
    {
        orgfx_point2 p0;
    } Triangle_write;

    typedef struct Glyph {
        int advance_x;
        int index;
        int bezier_n_writes;
        Bezier_write *bezier;
        int triangle_n_writes;
        Triangle_write *triangle;
    } Glyph;

    typedef struct orgfx_vector_font {
        int index_list_size;
        Glyph **index_list;
        int size;
        Glyph *glyph;
    } orgfx_vector_font;
To be able to support Unicode fonts the software layer uses wide character strings. This makes it possible
to write strings that contain letters not included in the basic 128 ASCII set. This includes characters such
as åäö, and other alphabets such as the Arabic, the Cyrillic and the Chinese character sets.
Below is a piece of example code in C that shows how wide character strings can be used. Note that
constant wide strings must be prefixed with a capital L.
    #include <wchar.h>
    wchar_t wide_string[] = L"This is a wide string";
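As a sketch of how a wide string could be combined with the glyph structures above, the function below sums the horizontal advance of each character in a string. It assumes that the index list is addressed directly by character code and that missing glyphs are stored as null pointers. Both are assumptions made for this example and are not confirmed by the driver source; `glyph_info` and `text_width` are hypothetical names.

```c
#include <stddef.h>
#include <wchar.h>

/* Reduced glyph record holding only the fields used in this sketch. */
typedef struct {
    int advance_x;
    int index;
} glyph_info;

/* Returns the sum of advance_x for all characters in str that have a
 * glyph in index_list. Characters without a glyph are skipped. */
static int text_width(const glyph_info *const *index_list,
                      int index_list_size, const wchar_t *str)
{
    int width = 0;

    for (; *str != L'\0'; str++) {
        long code = (long)*str;
        if (code >= 0 && code < index_list_size && index_list[code] != NULL)
            width += index_list[code]->advance_x;
    }
    return width;
}
```

Since the lookup works on `wchar_t` values, the same loop handles characters outside the ASCII range, provided that the index list is large enough to contain them.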
6 HDL implementation
The hardware implementation of the algorithms from the previous section is presented here. Before diving
into the architecture of the ORGFX device, the development board used for the implementation and several
important IP cores used are presented.
6.1 Development board
A Digilent ATLYS development board (see figure 18) was used during this thesis. The ATLYS board has a
Xilinx Spartan 6 FPGA and 1 Gbit of DDR2 SDRAM. The board has four HDMI ports, two USB ports, Ethernet
and audio connectors, some push buttons, several LEDs and switches, as well as a GPIO port. For more
information about the board see the Atlys™ Board Reference Manual 20 .
For the purpose of the ORGFX implementation, the only components actually needed on the board are
the FPGA, the memory and the HDMI connector (including the surrounding Integrated Circuit logic).
All the modules on the FPGA run on a 50 MHz clock.
6.1.1 Video Ram
There is only one larger RAM chip on the Atlys board (128MB in size), so the RAM is shared between the
CPU and the graphics accelerator. The graphics accelerator can easily switch to using a different memory
because of the generic wishbone interface. A dedicated graphics memory may allow for larger resolution and
better performance.
6.1.2 Display core
The display driver used in this project is the Enhanced VGA/LCD controller 21 . This component is connected
to the system with a Wishbone revB.3 data bus and is widely used in other projects (for example: it is the
main display core used in ORPSoCv2). The specification for this core is provided in Appendix B.
6.1.3 HDMI converter
Since the display controller core generates VGA signals, some modifications have to be made before the
signal can be forwarded to the HDMI port. The VGA signal passes through another core that interfaces
directly with an HDMI converter chip present on the Atlys board.
6.2 Architecture
The ORSoC Graphics Accelerator core is designed to reduce CPU load by undertaking expensive graphical
operations.
The core has a pipeline structure so that it performs several pixel operations in series in an efficient
manner. For some simpler operations, some steps in the pipeline are skipped completely for a shorter
operation latency (for example: the blending step is not needed if the rendered pixel has no transparency).
See figure 19 for an overview of the various submodules.
20 http://www.digilentinc.com/Data/Products/ATLYS/Atlys_rm.pdf
21 http://opencores.org/project,vga_lcd
Figure 18: Picture of the Digilent ATLYS development board.
While all operations use the same pipeline (see section 6.2.4), several steps are skipped or simplified when
only doing 2D operations.
This modular pipeline architecture was chosen for several reasons:
• Several actions (mostly operations on individual pixels) can be queued, trading low latency for a higher
throughput.
• Several similar operations can be combined into one module and modified through flags. This can
reduce the size of the final core since logic can be reused.
• It is easy to add new pipeline stages that do additional operations. For example, a stage for tessellation,
or a stage for per-pixel lighting.
• With a solid coordination and buffering mechanism, parts of the pipeline can be parallelized for highly
improved performance22 .
• Each module can be developed, simulated and verified individually, making it easier to localize bugs
in the system.
6.2.1 OpenRISC CPU
The reference implementation makes use of an OpenRISC soft processor running at 50 MHz. All of the
FPGA cores are connected to the CPU through a 32-bit wishbone bus interface. The CPU controls the
ORGFX component by setting registers through the bus. For more information about the software running
on the CPU, see section 7.
6.2.2 System-on-Chip
The ORGFX core was verified by being integrated in an ORPSoCv2 system. The main OpenRISC processor
communicates with the accelerator through the wishbone interconnect. The ORPSoCv2 design contains a
memory controller, the display core and the HDMI adapter core. In addition to this the SoC provides many
debug interfaces such as Ethernet, JTAG UART and a USB controller.
22 The bottleneck in such a system will most likely be the bandwidth of the bus accessing the memory.
Figure 19: Picture showing an overview of the ORGFX pipeline. The bold downwards arrows represent the
main flow of data "downstream". Acknowledgement signals are sent back "upstream". The wishbone reader
and wishbone writer interfaces are connected to video memory through a wishbone connection.
6.2.3 Wishbone interfaces
The main control interface of the ORGFX core is a 32-bit wishbone slave. In the reference implementation,
this bus interface is connected to the data bus of the OpenRISC processor, allowing the CPU read and write
access to the device's registers. All accelerated operations are initiated by writing to certain bits in the main
control register on the ORGFX device.
The core has two wishbone master interfaces, one that can initiate reads from memory and one that can
initiate writes to memory. The two were kept separate to keep the internal wishbone logic simple.
Both the wishbone revB.3 and the newer revB.4 specifications define burst read and write modes, to
decrease the overhead of reading and writing larger blocks of information. None of these modes are used in
ORGFX due to limited time for implementation, but the feature is a possible source of optimization.
6.2.4 Pipeline
The ORGFX core uses a pipelined architecture to speed up operation. An overview of the pipeline can be
seen in figure 19. Each module in the pipeline communicates with acknowledge and write signals. A module
will not assert write to the next module unless it receives an acknowledgement first (or if the module was
previously in a ready state, in which case the downstream pipeline is empty). All acknowledgement and
write signals are always exactly one clock tick long, to prevent triggering multiple instances of the same
instruction.
Each module in the pipeline may hold the upstream pipeline for several clock ticks. For example, the
rasterizer will prevent incoming raster instructions until all the pixels for the current operation are generated.
When the rasterizer is ready for new data, it will send an acknowledgement upstream. To keep a consistent
device state, once the pipeline is in operation all wishbone writes to the device are queued up in a FIFO
until the current operation is complete.
Variables that are unique to the current pixel are buffered each step of the pipeline, while variables
constant over one operation – such as the currently active texture – are stored in global registers accessible
by every pipeline stage that needs them.
6.2.5 Transformation processor
As can be seen in the design of the 2D raster features (rectangle, line and triangle), all of the features operate
on points. These points can be transformed to mimic exploring a 3D space, projected on a 2D canvas.
The transformation processor is designed to handle translation, scaling and rotation of the control points
used by the raster operations. It is implemented as a single matrix multiplication which can be loaded to
the device through twelve registers. As can be seen in the pipeline overview, this module is not actually
part of the main pipeline, but it provides input to the rasterizer.
Every point rendered will be affected by this transformation if the transformation processor is currently
active. It is possible to disable it to draw 2D shapes (in this case, the provided points are forwarded instead
of transformed).
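A matrix with twelve coefficients can be read as a 3x3 part (rotation and scaling) combined with a translation column. The sketch below applies such a matrix to a point. The row-major 3x4 layout and the use of floating point are assumptions made for this example; the mapping of coefficients to the twelve device registers and the fixed point format used by the hardware are not shown here. The names `transform` and `transform_point` are hypothetical.

```c
typedef struct { float x, y, z; } point3;

/* Twelve coefficients: three rows of [a b c | t], where a, b, c hold
 * rotation and scaling and t holds the translation. */
typedef struct { float m[3][4]; } transform;

/* Applies the transformation to a single point. */
static point3 transform_point(const transform *t, point3 p)
{
    point3 r;

    r.x = t->m[0][0] * p.x + t->m[0][1] * p.y + t->m[0][2] * p.z + t->m[0][3];
    r.y = t->m[1][0] * p.x + t->m[1][1] * p.y + t->m[1][2] * p.z + t->m[1][3];
    r.z = t->m[2][0] * p.x + t->m[2][1] * p.y + t->m[2][2] * p.z + t->m[2][3];
    return r;
}
```

For example, a matrix that scales by two and translates by (1, 2, 3) maps the point (1, 1, 1) to (3, 4, 5). Several transformations can be combined into one matrix in software before it is loaded, so the hardware only ever performs a single multiplication per point.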
6.2.6 Rasterizer
The rasterizer module initiates the rendering of rectangle, line and triangle primitives. When it receives a
command to start an operation, it follows the algorithms described in the design section to generate pixels
one by one. The module will hold the upstream pipeline until the entire shape has been rendered (every
generated pixel has been acknowledged).
The rasterizer has two submodules to handle the more complex rendering processes; one for Bresenham
lines and one for triangles. The module itself has a state machine controlling its behaviour, as can be seen
in figure 20. Starting in the Wait state, the module moves to one of the other states once a signal to start
an operation arrives. The Line and Rect states are very straightforward, and simply generate pixels until
the operation is finished, then return to the Wait state. The triangle rendering is slightly more complex,
going through a preparation state (Triangle Prep) and alternating between the Triangle and Triangle
Write states to generate pixels. This is because unlike the line or rectangle operation, the algorithm has to
examine the generated pixel to see if it is actually inside the triangle, or if it should be discarded.
It should be noted here that the output of the rasterizer can go either to the interpolation pipeline or
directly to the clipping module. Which path the pixel takes depends on whether any interpolation operation is
active. This includes:
• Gradient coloring of triangles
• Textured triangles
• Triangles with depth coordinates
• Interpolated alpha
In other words, everything listed in section 5.4.2.
Figure 20: Picture of the rasterizer state machine
6.2.7 Interpolation
The division and interpolation modules form a separate pipeline that can be skipped entirely for simple
rendering operations such as rectangles. Interpolated variables are only supported for triangle rendering23 .
As mentioned in the design section, the formula to calculate the Barycentric coordinates of each triangle
corner that will be used for interpolation is as follows:
$$factor_0 = \frac{e_0(x, y)}{2A_\Delta}$$
$$factor_1 = \frac{e_1(x, y)}{2A_\Delta}$$
$$factor_2 = 1 - factor_0 - factor_1$$
Both the edge functions and the triangle area are calculated in the triangle rasterizer. The hardware
division is implemented as two pipelined division modules, one for $factor_0$ and one for $factor_1$.
In the interpolation module24 , $factor_2$ is calculated, and all three factors are used to calculate the depth,
alpha, texture coordinate and color of the point (not all of these values have to be used). The values are
calculated by multiplying the supplied base values of each corner point with the associated factor:
$$z' = factor_0 \cdot z_0 + factor_1 \cdot z_1 + factor_2 \cdot z_2$$
The calculations for the other values are similar (texture coordinates are two calculations, one for u and
one for v).
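The calculation can be sketched in C as follows. The exact definition of the edge functions is given in the design section and is not repeated here; this example assumes that $e_0$ is the edge function of the edge opposite corner 0 (and likewise for $e_1$), so that each factor equals one at its own corner. Floating point is used instead of the fixed point arithmetic of the hardware, and all names are hypothetical.

```c
typedef struct { float x, y; } point2;

/* Edge function: twice the signed area of the triangle (a, b, p). */
static float edge(point2 a, point2 b, point2 p)
{
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

/* Calculates the three interpolation factors for point p.
 * Returns -1 for a degenerate triangle (zero area), otherwise 0. */
static int barycentric(point2 p0, point2 p1, point2 p2, point2 p,
                       float factor[3])
{
    float area2 = edge(p0, p1, p2); /* 2 * triangle area */

    if (area2 == 0.0f)
        return -1;
    factor[0] = edge(p1, p2, p) / area2;
    factor[1] = edge(p2, p0, p) / area2;
    factor[2] = 1.0f - factor[0] - factor[1];
    return 0;
}

/* Interpolates one per-corner value (depth, alpha, u, v or a color
 * channel) with the given factors. */
static float interpolate(const float factor[3], float v0, float v1, float v2)
{
    return factor[0] * v0 + factor[1] * v1 + factor[2] * v2;
}
```

Only two divisions are needed per pixel because the third factor follows from the other two, which matches the two division modules described above.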
6.2.8 Clipping
As mentioned before, the clipping module can take input either directly from the rasterizer, or from the
interpolation pipeline.
Three forms of clipping/culling are performed in the clipping module:
• Clipping against the target size: Any attempted pixel draws that fall outside of the target are
discarded. This operation is always performed.
• Clipping against the clip rect: An arbitrary clipping rectangle can be set. Any pixel falling outside
of it will be discarded. This clipping operation can be turned on and off by setting a flag in the control
register.
• Depth buffer culling: The z-value of the pixel drawn is compared to the z-value at the target pixel.
If the depth value of the pixel is lower (farther away) than the target, the pixel is discarded. This
operation requires that a depth buffer is bound, and that the z-buffer is enabled by setting a flag in
the control register.
When depth buffer culling is activated, the depth buffer has to be accessed. The clipping module does this
by calling the wishbone reader interface through an arbiter (since only one of the three modules connected
to the reader can access it at any given moment, see figure 19). Much like the current render target, the
depth buffer is represented by a base address and a width and height. The depth buffer represents the depth
of each pixel with a 16 bit value25 .
Any time at least one of the conditions for clipping is met, the pixel is discarded and the module
immediately sends an acknowledgement upstream. If none of the enabled clipping conditions are met, the
pixel is passed on to the fragment processor for coloring.
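The three tests can be summarized in a single decision function, sketched below. This is a software illustration and not a description of the hardware logic. The structure `clip_state`, the treatment of the clip rectangle as exclusive at its lower right corner, and the use of an unsigned 16 bit depth value are assumptions made for this example.

```c
#include <stdint.h>

typedef struct { int x0, y0, x1, y1; } rect;

typedef struct {
    int target_w, target_h;   /* size of the render target */
    int clip_enabled;         /* flag in the control register */
    rect clip;                /* clip rect, x1/y1 assumed exclusive */
    int zbuffer_enabled;      /* flag in the control register */
    const uint16_t *zbuffer;  /* same dimensions as the target */
} clip_state;

/* Returns 1 if the pixel should be discarded, otherwise 0. */
static int clip_discards(const clip_state *s, int x, int y, uint16_t z)
{
    /* Always performed: clipping against the target size. */
    if (x < 0 || y < 0 || x >= s->target_w || y >= s->target_h)
        return 1;
    /* Optional: clipping against the clip rect. */
    if (s->clip_enabled &&
        (x < s->clip.x0 || y < s->clip.y0 ||
         x >= s->clip.x1 || y >= s->clip.y1))
        return 1;
    /* Optional: a lower depth value is farther away and is culled. */
    if (s->zbuffer_enabled && s->zbuffer != 0 &&
        z < s->zbuffer[y * s->target_w + x])
        return 1;
    return 0;
}
```

The tests are ordered so that the depth buffer is only read for pixels that survive the cheaper comparisons, which avoids unnecessary memory accesses.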
23 It was decided not to support interpolated values for lines (not often useful) or rectangles (can be achieved by drawing two
triangles) because it would mean adding more division units.
24 Also known as the CUVZ module, as it calculates Color, UV-coordinates and Z (depth).
25 In other words, at 16 bit color depth, the render target and the depth buffer will have the same dimensions and take up equal
amount of memory.
29
Figure 21: Picture of the fragment processor state machine
6.2.9 Fragment processor: coloring
The fragment processor adds color to the pixels generated by the rasterizer (the ones that are not discarded
by the clipping module). This can be done using one of several sources:
1. A flat color residing in the main color register.
2. An interpolation of several colors from all the three color registers (one color for each corner of a
triangle).
3. Textured, using texture coordinates U and V generated by either the rasterizer or the interpolation
pipeline.
Which coloring mode is used is defined in a global register, and is constant over the drawing of each
graphics primitive.
Flat colors can be used for all graphics primitives. Since the color is constant over an entire operation,
the fragment processor fetches the color from a global register.
Gradient coloring is only available for triangles. Here, the fragment processor fetches the calculated color
from the interpolation pipeline. This color is the linear combination of three global color registers and the
interpolation factors for each corner of the triangle.
The textured coloring mode is available for rectangles and triangles26 . This mode requires access to
texture memory through the wishbone reader. The address where the fragment processor looks for the color
is calculated from the base texture address and from the U and V texture coordinates. These coordinates
are either generated by the rasterizer (for rectangles) or by the interpolation pipeline (for triangles).
One additional feature handled by the fragment processor is colorkeying. As mentioned in the design
section, colorkeying only really makes sense if textured mode is used. If colorkeying is enabled and the fetched
pixel matches the colorkey, the fragment processor discards the pixel instead of pushing it downstream.
A flowchart for the fragment processor state machine can be seen in figure 21.
6.2.10 Fragment processor: vector rendering
Finally, the fragment processor handles the rendering of filled Bézier shapes, implementing the rendering of
vector graphics described in section 5.5. As stated in the shape implementation section:
Quote from a paper by C.Loop[5]:
Claim: Any rational quadratic parametric curve has an implicit form that is a projected image
of the algebraic curve
$$f(u, v) = u^2 - v$$
The u and v parameters here should not be confused with the texture coordinates U and V; they are not
related in the ORGFX implementation. Instead, the factors are renamed:
$$f(bezierFactor_0, bezierFactor_1) = bezierFactor_0^2 - bezierFactor_1$$
As can be seen in the left triangle in figure 16, the different coordinates at each corner ([0, 0], [1/2, 0] and
[1, 1]) represent corner values for [$bezierFactor_0$, $bezierFactor_1$].
When the ORGFX device is sent a command to start a Bézier shape operation, it is handled exactly
as an interpolated triangle draw. Each pixel in a rectangular bounding box around the triangle is generated and tested by the rasterizer, and the pixels that fall inside of the triangle are passed on to the
interpolation pipeline. In the interpolation pipeline, the Barycentric coordinates are used to calculate
[$bezierFactor_0$, $bezierFactor_1$] by interpolating between the corner values. The fragment processor is presented with the actual value of [$bezierFactor_0$, $bezierFactor_1$] at the generated pixel, and from this calculates
the result of the equation:
$$f(bezierFactor_0, bezierFactor_1) = bezierFactor_0^2 - bezierFactor_1$$
26 Technically lines will also work, but since no texture coordinates are generated, the fragment processor will always fetch the
first pixel of the texture.
Figure 22: Picture of the blender state machine
If $f(bezierFactor_0, bezierFactor_1) < 0$ then the pixel is inside the curve, otherwise it is outside. The
fragment processor is provided with a flag that decides if the curve should be filled inside or outside (see
figure 17). If the shape should be filled outside, the condition is $f(bezierFactor_0, bezierFactor_1) \geq 0$
instead.
If the pixel passes the test it is colored as usual, but if the test fails the pixel is discarded. A pixel in
a textured Bézier shape that passes the test can still be discarded in the colorkeying step, if this feature is
enabled.
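The test described above can be sketched in a few lines of C. The sketch takes the Barycentric factors of a pixel, interpolates the corner values from figure 16 and evaluates the implicit function. It assumes that corner 0 is the start point, corner 1 the control point and corner 2 the end point of the curve; floating point is used for readability and the function name is hypothetical.

```c
/* Corner values of [bezierFactor0, bezierFactor1], from figure 16. */
static const float corner_f0[3] = { 0.0f, 0.5f, 1.0f };
static const float corner_f1[3] = { 0.0f, 0.0f, 1.0f };

/* Returns 1 if the pixel passes the Bezier test and should be colored,
 * 0 if it should be discarded. factor0 and factor1 are the Barycentric
 * factors of the pixel; the third factor follows from them. */
static int bezier_pixel_passes(float factor0, float factor1, int fill_inside)
{
    float factor2 = 1.0f - factor0 - factor1;
    float b0 = factor0 * corner_f0[0] + factor1 * corner_f0[1]
             + factor2 * corner_f0[2];
    float b1 = factor0 * corner_f1[0] + factor1 * corner_f1[1]
             + factor2 * corner_f1[2];
    float f = b0 * b0 - b1;

    return fill_inside ? (f < 0.0f) : (f >= 0.0f);
}
```

As a sanity check, the midpoint between the start and end points (factors 0.5, 0 and 0.5) gives f = -0.25 and lies inside the curve, while the control point itself (factors 0, 1 and 0) gives f = 0.25 and lies outside.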
6.2.11 Blender
The purpose of the Blender is to calculate the combined color, based on the color provided by the fragment
processor, the color at the target pixel and the alpha value. This module implements the transparency
feature described in section 5.3.8.
Alpha blending is an optional feature that can be turned off, which will save some memory bandwidth.
There are two components to the alpha value, the global alpha – fetched from a global register since it
is constant over a primitive – and the pixel alpha. The pixel alpha is only used if triangle interpolation is
active, and enables interpolating between different alpha values over a single primitive. If interpolation is
not active, the fragment processor sets the pixel alpha to no transparency.
All alphas are stored as 8 bit fixed point values (0 integer bits, 8 fractional bits), where 0 represents
full transparency and 255 represents no transparency. The combined alpha is calculated with the following
formula:
$$alpha = alpha_{fragment} \cdot alpha_{global}$$
The final alpha is right shifted by 8 bits to account for the fixed point multiplication.
The blender fetches the color of the target pixel from the render target, then calculates the final color of
the pixel:
$$color_{out,r} = color_{fragment,r} \cdot alpha + color_{target,r} \cdot (255 - alpha)$$
$$color_{out,g} = color_{fragment,g} \cdot alpha + color_{target,g} \cdot (255 - alpha)$$
$$color_{out,b} = color_{fragment,b} \cdot alpha + color_{target,b} \cdot (255 - alpha)$$
The final value is right shifted by 8 bits to account for the fixed point multiplication.
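The fixed point arithmetic above can be illustrated with the following sketch, which follows the formulas literally. It assumes 8 bit color channels for simplicity (the actual channel widths depend on the color depth), and the function names are hypothetical.

```c
#include <stdint.h>

/* Combined alpha: product of two 0.8 fixed point values, shifted back
 * to 8 bits. */
static uint8_t combine_alpha(uint8_t alpha_fragment, uint8_t alpha_global)
{
    return (uint8_t)(((unsigned)alpha_fragment * alpha_global) >> 8);
}

/* Blends one color channel of the fragment with the target pixel. */
static uint8_t blend_channel(uint8_t fragment, uint8_t target, uint8_t alpha)
{
    unsigned sum = (unsigned)fragment * alpha
                 + (unsigned)target * (255u - alpha);
    return (uint8_t)(sum >> 8);
}
```

Taken literally, shifting by 8 bits divides by 256 rather than 255, so two fully opaque alphas combine to 254 and a blended channel comes out slightly darker than an exact division would give. Whether the hardware compensates for this is not stated here.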
A flowchart for the state machine in the blender can be seen in figure 22.
6.2.12 Renderer
The rendering module calculates the address of the target pixel and the bitmask to write the color value to
memory without affecting adjacent pixels. These values are then sent to the wishbone write interface for
processing.
If the depth buffer is enabled and the current pixel passed the clipping stage, the depth of the pixel must
be written to the z-buffer so it can be compared in later operations. In other words: if depth is enabled, the
renderer will perform two memory writes; one to the actual target pixel and one to the depth buffer.
One of the more notable optimizations discussed in future works (section 10) is bandwidth usage optimization. The renderer would be the correct place to implement a write queue to process burst writes.
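To make the address and bitmask calculation of the renderer more concrete, the sketch below handles the case of 16 bit pixels written over a 32 bit bus. It assumes a big-endian byte lane order, where the pixel at the lower address occupies the upper half of the word, and a linear row-by-row layout of the render target. Both are assumptions for this example, and the function names are hypothetical.

```c
#include <stdint.h>

/* Byte address of pixel (x, y) in a target of the given width,
 * assuming 2 bytes per pixel and rows stored one after another. */
static uint32_t pixel_byte_address(uint32_t base, unsigned int width,
                                   unsigned int x, unsigned int y)
{
    return base + (y * width + x) * 2u;
}

/* Address of the 32 bit word that contains the pixel. */
static uint32_t pixel_word_address(uint32_t byte_address)
{
    return byte_address & ~(uint32_t)3u;
}

/* 4 bit byte select mask: upper two bytes for the first pixel in a
 * word, lower two bytes for the second (big-endian lanes assumed). */
static unsigned int pixel_byte_select(uint32_t byte_address)
{
    return (byte_address & 2u) ? 0x3u : 0xCu;
}
```

The mask ensures that a write only updates the two bytes belonging to the target pixel, leaving the neighbouring pixel in the same word untouched.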
7 Software integration
In this section the Hardware/Software interface is explained.
7.1 Bare metal driver
The term bare metal refers to when the OpenRISC processor is running C-code or assembly instructions
directly without having an operating system active. This mode is very useful for testing and debugging,
since it removes several layers of complexity. All the driver components are written in ANSI C, without any
platform specific functions or macros.
The exact implementation of the driver depends on how the ORGFX device is connected to the OpenRISC processor. The reference implementation developed alongside the component assumes that ORGFX
is mapped to memory and all registers can be written to and read from directly, without any caching.
The bare metal driver is written in several layers of increasing complexity, with the lower layers being
ideal for debugging individual instructions and the higher layers giving the application programmer an API
that is easier to use.
The higher level APIs usually perform more writes to the device than is strictly needed, but they ensure
a more stable device state.
7.1.1 Basic functionality
• orgfx.h
• orgfx.c
• orgfx_regs.h
The basic functionality layer handles all communication with the device itself (with each additional layer
only adding convenience functions that use the basic functionality). Communication with the device over
the wishbone bus is performed by a simple macro:
    #define REG32(add) *((volatile unsigned int *)(add))
This method can be used both to read from and to write to memory. The actual memory addresses of each
register and specific pin numbers are stored in orgfx_regs.h. These match the hardware parameters defined
in gfx_params.v. orgfx_regs.h also defines the base address of the device on the CPU data bus.
The design of the basic driver functionality is minimalistic, with each function call doing as few operations as
possible. To perform more complex tasks, the user of the API will have to call several functions in sequence,
while keeping track of the current device state.
Three things are needed to initialize the driver:
1. A call to orgfx_init() to initialize the driver with the base video memory address.
2. A call to orgfx_vga_set_videomode() to initialize the VGA/LCD module.
3. A call to orgfx_init_surface() to get a rendering target.
The third function returns a struct orgfx_surface, which contains information about a render target or
texture. To perform drawing on the new target, it has to be bound as the currently active render target, by
using the orgfx_bind_rendertarget() function. A render target can be of any resolution or aspect ratio,
but the first one should be set to the same resolution as the video mode (it will represent the screen). The
driver makes no attempt to hold on to render targets; it is entirely up to the user to keep track of them.
Each additional render target is allocated memory sequentially by incrementing a memory offset inside the
driver.
When the device is properly initialized, the user can start making drawing calls to have pixels appear
on the screen. While it is possible to set pixels individually using the orgfx_set_pixel() function, this
function does not have any hardware acceleration. After setting the drawing color with orgfx_set_color()
or orgfx_set_colors(), the user can perform accelerated drawing operations with the following primitives:
    orgfx_rect(x0, y0, x1, y1)
    orgfx_line(x0, y0, x1, y1)
    orgfx_triangle(x0, y0, x1, y1, x2, y2, interpolation)
    orgfx_curve(x0, y0, x1, y1, x2, y2, fill, inside)
These functions draw simple rectangles, Bresenham lines[2], triangles (with or without interpolated
colors), and quadratic Bézier shapes. An important thing to note is that all point coordinates are defined
in fixed point notation. For convenience, the define FIXEDW can be used to create valid coordinates this
way. For example, to draw a rectangle from point (10,15) to (20,25) one would write:
    orgfx_rect(FIXEDW*10, FIXEDW*15,
               FIXEDW*20, FIXEDW*25);
To do more interesting things than drawing flat rectangles, textures need to be loaded to the device. A
texture is essentially another render target, so orgfx_init_surface() and orgfx_bind_rendertarget() have
to be called to allocate the new texture. Once the texture is bound, any of the above drawing operations
can be used to fill the pixels, like with any render target. Usually the user wants to load a prepared image
though, which is most easily achieved by calling orgfx_memcpy() with a memory buffer and its size. This
function is intended to accept the generated output of the sprite maker utility (see section 7.2.1).
To draw the texture, it has to be bound as a texture using the orgfx_bind_tex0() function, and texturing
has to be enabled with the orgfx_enable_tex0() function. Once texturing is enabled, it will be used instead
of the regular color for the drawing primitives.
To only draw certain sections of a texture, the user can set the source rect with the orgfx_srcrect()
function. This will add an offset to the texture in orgfx_rect() calls. The source rect is reset each time a
new texture is bound.
Drawing textured triangles and Bézier shapes is slightly more complex. For this, a texture coordinate has
to be set for each control point with a call to orgfx_uv(). For this to work, the triangle function
has to be called with the interpolation parameter set to one (texture coordinates will be interpolated between
the triangle control points).
One more thing that should be noted about triangles is that they must be defined in clockwise order.
Any triangles defined in the wrong order will be discarded in hardware, and the same holds true for Bézier
shapes.
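Because wrongly ordered triangles are silently discarded, it can help to check the winding in software before submitting a primitive. The following is a minimal sketch, assuming screen coordinates where y grows downward; the function names are hypothetical and the exact test used by the hardware is not described here.

```c
#include <stdint.h>

/* Twice the signed area of the triangle (p0, p1, p2).
 * 64-bit arithmetic avoids overflow with fixed point coordinates. */
static int64_t example_cross2(int32_t x0, int32_t y0,
                              int32_t x1, int32_t y1,
                              int32_t x2, int32_t y2)
{
    return ((int64_t)x1 - x0) * ((int64_t)y2 - y0)
         - ((int64_t)x2 - x0) * ((int64_t)y1 - y0);
}

/* Assumption: y grows downward, as on a typical framebuffer. In that
 * convention a positive cross product means the points appear in
 * clockwise order on screen. Degenerate (collinear) triangles return 0. */
static int example_is_clockwise(int32_t x0, int32_t y0,
                                int32_t x1, int32_t y1,
                                int32_t x2, int32_t y2)
{
    return example_cross2(x0, y0, x1, y1, x2, y2) > 0;
}
```

If the check fails, swapping any two of the three points reverses the winding.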
Colorkeying can be applied to any texture draws by using orgfx_enable_colorkey() and orgfx_set_colorkey().
Any time a texture read matches the colorkey, the current pixel is discarded.
The ORGFX alpha blending functionality can be used with the functions orgfx_enable_alpha() and
orgfx_set_alpha(). Take care when using alpha blending together with interpolated triangles, since alpha
values will be set for each control point and interpolated over the primitive. The resulting per-pixel alpha
will be multiplied by the global alpha as described in section 5.3.8. The alpha value sent to the device
consists of four parts, arranged as follows:
Bit #      Description
[31:24]    Point 0 alpha
[23:16]    Point 1 alpha
[15:8]     Point 2 alpha
[7:0]      Global alpha
For example, calling orgfx_set_alpha() with an alpha of 0xff8000ff would mean that p0 is opaque, p1
is roughly half transparent, p2 is fully transparent and the global alpha is set to opaque.
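Assembling the 32-bit alpha word by hand is error prone, so a small packing helper may be useful. The sketch below follows the bit layout listed above; the helper names are hypothetical and not part of the driver.

```c
#include <stdint.h>

/* Pack three per-point alpha values and the global alpha into one
 * 32-bit word: [31:24] point 0, [23:16] point 1, [15:8] point 2,
 * [7:0] global alpha. */
static uint32_t example_pack_alpha(uint8_t p0, uint8_t p1,
                                   uint8_t p2, uint8_t global)
{
    return ((uint32_t)p0 << 24) | ((uint32_t)p1 << 16) |
           ((uint32_t)p2 << 8)  |  (uint32_t)global;
}

/* Extract the alpha of control point 0, 1 or 2 from a packed word. */
static uint8_t example_alpha_point(uint32_t packed, unsigned point)
{
    return (uint8_t)(packed >> (24 - 8 * point));
}

/* Extract the global alpha from a packed word. */
static uint8_t example_alpha_global(uint32_t packed)
{
    return (uint8_t)packed;
}
```

For instance, example_pack_alpha(0xff, 0x80, 0x00, 0xff) yields the word 0xff8000ff used in the example above.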
The ORGFX device has one major function that supports 3D rendering: orgfx_triangle3d(). This
function works exactly like the regular triangle function, but allows shapes with depth to be rendered. To
make full use of this feature, the user can create and bind a depth buffer to perform depth culling. The
buffer itself is created the same way as render targets and textures: with orgfx_init_surface(). There are
three functions related to depth buffer culling:
orgfx_bind_zbuffer()
orgfx_enable_zbuffer()
orgfx_clear_zbuffer()
First, the depth buffer has to be bound. It is up to the user to ensure that the bound z-buffer is of the
same resolution as the render target. Once depth culling is enabled, any writes that pass the culling stage
will overwrite the depth buffer. This means that once the user wants to draw a new frame, the depth buffer
should first be cleared.
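A small software model shows why the depth buffer must be cleared between frames. This is only a sketch under stated assumptions: it uses 16-bit depth values, treats smaller values as closer, and clears to the farthest value. None of these details, nor the function names, are taken from the hardware description.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: smaller z is closer, and "far" is the largest value. */
#define EXAMPLE_Z_FAR INT16_MAX

/* Reset every depth entry so that any new pixel passes the test. */
static void example_zbuffer_clear(int16_t *zbuf, size_t count)
{
    for (size_t i = 0; i < count; i++)
        zbuf[i] = EXAMPLE_Z_FAR;
}

/* Returns 1 if the pixel passes culling (and records its depth),
 * 0 if an earlier, closer pixel already occupies this position. */
static int example_depth_test(int16_t *zbuf, size_t index, int16_t z)
{
    if (z >= zbuf[index])
        return 0;
    zbuf[index] = z;
    return 1;
}
```

Without the clear, depth values written during the previous frame would still be in the buffer and would cull pixels of the new frame that lie behind them.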
Finally, the user can activate the hardware accelerated 3D transformations of the ORGFX device with
orgfx_enable_transform() and orgfx_set_transformation_matrix().
7.1.2
Extended API
• orgfx_plus.h
• orgfx_plus.c
While all the functionality of the graphics card can be accessed with the basic driver, it is fairly difficult
to keep track of the device state and keep it consistent. The extended API is intended to improve this and
encapsulates some of the more complex functionality in convenient functions. One major change from the
basic API is that surfaces are tracked internally by the driver, and the user gets an integer ID that is used
for binding the surface.
The extended driver is initialized by a call to orgfxplus_init(). The function initializes the graphics
card, sets the video resolution and allocates the screen surface. Additionally, by setting the flags of the
function it automatically allocates surfaces for double buffering and depth buffering. The function returns
an integer number that is used to refer to the screen surface (always -1). When double buffering is activated,
the driver keeps track of which surface is currently the active buffer. The user can switch between active
buffers with the orgfxplus_flip() function. If depth buffering is activated, the driver automatically binds
the depth buffer, but does not enable the z-buffer culling.
To initialize a surface and load an image into it with one function call, use orgfxplus_init_surface().
The function takes the width and height of the surface and a pixel buffer to copy into it, and returns an ID
referring to the allocated surface. The number of surfaces that can be allocated is static and can be changed
before compiling the driver.
Figure 23: Example bitmap font. The characters are placed at regular intervals in a 16 by 16 grid.
Since a new syntax for handling surfaces is used, the extended API has two new functions for binding the
render target and the currently active texture: orgfxplus_bind_rendertarget() and orgfxplus_bind_tex0().
The new syntax also changes the way that sprites are rendered: the orgfxplus_draw_surface() and
orgfxplus_draw_surface_section() functions bind the supplied texture, enable texturing and draw the
image to screen. The second function also sets the source rect, causing only part of the image to be rendered.
All of the functions in the basic API can be used alongside the extended API; the extended API simply
provides an easier way to initialize and handle surfaces.
7.1.3
Advanced API – Tilesets and bitmap fonts
Files: orgfx_tileset.h orgfx_tileset.c orgfx_bitmap_font.h orgfx_bitmap_font.c
A relatively common way to handle sprites is to store multiple sprites in the same image file, and only
draw part of the image when a sprite is requested. It is possible to get this functionality from the basic
driver (by setting the source rect before drawing), or by using the orgfxplus_draw_surface_section()
function from the extended API. Both these methods require that the user provide the source rect every
time a sprite is drawn. The tileset library provides a simple wrapper around this. By storing an array
of orgfx_sprite_rect structs, the user can draw sprites with a call to the orgfx_draw_tile() function,
providing a tileset pointer and the index of the sprite to be drawn. The tileset library uses the extended
API syntax for handling surfaces.
Bitmap fonts are a special case of tilesets. By providing an image of the entire ASCII character set, the
user can render text to the screen with only one function call. Figure 23 shows an example bitmap font.
To enable the user to write special characters such as åäö, wide character strings are used. The syntax for
writing text using a loaded bitmap font is as follows (note the L that denotes the text as a wide character
string):
orgfx_put_bitmap_text(&font,
                      x0, y0,
                      L"Some example text");
Since writing the specification for a bitmap font by hand can be quite tedious, a utility to automate the
process is provided. See section 7.2.2.
7.1.4
Advanced API – Vector fonts
Files: orgfx_vector_font.h orgfx_vector_font.c
Vector fonts are much more versatile than bitmap fonts. Since the glyphs are stored as vectors, they can
be scaled up or down without loss of detail. In addition to this, the points can be arbitrarily translated,
scaled and rotated.
The library's functions are orgfx_make_vector_font(), orgfx_init_vector_font() and orgfx_put_vector_text().
For more information on how to actually generate the internal data structures needed to render vector
fonts, see section 7.2.4.
Figure 24: The same mesh rendered in wireframe, colored triangles and textured mode.
7.1.5
Advanced API – 3D
Files: orgfx_3d.h orgfx_3d.c
The basic driver allows for hardware accelerated transformations of points and rendering triangles in 3D.
By calling the correct functions a depth buffer can be initialized and used to prevent triangles far away from
overwriting closer triangles. This is quite far from a manageable 3D interface though, so a convenience driver
for displaying 3D models is provided.
The main object of the 3D interface is the orgfx_mesh struct. Besides storing all the points in the
model, and information about how they form triangles, the mesh struct contains a set of transformation
variables. The translation, rotation and scale variables can be adjusted to manipulate the transformation
matrix of the mesh. The mesh can be rendered with the provided transformations by calling
orgfx3d_draw_mesh(). The function allows for rendering the mesh with filled triangles or as a wireframe,
using lines (see figure 24).
Since the basic driver is only capable of loading a prepared transformation matrix, the 3D API provides
simple functions to create and transform matrices.
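To illustrate what creating and combining transformation matrices involves, here is a minimal sketch using floating point 4x4 matrices and column vectors. All names are hypothetical, and the 3D API's own matrix type and functions (which most likely use fixed point values) are not shown in this section.

```c
/* Hypothetical 4x4 matrix type, used for illustration only. */
typedef struct { float m[4][4]; } example_mat4;

static example_mat4 example_identity(void)
{
    example_mat4 r = {{{0}}};
    for (int i = 0; i < 4; i++)
        r.m[i][i] = 1.0f;
    return r;
}

static example_mat4 example_scale(float sx, float sy, float sz)
{
    example_mat4 r = example_identity();
    r.m[0][0] = sx; r.m[1][1] = sy; r.m[2][2] = sz;
    return r;
}

static example_mat4 example_translate(float tx, float ty, float tz)
{
    example_mat4 r = example_identity();
    r.m[0][3] = tx; r.m[1][3] = ty; r.m[2][3] = tz;
    return r;
}

/* Returns a * b. With column vectors, b is applied to a point first. */
static example_mat4 example_mul(const example_mat4 *a, const example_mat4 *b)
{
    example_mat4 r = {{{0}}};
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            for (int k = 0; k < 4; k++)
                r.m[i][j] += a->m[i][k] * b->m[k][j];
    return r;
}

/* Transforms the point (in[0], in[1], in[2], 1) and writes x, y, z. */
static void example_apply(const example_mat4 *m,
                          const float in[3], float out[3])
{
    for (int i = 0; i < 3; i++)
        out[i] = m->m[i][0] * in[0] + m->m[i][1] * in[1]
               + m->m[i][2] * in[2] + m->m[i][3];
}
```

Multiplying a translation by a scale, in that order, gives a matrix that scales a point first and then moves it; a rotation matrix would be combined in the same way.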
Meshes can be generated from Wavefront .obj files with the meshmaker utility (see section 7.2.3).
7.2
Utilities
While developing the graphics accelerator we implemented some tools to make it easier to manage the
project.
7.2.1
Sprite maker utility
The sprite maker is a small application that converts an image into a header file that can be included in
the project when compiled. The application generates an array of color values that can be loaded as a sprite.
The application has support for reading common image file formats such as bmp, png and jpg (for a full
list, see the supported file formats of the SDL_image library). 8-, 16- and 32-bit output is supported, and
can be changed by passing a command line argument to the program (by default, the output is adjusted for
16-bit color mode).
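Producing 16-bit output from a 24-bit source image means reducing each color channel. The sketch below assumes the common 5-6-5 layout with red in the highest bits; the pixel layout actually used by the sprite maker and the display controller is not stated here, and the function name is hypothetical.

```c
#include <stdint.h>

/* Assumption: RGB565, i.e. 5 bits red [15:11], 6 bits green [10:5],
 * 5 bits blue [4:0]. The low bits of each 8-bit channel are dropped. */
static uint16_t example_rgb888_to_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}
```

A converter of this kind would apply the function to every pixel of the input image and write the results into the generated array.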
The resulting output header file, which is named after the input, can be included in a program using
the extended bare metal driver. The easiest way to use the sprite is to use the generated initialize function
defined in the header file.
7.2.2
Bitmap font maker utility
Another application generates the data structures necessary to load bitmap fonts with very little effort. It
takes an image and a grid spacing as input, and automatically generates offsets for all the glyphs in the font.
The font generated by the program has 256 characters arranged according to the ASCII charset, as seen in
figures 25 and 26.
The application has support for reading common image file formats such as bmp, png and jpg (for a full
list, see the supported file formats of the SDL_image library). 8-, 16- and 32-bit output is supported, and
can be changed by passing a command line argument to the program (by default, the output is adjusted for
16-bit color mode). Both vertical and horizontal grid spacing are set to 32 pixels by default, but this can be
changed through command line arguments.
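The glyph offsets follow directly from the grid layout. The following sketch assumes that the 256 glyphs are stored row by row, 16 per row, starting at the top left of the image; the rectangle type and function name are hypothetical and are not the driver's orgfx_sprite_rect.

```c
/* Hypothetical rectangle type, for illustration only. */
typedef struct { int x0, y0, x1, y1; } example_rect;

#define EXAMPLE_GLYPHS_PER_ROW 16

/* Source rect of a character code in a 16 by 16 glyph grid.
 * Assumption: row-major order, code 0 in the top left corner. */
static example_rect example_glyph_rect(unsigned char code,
                                       int spacing_x, int spacing_y)
{
    example_rect r;
    r.x0 = (code % EXAMPLE_GLYPHS_PER_ROW) * spacing_x;
    r.y0 = (code / EXAMPLE_GLYPHS_PER_ROW) * spacing_y;
    r.x1 = r.x0 + spacing_x;
    r.y1 = r.y0 + spacing_y;
    return r;
}
```

With the default spacing of 32 pixels, the character 'A' (code 65) lands in column 1 of row 4, so its source rect starts at (32, 128).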
The resulting output header file, which is named after the input, can be included in a program using the
bare metal and font driver. The easiest way to use the bitmap font is to use the generated initialize function
defined in the header file.
Figure 25: The ASCII table. Each number from 0 to 127 refers to a character. The numbers 0 to 31 cannot be
printed.
Figure 26: The extended ASCII table. Each number from 128 to 255 refers to a character, mostly special
characters not included in the basic table.
Figure 27: A font rendered by the software implementation of the ORGFX. Bézier curves are single colored
while the triangles are interpolated between current color and black
7.2.3
Mesh maker utility
The mesh maker utility loads 3D objects and generates a header file that can be used by the advanced 3D
API. Currently the utility only supports Wavefront .obj files that contain only triangles (3rd order polygons).
Any higher order polygons will be discarded, so all polygons in the model must be converted to triangles prior
to running the utility.
The application supports loading texture coordinates for each vertex, allowing for textured meshes.
The resulting output header file, which is named after the input, can be included in a program using the
bare metal 3D API. The easiest way to use the mesh is to use the generated initialize function defined in
the header file.
7.2.4
Vector font maker utility
The font maker is an application that converts a .ttf file to a format that the graphics card can handle.
It outputs a .h file that can be included in a project to enable the graphics accelerator's vector
font capabilities.
TTF is a font format that stores a set of explicit points to describe an outline. The points connect
to each other and form shapes. The converter finds all explicit vector points in a .ttf file and then calculates
the implicit points. At the same time it checks where the glyph's contours end.
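In TrueType outlines, two consecutive off-curve control points imply an on-curve point halfway between them. The sketch below shows one way the implicit points of a single closed contour could be calculated; the point type and function name are hypothetical and this is not the converter's actual code.

```c
/* Hypothetical outline point: on_curve is 1 for on-curve points and
 * 0 for quadratic Bézier control points. */
typedef struct { int x, y; int on_curve; } example_point;

/* Copies a closed contour of n points to out, inserting the implied
 * on-curve midpoint between every pair of consecutive off-curve points.
 * out must have room for 2 * n points. Returns the number written. */
static int example_expand_implicit(const example_point *in, int n,
                                   example_point *out)
{
    int count = 0;
    for (int i = 0; i < n; i++) {
        const example_point *cur = &in[i];
        const example_point *next = &in[(i + 1) % n]; /* contour wraps */
        out[count++] = *cur;
        if (!cur->on_curve && !next->on_curve) {
            example_point mid = { (cur->x + next->x) / 2,
                                  (cur->y + next->y) / 2, 1 };
            out[count++] = mid;
        }
    }
    return count;
}
```

After this step every off-curve point sits between two on-curve points, so the contour can be read as a sequence of lines and quadratic Bézier segments.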
The points are then sent to a Delaunay triangulation function, based on the work of V. Domiter and
B. Zalik [4] and implemented by M. Green and T. Åhlén (footnote 27). The generated header file contains two
lists for each glyph, one to store Bézier writes and one to store triangle writes. The algorithm is confirmed
to work with a development font (see figure 27).
The following assumptions are made:
• The initial shape in the glyph is a filled shape.
• Any shape that is defined outside of the previously filled shape is also a filled shape.
• All shapes that collide with the previous filled shape are holes in that shape.
This algorithm does not work with fonts that begin with a hole and then later add the filled shape.
8
Testing and validation
This section will describe the testing and validation processes used in this project. Since the ORGFX core is
a very complex system spanning both software and hardware, it is important that each subsystem is properly
validated, both separately and in their interaction.
8.1
Algorithmic validation
All of the rasterization and rendering logic was implemented in C to validate its function prior to the
Verilog implementation. Using Simple DirectMedia Layer (SDL, footnote 28) as a graphical backend with a "put
pixel" interface greatly increased the speed at which prototypes could be developed.
27 http://code.google.com/p/poly2tri/
28 http://www.libsdl.org
This validation step was performed to confirm that the chosen algorithms worked the way they were
supposed to, and to identify possible problems with them. In addition, since the software implementation
was designed to use the same API as the hardware implementation, applications using the accelerator can
be developed and tested faster.
8.2
Hardware validation
Once the algorithm itself was verified, it was implemented in hardware. This hardware had to be verified
in turn, due to the increased complexity of parallel computations and issues introduced by timing.
Icarus Verilog (iverilog) is an open source Verilog simulation tool that can also be used as a synthesis tool.
This tool is used to build test benches. A test bench is a small script containing simulated input. The test
bench is compiled with the corresponding HDL code, and running it generates a dump file that can be viewed
in a wave viewer. In this project we have used the open source tool GTKWave. The output from the test
bench is analysed in GTKWave to see if we get the correct output for the given input.
Each module has its own test bench based on iverilog. There is also a test bench simulating the pipeline
as a whole system. This verifies that the modules are properly connected and interact correctly. The test
benches verify that the implementation is logically correct. They do not detect any timing errors that can
occur when the code gets synthesized/mapped onto the device. It can be hard to verify some graphical
operations with a test bench, and while it is not a perfect debug environment, it is a lot better than just
doing a visual inspection of what shows up on the screen.
8.3
Software validation
The software of the ORGFX component needs to be verified both separately and together with the hardware.
Thankfully the interface between the two is very simple, and just consists of fixed width memory writes.
The bare metal driver is verified by a script that runs a test application and checks that the output is
correct for the given input. The software is also tested and verified by visual inspection of the output on
the synthesized hardware. The script verifies that the API works as intended.
8.4
System validation
The system test is based on iverilog and the bare metal drivers. A script binds the system testbench and
bare metal drivers together and checks that the correct output is delivered according to the input. This test
proves that the software and hardware are compatible and give the correct output. However, running the
Verilog code through iverilog will not guarantee that the hardware actually works on the device.
Additional considerations such as fitting, the availability of specialized hardware on the FPGA and
routing delays can affect the performance and function of the ORGFX.
9
Results
The ORSoC Graphics Accelerator is an FPGA core with 2D, 3D and vector drawing capabilities. The use of
a graphics accelerator releases CPU time that can be put to better use than putting pixels on the screen.
The current implementation is very generic and platform independent but still manages to run a demo of all
its features smoothly on a 50 MHz OpenRISC processor. If some more time is spent on optimization for the
specific platform the ORGFX will work even better. The best way to improve performance is to implement
hardware with a dedicated graphics RAM.
9.1
Performance
This project has aimed to build a generic graphics accelerator and the focus has been on implementing
new features rather than optimizing the implementation for the current development platform. The limiting
factor on the development board is how the accelerator accesses the RAM. There is no dedicated memory
for the graphics accelerator and there is no texture cache implemented on the graphics accelerator.
Ultimately, tests show that the main bottleneck is the bandwidth of the Wishbone bus and memory
access. The memory bandwidth has to be shared with the VGA core, and if the ORGFX uses too much
memory bandwidth the VGA core is unable to handle it, making the output picture unstable.
The performance of applications depends on how rendering is handled in software. One common technique
is to clear and redraw the entire scene each frame. This will expend a lot of bandwidth and a smooth
framerate (above 25 frames per second) will not be possible. Getting a smooth framerate is not as problematic
if only parts of the screen are redrawn (the parts that change). Scrolling scenes can be implemented by
moving the VGA read pointer instead of redrawing the entire screen.
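Scrolling by moving the read pointer amounts to recomputing the start address that the display controller reads from. The sketch below assumes a surface in memory that is larger than the visible screen and stored row by row; the function and parameter names are hypothetical, and how the resulting address is handed to the display controller depends on the driver.

```c
#include <stdint.h>

/* Address of the pixel at (x, y) inside a row-major surface.
 * surface_base: address of the first pixel of the surface
 * stride_px:    width of the surface in pixels
 * bytes_per_px: 2 in 16-bit color mode
 * Assumption: the display controller can start reading at any
 * pixel address inside the surface. */
static uint32_t example_scroll_base(uint32_t surface_base,
                                    uint32_t stride_px,
                                    uint32_t bytes_per_px,
                                    uint32_t x, uint32_t y)
{
    return surface_base + (y * stride_px + x) * bytes_per_px;
}
```

Scrolling down by ten lines then costs a single address update rather than a redraw of every visible pixel.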
It is more difficult to achieve smooth framerate for 3D rendering than for 2D, since moving the camera
usually forces a complete redraw of the scene.
9.2
Benchmarking
The ORGFX core takes up 10000 slice LUTs (calculated using Xilinx ISE 13.4).
The longest timing path of the core on the Atlys board is 16.076 ns, allowing for a core speed of 62.205
MHz. The current implementation is able to display a smooth rendering of a rotating 3D mesh with 90
faces.
The ORGFX can display roughly 5.1 million pixels per second (simple pixel-by-pixel rectangle rendering).
This is compared to roughly 0.5 million pixels per second rendered by the 50 MHz CPU (also simple
pixel-by-pixel rectangle rendering). This is roughly a tenfold increase in performance. It should also be noted
that the CPU is free to perform other operations during the hardware rendering.
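The benchmark figures can be cross-checked with a little arithmetic. The frame size of 640 by 480 pixels in the last line is an assumption made for illustration, not a figure stated in this section.

```latex
\begin{align*}
f_{\max} &= \frac{1}{t_{\text{path}}} = \frac{1}{16.076\ \text{ns}} \approx 62.2\ \text{MHz} \\
\text{speedup} &\approx \frac{5.1 \times 10^{6}\ \text{pixels/s}}{0.5 \times 10^{6}\ \text{pixels/s}} \approx 10 \\
\text{full-frame rate} &\approx \frac{5.1 \times 10^{6}\ \text{pixels/s}}{640 \times 480\ \text{pixels}} \approx 16.6\ \text{frames/s}
\end{align*}
```

Under that assumption, clearing and redrawing the whole screen cannot reach 25 frames per second on fill rate alone, which is consistent with the observation in section 9.1.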
More complex operations should yield an even greater improvement in performance, since the ORGFX
pipeline has specialized hardware for transformations, coloring and texturing.
10
Future work
The design and implementation of the ORSoC graphics accelerator presented in this thesis is just a proof of
concept, and many things could be worked on to improve both the function and performance of the device.
This section lists a number of areas that should have future work dedicated to them.
10.1
Textures
To make it possible to interpolate from one image to another, more texture banks need to be added.
Currently only one (Tex0) is implemented. If several image sources are available, this also opens up the
possibility to add new interesting features to the device such as bump mapping, normal mapping or decals.
Of course, more textures on the same surface means more memory reads per pixel, which leads to the next
point of improvement.
10.2
Bandwidth issues
The current implementation suffers from bandwidth limitations and unoptimized use of bandwidth. The
same pixel in a texture may be read multiple times, causing a large overhead in the communication with
the video memory. There are two relatively simple ways to improve performance here. The first is to
implement an internal texture cache for each of the textures or buffers, which could save several clock
cycles per pixel operation.
The second way to reduce the problems introduced by the limited bandwidth is to optimize the Wishbone
access by using block reads and writes, described in revision B.4 of the Wishbone bus specification.
10.3
8/24/32 bpp
Another desired feature is to have proper support for 8-, 24- and 32-bit color depth modes. The current
implementation only has full support for 16-bit color depth mode. This feature is closely entangled with
how the display controller is implemented, since the ORGFX device has to write pixels to memory in the
same format that the display controller reads.
10.4
Alpha from memory
The current implementation supports setting the transparency of a drawing primitive either globally or
through interpolation. Colorkeying does implement a form of per-pixel transparency loaded from memory,
but it would be desirable to have full alpha support for each pixel. This would of course further increase
bandwidth usage.
10.5
Precision issues
The choice to use fixed point arithmetic in ORGFX was based on fast development time and low logic
complexity (which in turn translates to less logic usage on the FPGA). It does introduce two problems
however:
1. The device has trouble processing extremely large or extremely small numbers.
2. There is an inevitable loss of precision due to the calculations. In some cases this may be visible to
the user in the form of jagged textures or triangle edges not matching perfectly.
While the issues could be reduced by increasing the bits used for the fixed point arithmetic, that would
in turn lead to greater bandwidth usage. The most desirable solution would be a full floating point unit
(FPU) to process the calculations, but that could be extremely costly in terms of FPGA logic usage and
adds an entire level of complexity.
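The loss of precision is easy to reproduce in software. The sketch below assumes a format with 16 fractional bits, which may not match the format used by the device, and the function names are hypothetical. Dividing one by three and multiplying the result by three again does not give back exactly one.

```c
#include <stdint.h>

/* Assumption: 16 fractional bits; 1.0 is represented as 65536. */
#define EXAMPLE_FRAC_BITS 16

/* Fixed point multiply. The 64-bit product is shifted back down, which
 * discards the lowest fractional bits (assumes an arithmetic right
 * shift for negative values, as on common compilers). */
static int32_t example_fx_mul(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> EXAMPLE_FRAC_BITS);
}

/* Fixed point divide for non-negative a. The quotient is truncated. */
static int32_t example_fx_div(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a << EXAMPLE_FRAC_BITS) / b);
}
```

Here example_fx_div(65536, 3 * 65536) gives 21845, and multiplying that by 3.0 gives 65535, one step below the representation of 1.0. Errors of this kind accumulate over chained transformations and interpolation steps.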
10.6
Platform specific optimizations
The current implementation suffers from performance issues, some of which could possibly be overcome by
adding optimizations specific to a particular development board or FPGA circuit. While this may gain some
speed or reduce the size of the IP Core, it would reduce the number of platforms that the device can be
implemented on. The ORGFX implementation was specifically designed to be as generic as possible so it
can be loaded to any FPGA device.
It is even possible to change the display controller and the master CPU without any changes to the
ORGFX component. There is only one non-generic part of the current design: the Wishbone bus interface.
10.7
Other bus implementations
The ORGFX graphics accelerator would benefit from support for common FPGA data buses, such as Altera's
Avalon bus used with the Nios II soft core processor or the CoreConnect PLB bus used with the Xilinx
MicroBlaze soft core processor.
Expanding the number of available communication interfaces has two advantages:
1. It makes it possible to integrate the component in older SoC designs with minimal effort.
2. It makes it possible to use the SoC design tools provided by the larger FPGA vendors (Altera has
SOPC Builder/QSys for example). This can greatly increase the speed of designing larger systems.
10.8
Linux driver
The possibility of implementing a Linux driver was studied during the research phase of this thesis. It was
concluded that it would be most convenient to implement a DirectFB driver or use the bare metal drivers
and write to the hardware through memory mapping. This is an easy way to add Linux support for the
graphics card, but it requires that programs have their graphics API ported to the DirectFB/ORGFX API
to gain graphics acceleration.
Due to the complexity of the task and the limited time, a Linux driver was never implemented. A
DirectFB and/or DRI/DRM driver might be included in future releases.
11
Conclusions
The ORSoC graphics accelerator is a fully functional 2D and 3D graphics accelerator for embedded systems,
with additional support for hardware accelerated vector graphics. While the device uses technology a few
years behind current high end graphics accelerators, it is one of the few truly open alternatives, since all
hardware, software and documentation is available under LGPL.
The aim to make the implementation as generic and platform independent as possible has led to some
concessions on performance, but the modular design allows for a lot of expansions. A future implementation
of ORGFX optimized for a target platform and configured with multiple pipelines and a texture cache
would lead to large improvements in performance.
Code written for the ORGFX API can with the help of the software implementation be verified without
access to the graphics hardware. This allows interested peers to become developers for the ORGFX without
having to buy expensive hardware. By using the provided utilities the developers can quickly integrate
media into their ORGFX applications.
The ORGFX as it is can be used for static or low framerate graphics applications on embedded systems,
such as HMI interfaces. The authors of this thesis hope that ORGFX can be used as a base platform to
build additional functionality for open hardware graphics, and that future performance optimizations can
make the platform viable for high framerate graphics on embedded platforms.
References
[1] S. Bailey. Comparison of VHDL, Verilog and SystemVerilog. 2003.
[2] J. E. Bresenham. Algorithm for computer control of a digital plotter. IBM Systems Journal, 4(1):25–30,
1965.
[3] S.-H. Chen, H.-M. Lin, C.-C. Hsieh, C.-T. Huang, J.-J. Liou, and Y.-C. Chung. TurboVG: a HW/SW
co-designed multi-core OpenVG accelerator for vector graphics applications with embedded power profiler.
In Proceedings of the 16th Asia and South Pacific Design Automation Conference, ASPDAC '11, pages
97–98, Piscataway, NJ, USA, 2011. IEEE Press.
[4] V. Domiter and B. Zalik. Sweep-line algorithm for constrained Delaunay triangulation. International
Journal of Geographical Information Science, 22(4):449–462, 2008.
[5] C. Loop and J. Blinn. Resolution independent curve rendering using programmable graphics hardware.
ACM Trans. Graph., 24:1000–1009, July 2005.
[6] K. Mcallister. Triangle rasterization, 2007.
[7] H. Nguyen. GPU Gems 3. Addison-Wesley Professional, first edition, 2007.
[8] J. G. Rokne, B. Wyvill, and X. Wu. Fast line scan-conversion. ACM Trans. Graph., 9(4):376–388, Oct.
1990.
[9] A. R. Smith. Alpha and the history of digital compositing. In Microsoft Technical Memo 7, 1995.
[10] W. Zhang and I. Majdandzic. Fast triangle rasterization using irregular z-buffer on CUDA. 2010.
A
Appendix A, ORGFX Specification
ORSoC Graphics accelerator Specification
Per Lenander, Anton Fosselius
August 20, 2012
1
Revision history
Rev.   Date         Author            Description
1.0    23/3/2012    Per Lenander      Initial draft and basic functionality
2.0    4/6/2012     Per Lenander      Advanced functionality (vector, 3D etc)
3.0    20/8/2012    Anton Fosselius   Fixed typos
Contents
1 Introduction
1.1 Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1.2 IP Core directory structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2 Architecture
2.1 Overview . . . . . . . . . . . . .
2.2 Concepts . . . . . . . . . . . . .
2.3 Coordinate precision . . . . . . .
2.4 Instruction FIFO . . . . . . . . .
2.5 Pipeline . . . . . . . . . . . . . .
2.6 Description of core modules . . .
2.6.1 Wishbone slave . . . . . .
2.6.2 Transformation processor
2.6.3 Rasterizer . . . . . . . . .
2.6.4 Clipper . . . . . . . . . .
2.6.5 Fragment processor . . . .
2.6.6 Blender . . . . . . . . . .
2.6.7 Wishbone arbiter . . . . .
2.6.8 Wishbone master read . .
2.6.9 Renderer . . . . . . . . .
2.6.10 Wishbone master write .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
3 IO Ports
6
6
6
6
6
8
8
9
9
10
10
10
10
10
10
10
10
10
10
10
11
4 Registers
4.1 Control Register (CONTROL) . . . . . . . . .
4.2 Status Register (STATUS) . . . . . . . . . . . .
4.3 Alpha (ALPHA) . . . . . . . . . . . . . . . . .
4.4 Colorkey register (COLORKEY) . . . . . . . .
4.5 Target base address Register (TARGET BASE)
4.6 Target size width Register (TARGET SIZE X)
4.7 Target size y Register (TARGET SIZE Y) . . .
4.8 Texture 0 Base Register (TEX0 BASE) . . . .
4.9 Texture 0 size x Register (TEX0 SIZE X) . . .
4.10 Texture 0 size y Register (TEX0 SIZE Y) . . .
4.11 Source Pixel position 0 x Register (SRC P0 X)
4.12 Source Pixel position 0 y Register (SRC P0 Y)
4.13 Source Pixel position 1 Register (SRC P1 X) .
4.14 Source Pixel position 1 Register (SRC P1 Y) .
4.15 Destination Pixel position Register (DEST X) .
4.16 Destination Pixel position Register (DEST Y) .
4.17 Destination Pixel position Register (DEST Z) .
4.18 Matrix coefficient registers . . . . . . . . . . . .
4.19 Clip Pixel position 0 x Register (CLIP P0 X) .
4.20 Clip Pixel position 0 y Register (CLIP P0 Y) .
4.21 Clip Pixel position 1 x Register (CLIP P1 X) .
4.22 Clip Pixel position 1 y Register (CLIP P1 Y) .
4.23 Color Registers (COLOR0-2) . . . . . . . . . .
4.24 Texture coordinate Registers (U0-2 and V0-2) .
4.25 Depth buffer Register (ZBUFFER BASE) . . .
3
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
12
13
13
13
14
14
14
14
14
14
14
15
15
15
15
15
15
15
15
16
16
16
16
16
17
17
5 Operation
5.1 Draw pixel
5.2 Fill rect . .
5.3 Line . . . .
5.4 Triangle . .
5.5 Curve . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
6 Clocks
7 Driver interface
7.1 newlib
7.1.1 orgfx_init
7.1.2 orgfx_vga_set_videomode
7.1.3 orgfx_vga_set_vbara
7.1.4 orgfx_vga_set_vbarb
7.1.5 orgfx_vga_bank_switch
7.1.6 orgfx_init_surface
7.1.7 orgfx_bind_rendertarget
7.1.8 orgfx_enable_cliprect
7.1.9 orgfx_cliprect
7.1.10 orgfx_srcrect
7.1.11 orgfx_set_pixel
7.1.12 orgfx_memcpy
7.1.13 orgfx_set_color
7.1.14 orgfx_set_colors
7.1.15 orgfx_rect
7.1.16 orgfx_line
7.1.17 orgfx_line3d
7.1.18 orgfx_triangle
7.1.19 orgfx_triangle3d
7.1.20 orgfx_curve
7.1.21 orgfx_uv
7.1.22 orgfx_enable_tex0
7.1.23 orgfx_bind_tex0
7.1.24 orgfx_enable_zbuffer
7.1.25 orgfx_bind_zbuffer
7.1.26 orgfx_clear_zbuffer
7.1.27 orgfx_enable_alpha
7.1.28 orgfx_set_alpha
7.1.29 orgfx_enable_colorkey
7.1.30 orgfx_set_colorkey
7.1.31 orgfx_enable_transform
7.1.32 orgfx_set_transformation_matrix
7.2 Extended newlib
7.2.1 orgfxplus_init
7.2.2 orgfxplus_init_surface
7.2.3 orgfxplus_bind_rendertarget
7.2.4 orgfxplus_bind_tex0
7.2.5 orgfxplus_flip
7.2.6 orgfxplus_clip
7.2.7 orgfxplus_fill
7.2.8 orgfxplus_line
7.2.9 orgfxplus_triangle
7.2.10 orgfxplus_curve
7.2.11 orgfxplus_draw_surface
7.2.12 orgfxplus_draw_surface_section
7.2.13 orgfxplus_colorkey
7.2.14 orgfxplus_alpha
7.3 Bitmap Fonts
7.3.1 orgfx_make_bitmap_font
7.3.2 orgfx_put_text
7.4 Vector Fonts
7.4.1 orgfx_make_vector_font
7.4.2 orgfx_init_vector_font
7.4.3 orgfx_put_vector_char
7.4.4 orgfx_put_vector_text
7.5 3D API
7.5.1 Transformations
7.5.2 orgfx3d_make_mesh
7.5.3 orgfx3d_mesh_texture_size
7.5.4 orgfx3d_draw_mesh
7.6 Linux
7.7 Software emulation
7.8 Utilities
7.8.1 Sprite maker utility
7.8.2 Bitmap font maker utility
7.8.3 Mesh maker utility
7.8.4 Vector font maker utility
8 Programming examples
1 Introduction
The ORSoC Graphics accelerator allows the user to do advanced vector rendering and 2D blitting
to a memory area. The core supports operations such as drawing textures, lines, curves and filling
rectangular and triangular areas with color.
This IP Core is designed to integrate with the OpenRISC processor through a Wishbone bus
interface. The core itself has no means of displaying the rendered output; for this purpose
it can work alongside a display component, such as the enhanced VGA/LCD IP core found on
OpenCores.
1.1 Features
• 32-bit Wishbone bus interface
• Integrates with enhanced VGA/LCD IP core
• Support for 16 bit color depth
• Support for variable resolution
• Acceleration of line operations
• Acceleration of rectangle and triangle rasterization
• Acceleration of memory copy operations
• Textures can be saved to video memory
• Vector transformation and rasterization
• Clipping/Scissoring
• Alpha blending and colorkeying
• Filled Bezier curves
• Bitmap Fonts
• Vector Fonts (ttf)
• Interpolation of colors
• UV-Mapping
• Transformation (scaling and rotation)
• 3D model support (.obj model files built using 3rd degree polygons)
• Z-Buffer (triangles drawn in depth order)
• Requires around 10000 Slice LUTs (Xilinx ISE 13.4)
1.2 IP Core directory structure
An overview of the contents of the IP core source folder can be found in figure 1.
2 Architecture
2.1 Overview
A topology of how the ORGFX is connected to the VGA driver and the OpenRISC core is shown
in figure 2. The ORGFX has three Wishbone interfaces: one read/write port used to communicate
with the host CPU, one read port that reads depth, texture and alpha blending information
from the RAM, and one write port that writes pixel information to the RAM.
Figure 1: Directory structure of the ORSoC graphics accelerator.
Figure 2: Overview of the ORPSoCv2’s wishbone interconnection.
Figure 3: 1. Texture, 2. Source, 3. Render target, 4. Clip, 5. Destination
2.2 Concepts
This section describes a few basic terms used in this document.
Video memory – The ORGFX component writes pixels one by one to an external memory,
usually an SDRAM or DDR RAM chip. The CPU should also have access to this memory space
to be able to write pixels directly (the easiest way to load textures).
Render target – The render target, defined by the target base and size registers, describes
the area to which all operations render pixels. It is possible to change the base address and size,
enabling render-to-texture and double buffering.
Surface/Texture – Any memory area that can be rendered to, including the render target, is
considered a surface. A surface is defined by its base address and size. There are two main surfaces
that the ORGFX device handles: the render target and the currently active texture. Swapping
between different textures has to be done in software. The operation of setting the current render
target or texture is referred to as binding.
Source, Destination and Clip rectangles – There are three sets of rectangles that affect
rendering, each described by two points. The first point sets the beginning of the rectangle, while
the second point sets the pixel after the end of the rectangle. This way, a rectangle exactly filling
the screen would be (0,0,640,480) at 640x480 resolution. See figure 3.
Source rectangle – The source rectangle defines what pixels should be read from a texture
during textured operations. The points are defined in the coordinates of the currently bound
texture. This way sections of a texture can be drawn (good for tile maps or bitmap fonts).
Destination rectangle – The destination rectangle defines where operations such as draw
pixel and draw line will draw pixels, in the coordinates of the render target.
Clip rectangle – The clip rectangle defines an area within the current render target which is
valid to draw to. Any pixels outside this rectangle are discarded in the rasterization step. Pixels
outside of the render target are automatically discarded.
Z-buffer – The depth or Z-buffer is a surface containing z coordinate information. This can
be used to draw graphics primitives in depth-correct order.
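The rectangle convention and the clipping rule described above can be sketched in C as follows. This is only an illustration of the semantics: the `struct rect` type and the function names are hypothetical and are not part of the ORGFX driver.

```c
#include <stdbool.h>

/* Hypothetical rectangle type: (x0, y0) is the first pixel inside the
 * rectangle, (x1, y1) is the pixel just past its end. */
struct rect { int x0, y0, x1, y1; };

static int rect_width(struct rect r)  { return r.x1 - r.x0; }
static int rect_height(struct rect r) { return r.y1 - r.y0; }

/* A pixel is inside when x0 <= x < x1 and y0 <= y < y1. */
static bool rect_contains(struct rect r, int x, int y)
{
    return x >= r.x0 && x < r.x1 && y >= r.y0 && y < r.y1;
}

/* Pixels outside the render target are always dropped; pixels outside
 * the clip rectangle are dropped only when clipping is enabled. */
static bool pixel_survives(struct rect target, struct rect clip,
                           bool clip_enabled, int x, int y)
{
    if (!rect_contains(target, x, y))
        return false;
    return !clip_enabled || rect_contains(clip, x, y);
}
```

With this convention the full-screen rectangle (0,0,640,480) has a width of 640 and a height of 480 without any off-by-one adjustment.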
2.3 Coordinate precision
The ORGFX core supports variable coordinate precision through two parameters, point_width
and subpixel_width. Both parameters default to 16 bits.
Target size, clip and source rects are defined as point_width-bit integers. Destination points
are defined as fixed point numbers, with point_width bits of integer precision and subpixel_width
bits of fractional precision. Internally many calculations are done with fixed point logic.
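As an illustration of the default 16.16 format, the helpers below convert between ordinary numbers and fixed point values. The function names are hypothetical and the rounding behaviour is an assumption of this sketch, not something the core specifies.

```c
#include <stdint.h>

#define SUBPIXEL_WIDTH 16   /* default fractional width */

/* Convert an integer pixel coordinate to 16.16 fixed point. */
static int32_t fixed_from_int(int32_t pixel)
{
    return (int32_t)((uint32_t)pixel << SUBPIXEL_WIDTH);
}

/* Convert a real-valued coordinate, rounding to the nearest subpixel step. */
static int32_t fixed_from_double(double v)
{
    double scaled = v * (double)(1 << SUBPIXEL_WIDTH);
    return (int32_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

/* Integer part of a fixed point value, rounded toward negative infinity. */
static int32_t fixed_to_int(int32_t f)
{
    int64_t w = f;
    return (int32_t)((w - (w < 0 ? 65535 : 0)) / 65536);
}
```

For example, the coordinate 1.5 becomes 0x00018000: the upper 16 bits hold the integer part 1 and the lower 16 bits hold the fraction 0x8000 (one half).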
Figure 4: Picture of the ORGFX pipeline
2.4 Instruction FIFO
All wishbone writes sent to the slave interface will pass through an instruction fifo. If the device
is in the busy state (when the pipeline is active) the instruction will be queued instead of being
set immediately. This is important to take into account when reading from registers, since an
operation that changes the register being read might be queued. To find out if the device is busy,
poll the status register and check if the busy bit is high.
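A minimal sketch of the polling described above, assuming the STATUS layout given in section 4.2 (busy flag in bit 0, FIFO size in bits [31:16]). The `regs` pointer and the `gfx_` names are hypothetical; how the register block is mapped depends on the platform.

```c
#include <stdint.h>

#define GFX_STATUS_OFFSET 0x04u       /* STATUS register address offset */
#define GFX_STATUS_BUSY   (1u << 0)   /* bit 0: high while the pipeline is busy */

/* Current FIFO fill level, held in bits [31:16] of STATUS. */
static uint32_t gfx_fifo_size(uint32_t status)
{
    return status >> 16;
}

/* Spin until the busy bit clears. `regs` points at the start of the
 * device's register block. Returns the number of polls that saw "busy". */
static unsigned gfx_wait_idle(volatile uint32_t *regs)
{
    unsigned polls = 0;
    while (regs[GFX_STATUS_OFFSET / 4] & GFX_STATUS_BUSY)
        polls++;
    return polls;
}
```

Waiting for the device to become idle before reading a register ensures that no queued write to that register is still pending in the FIFO.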
2.5 Pipeline
The ORGFX core uses a pipelined architecture to speed up operation. An overview of the pipeline
can be seen in figure 4. Each module in the pipeline communicates with acknowledge and write
signals. A module will not assert write to the next module unless it receives an ack first (or if the
module was previously in a ready state, in which case the downstream pipeline is empty). All ack
and write signals are always exactly one clock tick long, to prevent triggering multiple instances
of the same instruction.
Each module in the pipeline may hold the upstream pipeline for several clock ticks. For example,
the rasterizer will prevent incoming raster instructions until all the pixels for the current operation
are generated. When the rasterizer is ready for new data, it will send an ack upstream.
2.6 Description of core modules
2.6.1 Wishbone slave
The wishbone slave handles all communication from the main OpenRISC processor (or other
master CPU). This component holds all the registers, and the instruction FIFO that sets them.
This component can be in one of two states: busy or wait. It enters the busy state when a pipeline
operation is initialized, and returns to the wait state when the operation is finished. Operations
can be initialized by writing to the control register (see section 4).
2.6.2 Transformation processor
The transformation processor handles rotations and scaling.
2.6.3 Rasterizer
The rasterizer generates pixel coordinates from points for several different operations.
2.6.4 Clipper
The clipper discards a generated pixel if clipping is enabled and the pixel lies outside the clip
rectangle. Pixels outside of the target area are always discarded.
2.6.5 Fragment processor
The fragment processor adds color to the pixel generated by the rasterizer. If texturing is disabled
a color supplied from the color register is used. If texturing is enabled on the other hand, the
u v coordinates supplied by the rasterizer are used to fetch a pixel from the active texture. If
colorkeying is enabled and the fetched color matches the color key, the current pixel is discarded.
2.6.6 Blender
The blender module performs alpha blending if this is enabled. The module fetches the color of
the pixel that the current operation will write to, and mixes the value of the target color and the
color from the fragment processor using the following algorithm:
\[ \alpha = \alpha_{global} \cdot \alpha_{pixel} \]
\[ color_{out} = color_{in} \cdot \alpha + color_{target} \cdot (1 - \alpha) \]
where $\alpha$ is a value between 0 (transparent) and 1 (opaque). If alpha blending is disabled
the pixel is passed on unmodified. The alpha value can be interpolated over a triangle to create
gradients. If this function is turned off (interpolation is disabled on triangle draws) then
$\alpha_{pixel}$ is set to 1.
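The blending formula can be written out for a single 8-bit color channel as below. This is a software sketch only: alpha values are stored as 0..255 with 255 standing for 1, and the rounding used here is an assumption that may differ from what the hardware does.

```c
#include <stdint.h>

/* Blend one 8-bit color channel. Alpha values are 0..255, where 255
 * stands for 1.0 (opaque). */
static uint8_t blend_channel(uint8_t in, uint8_t target,
                             uint8_t alpha_global, uint8_t alpha_pixel)
{
    /* alpha = alpha_global * alpha_pixel, rescaled to 0..255 */
    uint32_t alpha = ((uint32_t)alpha_global * alpha_pixel + 127u) / 255u;
    /* out = in * alpha + target * (1 - alpha) */
    uint32_t out = (uint32_t)in * alpha + (uint32_t)target * (255u - alpha);
    return (uint8_t)((out + 127u) / 255u);
}
```

With both alpha values at 255 the incoming color replaces the target; with a global alpha of 0 the target color is left unchanged.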
2.6.7 Wishbone arbiter
Since two parts of the pipeline (fragment and blender) need to access video memory, the arbiter
makes certain only one of them can access the reader at a time. The blender has the highest priority.
2.6.8 Wishbone master read
The wishbone reader handles all reads from video memory.
2.6.9 Renderer
The renderer calculates the memory address of the target pixel.
2.6.10 Wishbone master write
The wishbone master handles all writes to the video memory.
3 IO Ports
The Core has three wishbone interfaces:
• Wishbone slave – connects to the data bus of the OpenRISC processor. In the case of ORPSoC, this bus is connected through an arbiter. Supports standard wishbone cycles only; burst modes
are not supported.
• Wishbone master read-only – connects to a video memory port with read access. Used for
fetching textures and during blending.
• Wishbone master write-only – connects to a video memory port with write access. Used for
rendering pixels to the framebuffer.
The core has an interrupt output that can be connected to the interrupt pins on the or1200 CPU
(in the supplied orpsoc_top it is connected to or1200_pic_ints[9]). For this interrupt to trigger, the
correct bits in the control register have to be set.
4 Registers
Name            Addr   Width  Access  Description
CONTROL         0x00   32     RW      Control register
STATUS          0x04   32     R       Status register
ALPHA           0x08   32     RW      Global alpha register
COLORKEY        0x0c   32     RW      Colorkey register
TARGET_BASE     0x10   32     RW      Render target base
TARGET_SIZE_X   0x14   32     RW      Render target width
TARGET_SIZE_Y   0x18   32     RW      Render target height
TEX0_BASE       0x1c   32     RW      Texture 0 base
TEX0_SIZE_X     0x20   32     RW      Texture 0 width
TEX0_SIZE_Y     0x24   32     RW      Texture 0 height
SRC_P0_X        0x28   32     RW      Source pixel 0 x
SRC_P0_Y        0x2c   32     RW      Source pixel 0 y
SRC_P1_X        0x30   32     RW      Source pixel 1 x
SRC_P1_Y        0x34   32     RW      Source pixel 1 y
DEST_X          0x38   32     RW      Destination pixel x
DEST_Y          0x3c   32     RW      Destination pixel y
DEST_Z          0x40   32     RW      Destination pixel z
AA              0x44   32     RW      Transformation matrix coefficient
AB              0x48   32     RW      Transformation matrix coefficient
AC              0x4c   32     RW      Transformation matrix coefficient
TX              0x50   32     RW      Transformation matrix coefficient
BA              0x54   32     RW      Transformation matrix coefficient
BB              0x58   32     RW      Transformation matrix coefficient
BC              0x5c   32     RW      Transformation matrix coefficient
TY              0x60   32     RW      Transformation matrix coefficient
CA              0x64   32     RW      Transformation matrix coefficient
CB              0x68   32     RW      Transformation matrix coefficient
CC              0x6c   32     RW      Transformation matrix coefficient
TZ              0x70   32     RW      Transformation matrix coefficient
CLIP_P0_X       0x74   32     RW      Clip pixel 0 x
CLIP_P0_Y       0x78   32     RW      Clip pixel 0 y
CLIP_P1_X       0x7c   32     RW      Clip pixel 1 x
CLIP_P1_Y       0x80   32     RW      Clip pixel 1 y
COLOR0          0x84   32     RW      Color 0
COLOR1          0x88   32     RW      Color 1
COLOR2          0x8c   32     RW      Color 2
U0              0x90   32     RW      Texture coordinate 0 (u)
V0              0x94   32     RW      Texture coordinate 0 (v)
U1              0x98   32     RW      Texture coordinate 1 (u)
V1              0x9c   32     RW      Texture coordinate 1 (v)
U2              0xa0   32     RW      Texture coordinate 2 (u)
V2              0xa4   32     RW      Texture coordinate 2 (v)
ZBUFFER_BASE    0xa8   32     RW      Depth buffer base address
Each register is described in detail in the following sections, with information about what the
purpose of each bit in the register is. The default value provided for each register is set when the
device receives a reset signal.
4.1 Control Register (CONTROL)
Bit #     Access  Description
[31:20]   -       Reserved
[19]      W       Transform point
[18]      W       Forward point
[17:16]   RW      Active point
[15:14]   -       Reserved
[13]      W       Bézier inside shape
[12]      W       Interpolation
[11]      W       Curve write
[10]      W       Triangle write
[9]       W       Line write
[8]       W       Rect write
[7]       -       Reserved
[6]       RW      Z-buffer enable
[5]       RW      Clipping enable
[4]       RW      Colorkey enable
[3]       RW      Blending enable
[2]       RW      Texture0 enable
[1:0]     RW      Color depth
Default value: 0x00
Color depth is defined as follows:
Mode  Color depth
00    8 bit
01    16 bit
10    24 bit (not supported)
11    32 bit
The active point is defined as follows:
Mode  Point id
00    p0
01    p1
10    p2
11    -
The operations Forward point and Transform point read the current values of the DEST X,
DEST Y and DEST Z registers and store the x, y, z values in the register of the active point
inside the device.
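The bit assignments of the control register can be captured as C constants, as in the sketch below. The macro and function names are hypothetical and chosen for this example only; the bit positions follow the CONTROL register description above.

```c
#include <stdint.h>

/* Bit positions of the CONTROL register. */
#define CTRL_COLOR_DEPTH_16   (1u << 0)    /* [1:0] = 01 */
#define CTRL_TEXTURE0_ENABLE  (1u << 2)
#define CTRL_BLENDING_ENABLE  (1u << 3)
#define CTRL_COLORKEY_ENABLE  (1u << 4)
#define CTRL_CLIPPING_ENABLE  (1u << 5)
#define CTRL_ZBUFFER_ENABLE   (1u << 6)
#define CTRL_RECT_WRITE       (1u << 8)
#define CTRL_LINE_WRITE       (1u << 9)
#define CTRL_TRIANGLE_WRITE   (1u << 10)
#define CTRL_CURVE_WRITE      (1u << 11)
#define CTRL_INTERPOLATION    (1u << 12)
#define CTRL_BEZIER_INSIDE    (1u << 13)
#define CTRL_ACTIVE_POINT(n)  (((uint32_t)(n) & 3u) << 16)
#define CTRL_FORWARD_POINT    (1u << 18)
#define CTRL_TRANSFORM_POINT  (1u << 19)

/* Control word that forwards the destination registers into point `n`
 * while keeping the given mode bits (color depth and enable flags). */
static uint32_t ctrl_forward_point(uint32_t mode_bits, unsigned n)
{
    return mode_bits | CTRL_ACTIVE_POINT(n) | CTRL_FORWARD_POINT;
}
```

Because the mode bits share the register with the write-only command bits, a driver written this way has to keep its own copy of the mode bits and include them in every control write.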
4.2 Status Register (STATUS)
Bit #     Access  Description
[31:16]   R       Current FIFO size
[15:1]    R       Reserved
[0]       R       Busy pin (high when busy)
Default value: –
4.3 Alpha (ALPHA)
Bit #     Access  Description
[31:24]   RW      Point 0 alpha
[23:16]   RW      Point 1 alpha
[15:8]    RW      Point 2 alpha
[7:0]     RW      Global alpha
Default value: 0xffffffff
The global alpha value is used in all rendering when alpha blending is enabled. 0xff is full
opacity, while 0x00 is full transparency (nothing rendered). When interpolation of triangles is
activated, the point alpha values are used to find an interpolated alpha value for each pixel. This
value is then multiplied with the global alpha before being used for blending.
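As a small illustration of the ALPHA register layout, the helper below packs the four 8-bit fields into one 32-bit word. The function name is hypothetical.

```c
#include <stdint.h>

/* Pack the ALPHA register: point 0 in [31:24], point 1 in [23:16],
 * point 2 in [15:8] and the global alpha in [7:0]. */
static uint32_t pack_alpha(uint8_t p0, uint8_t p1, uint8_t p2, uint8_t global)
{
    return ((uint32_t)p0 << 24) | ((uint32_t)p1 << 16) |
           ((uint32_t)p2 << 8)  | (uint32_t)global;
}
```

Packing four fully opaque values gives 0xffffffff, which matches the reset value of the register.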
4.4 Colorkey register (COLORKEY)
Bit #     Access  Description
[31:0]    RW      Colorkey
Default value: 0x00
By setting a colorkey certain pixels in a texture can be discarded in the fragment stage, providing a hard transparency. Depending on the color depth, a mask is applied to the color. Using 8 bit
color, only the 8 least significant bits in the colorkey will be compared with the texture color during
the check. The colorkey enable bit in the control register must be set to enable this functionality.
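The masked comparison can be sketched as follows. Only the 8 bit mask is stated in the text; the 16 and 32 bit masks are an assumption extrapolated from it, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask selecting the bits that take part in the comparison.
 * bpp is 8, 16 or 32 (the 16 and 32 bit masks are assumed). */
static uint32_t colorkey_mask(unsigned bpp)
{
    switch (bpp) {
    case 8:  return 0x000000ffu;
    case 16: return 0x0000ffffu;
    default: return 0xffffffffu;
    }
}

/* True when the fetched texel matches the colorkey and should be discarded. */
static bool colorkey_discards(uint32_t texel, uint32_t colorkey, unsigned bpp)
{
    uint32_t mask = colorkey_mask(bpp);
    return (texel & mask) == (colorkey & mask);
}
```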
4.5 Target base address Register (TARGET_BASE)
Bit #     Access  Description
[31:2]    RW      Video Memory Address
[1:0]     -       Nothing
Default value: 0x00
4.6 Target size width Register (TARGET_SIZE_X)
Bit #     Access  Description
[31:0]    RW      Integer Width
Default value: 0x00
4.7 Target size y Register (TARGET_SIZE_Y)
Bit #     Access  Description
[31:0]    RW      Integer Height
Default value: 0x00
4.8 Texture 0 Base Register (TEX0_BASE)
Bit #     Access  Description
[31:2]    RW      Video Memory Address
[1:0]     -       Nothing
Default value: 0x00
4.9 Texture 0 size x Register (TEX0_SIZE_X)
Bit #     Access  Description
[31:0]    RW      Integer Width
Default value: 0x00
4.10 Texture 0 size y Register (TEX0_SIZE_Y)
Bit #     Access  Description
[31:0]    RW      Integer Height
Default value: 0x00
4.11 Source Pixel position 0 x Register (SRC_P0_X)
Bit #     Access  Description
[31:0]    RW      Integer x pos
Default value: 0x00
The source pixels are used to define a specific area in a texture to draw.
4.12 Source Pixel position 0 y Register (SRC_P0_Y)
Bit #     Access  Description
[31:0]    RW      Integer y pos
Default value: 0x00
4.13 Source Pixel position 1 Register (SRC_P1_X)
Bit #     Access  Description
[31:0]    RW      Integer x pos
Default value: 0x00
4.14 Source Pixel position 1 Register (SRC_P1_Y)
Bit #     Access  Description
[31:0]    RW      Integer y pos
Default value: 0x00
4.15 Destination Pixel position Register (DEST_X)
Bit #     Access  Description
[31:16]   RW      Signed Integer part
[15:0]    RW      Fractional part
Default value: 0x00
The control register flag active point decides the destination register inside the device. Points
are pushed to the device by setting the forward or transform bit in the control register.
4.16 Destination Pixel position Register (DEST_Y)
Bit #     Access  Description
[31:16]   RW      Signed Integer part
[15:0]    RW      Fractional part
Default value: 0x00
4.17 Destination Pixel position Register (DEST_Z)
Bit #     Access  Description
[31:16]   RW      Signed Integer part
[15:0]    RW      Fractional part
Default value: 0x00
4.18 Matrix coefficient registers
The matrix coefficients are defined in the following way:
\[ M = \begin{pmatrix} AA & AB & AC & TX \\ BA & BB & BC & TY \\ CA & CB & CC & TZ \end{pmatrix} \]
Each coefficient has a register, where the bits are defined as:
Bit #     Access  Description
[31:16]   RW      Signed Integer part
[15:0]    RW      Fractional part
The default matrix is set to no scaling, no rotation, no translation:
\[ M_{default} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \]
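A software sketch of applying such a 3 by 4 matrix to a point, using 16.16 fixed point arithmetic. It assumes the matrix is applied to the homogeneous vector (x, y, z, 1), so that TX, TY and TZ act as a translation; the type and function names are hypothetical and the rounding may differ from the hardware.

```c
#include <stdint.h>

typedef int32_t fx;              /* 16.16 fixed point */
#define FX_ONE ((fx)1 << 16)

/* Fixed point multiply: the 64-bit product is scaled back by 2^16. */
static fx fx_mul(fx a, fx b)
{
    return (fx)(((int64_t)a * (int64_t)b) / 65536);
}

/* Rows: {AA, AB, AC, TX}, {BA, BB, BC, TY}, {CA, CB, CC, TZ}. */
struct matrix3x4 { fx m[3][4]; };

/* out = M * (x, y, z, 1)^T */
static void transform_point(const struct matrix3x4 *mat,
                            const fx in[3], fx out[3])
{
    for (int row = 0; row < 3; row++) {
        out[row] = fx_mul(mat->m[row][0], in[0])
                 + fx_mul(mat->m[row][1], in[1])
                 + fx_mul(mat->m[row][2], in[2])
                 + mat->m[row][3];
    }
}
```

With the default matrix every point is returned unchanged, since the diagonal holds 1.0 (0x00010000) and all other coefficients are zero.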
4.19 Clip Pixel position 0 x Register (CLIP_P0_X)
Bit #     Access  Description
[31:0]    RW      Integer x
Default value: 0x00
4.20 Clip Pixel position 0 y Register (CLIP_P0_Y)
Bit #     Access  Description
[31:0]    RW      Integer y
Default value: 0x00
4.21 Clip Pixel position 1 x Register (CLIP_P1_X)
Bit #     Access  Description
[31:0]    RW      Integer x
Default value: 0x00
4.22 Clip Pixel position 1 y Register (CLIP_P1_Y)
Bit #     Access  Description
[31:0]    RW      Integer y
Default value: 0x00
4.23 Color Registers (COLOR0-2)
Bit #     Access  Description
[31:0]    RW      Color bits
Default value: 0x00
There are several color modes available (set in the control register):
Mode          Format
32bpp         [31:24] is alpha channel, [23:16] is R, [15:8] is G and [7:0] is B
16bpp         [15:11] is R, [10:5] is G and [4:0] is B
8bpp gray     [7:0] sets all of R, G and B
8bpp palette  [7:0] sets the color index in the palette
Currently only 16 bit color depth is fully supported.
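A sketch of packing an 8-bit-per-channel color into a 16bpp value. It assumes the conventional 5-6-5 layout with red in [15:11], green in [10:5] and blue in [4:0]; the function name is hypothetical.

```c
#include <stdint.h>

/* Reduce 8-bit channels to 5, 6 and 5 bits and pack them into 16 bits. */
static uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}
```

Under this assumption pure red packs to 0xf800 and white to 0xffff.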
4.24 Texture coordinate Registers (U0-2 and V0-2)
Bit #     Access  Description
[31:0]    RW      Coordinate bits (integer)
Default value: 0x00
4.25 Depth buffer Register (ZBUFFER_BASE)
Bit #     Access  Description
[31:2]    RW      32-bit word base address
[1:0]     -       Ignored
Default value: 0x00
This register holds the base address of the depth buffer. The depth buffer operations use
TARGET_SIZE_X and TARGET_SIZE_Y for the size of the depth buffer (it is assumed that the
render target and the depth buffer are of the same size).
5 Operation
All hardware accelerated operations draw pixels to the currently active surface (defined by the
TARGET_BASE and TARGET_SIZE_X/TARGET_SIZE_Y registers). These operations are all affected by
clip p0 and clip p1: no pixels that fall outside the clipping rectangle will be rendered.
5.1 Draw pixel
Input needed: dest p0, color0
ORGFX has no hardware support for writing a single pixel to the video memory. However, it is
possible to draw a line, rect or curve with the size of one pixel. The software API makes it possible
to draw a pixel by writing directly to the memory (this is the most efficient way). Since the video
memory can hold both the framebuffer and textures, the same operation can be used to
draw an arbitrary pixel to the screen and to load a texture into video memory.
5.2 Fill rect
Input needed: ctrl, dest p0, dest p1, color0, [src p0, src p1]
Fill rect will fill the area of a rectangle created between the pixel dest p0 and dest p1 with color.
If texturing is enabled, color will be taken from the active texture in the area between src p0 and
src p1. This operation is hardware accelerated, and is activated by setting the Rect write bit in
the control register.
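The register traffic for an untextured fill could look roughly like the sketch below. The exact write order and the helper names are assumptions of this example and are not taken from the driver source; the offsets and bit positions follow section 4. Each point is written to the DEST registers and then forwarded into p0 or p1 through the control register, after which the Rect write bit starts the operation.

```c
#include <stdint.h>

/* Register offsets (section 4). */
enum {
    REG_CONTROL = 0x00,
    REG_DEST_X  = 0x38,
    REG_DEST_Y  = 0x3c,
    REG_COLOR0  = 0x84
};

#define CTRL_RECT_WRITE      (1u << 8)
#define CTRL_ACTIVE_POINT(n) (((uint32_t)(n) & 3u) << 16)
#define CTRL_FORWARD_POINT   (1u << 18)

/* Hypothetical register write helper; `regs` points at the register block. */
static void reg_write(volatile uint32_t *regs, unsigned offset, uint32_t value)
{
    regs[offset / 4] = value;
}

/* Push an integer pixel position into point `n` (16.16 fixed point). */
static void push_point(volatile uint32_t *regs, uint32_t mode,
                       unsigned n, int x, int y)
{
    reg_write(regs, REG_DEST_X, (uint32_t)x << 16);
    reg_write(regs, REG_DEST_Y, (uint32_t)y << 16);
    reg_write(regs, REG_CONTROL,
              mode | CTRL_ACTIVE_POINT(n) | CTRL_FORWARD_POINT);
}

/* `mode` holds the persistent control bits (color depth, enable flags). */
static void fill_rect(volatile uint32_t *regs, uint32_t mode, uint32_t color,
                      int x0, int y0, int x1, int y1)
{
    reg_write(regs, REG_COLOR0, color);
    push_point(regs, mode, 0, x0, y0);
    push_point(regs, mode, 1, x1, y1);
    reg_write(regs, REG_CONTROL, mode | CTRL_RECT_WRITE);
}
```

Since every write passes through the instruction FIFO, the writes take effect in the order they are issued even while a previous operation is still running.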
5.3 Line
Input needed: ctrl, dest p0, dest p1, color0
Line will draw a line between the pixels dest p0 and dest p1 with color. This operation is hardware
accelerated.
5.4 Triangle
Input needed: ctrl, dest p0, dest p1, dest p2, color0, [color1, color2, u0, v0, u1, v1, u2, v2]
Draws the pixels in the triangle created by dest p0, dest p1 and dest p2. The triangle can be
colored with either a flat color, a gradient or a texture. Gradient or textured coloring requires the
interpolation bit to be set in the control register.
5.5 Curve
Input needed: ctrl, dest p0, dest p1, dest p2, color0, [color1, color2, u0, v0, u1, v1, u2, v2]
Draws a filled quadratic Bézier curve with dest p0 as start, dest p1 as control point and dest p2
as end. For this operation to work, the interpolation bit must be set in the control register.
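For reference, a point on a quadratic Bézier curve with start p0, control point p1 and end p2 is given by B(t) = (1-t)^2 p0 + 2(1-t)t p1 + t^2 p2 for t in [0, 1]. The sketch below evaluates this definition in software. It only illustrates the shape of the curve and says nothing about how the hardware rasterizes the filled area; the names are hypothetical.

```c
struct point2 { double x, y; };

/* Evaluate the quadratic Bezier curve at parameter t in [0, 1]. */
static struct point2 bezier2(struct point2 p0, struct point2 p1,
                             struct point2 p2, double t)
{
    double a = (1.0 - t) * (1.0 - t);   /* weight of the start point   */
    double b = 2.0 * (1.0 - t) * t;     /* weight of the control point */
    double c = t * t;                   /* weight of the end point     */
    struct point2 r = { a * p0.x + b * p1.x + c * p2.x,
                        a * p0.y + b * p1.y + c * p2.y };
    return r;
}
```

The curve starts at p0, ends at p2 and is pulled toward p1 without passing through it, so it always stays inside the triangle formed by the three points.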
6 Clocks
The entire component uses the same clock domain.
7 Driver interface
The ORSoC graphics accelerator offers three different APIs to code against: two bare metal APIs
for coding directly against the processor, and a Linux kernel module. The extended bare metal
interface is a wrapper around the basic bare metal API, and makes coding easier by reducing the
number of calls. The drawback is less control over the accelerator.
7.1 newlib
The basic library is provided in orgfx.h and orgfx.c.
The bare metal library declares a structure that can hold surfaces (both framebuffers and
textures). Many functions take a pointer to one of these structures.
struct orgfx_surface
{
    unsigned int addr;
    unsigned int w;
    unsigned int h;
};
7.1.1 orgfx_init
Description: orgfx_init must be called first for the other orgfx functions to work properly.
void orgfx_init(unsigned int memoryArea);
7.1.2 orgfx_vga_set_videomode
Description: Sets the video mode: width, height and bpp.
void orgfx_set_videomode(unsigned int width,
                         unsigned int height,
                         unsigned char bpp);
7.1.3 orgfx_vga_set_vbara
Description: Assign a memory address to "Video Base Address Register A".
void orgfx_vga_set_vbara(unsigned int addr);
7.1.4 orgfx_vga_set_vbarb
Description: Assign a memory address to "Video Base Address Register B".
void orgfx_vga_set_vbarb(unsigned int addr);
7.1.5 orgfx_vga_bank_switch
Description: Switches the framebuffer.
void orgfx_vga_bank_switch();
7.1.6 orgfx_init_surface
Description: Initialize a surface and return a control structure for it. This function increments
an internal video memory stack pointer, so each surface will be allocated after the previous one in
memory (starting at memoryArea set by orgfx_init). There is currently no memory management
in place to recycle surface memory once it is no longer in use. The first surface initialized will
point to the same memory that the video controller reads from, so it should be initialized with the
width and height of the screen.
struct orgfx_surface
orgfx_init_surface(unsigned int width,
                   unsigned int height);
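The allocation scheme described above behaves like a simple bump allocator. The sketch below illustrates the idea under the assumption of 16 bit color (two bytes per pixel); it is not the driver's actual code, and all names in it are hypothetical.

```c
struct surface { unsigned int addr, w, h; };

/* Hypothetical allocator state; the real driver keeps its own. */
static unsigned int next_free;
static unsigned int bytes_per_pixel = 2;   /* assumes 16 bit color */

/* Corresponds to passing memoryArea to the init function. */
static void alloc_init(unsigned int memory_area)
{
    next_free = memory_area;
}

/* Each surface starts where the previous one ended; nothing is ever freed. */
static struct surface alloc_surface(unsigned int width, unsigned int height)
{
    struct surface s = { next_free, width, height };
    next_free += width * height * bytes_per_pixel;
    return s;
}
```

Under these assumptions a 640x480 screen surface occupies 614400 bytes, so the second surface starts 0x96000 bytes after memoryArea.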
7.1.7 orgfx_bind_rendertarget
Description: Binds a surface as the active render target. This function must be called before any
drawing operations can be performed.
void orgfx_bind_rendertarget(struct orgfx_surface *surface);
7.1.8 orgfx_enable_cliprect
Description: Enables/disables clipping.
inline void orgfx_enable_cliprect(unsigned int enable);
7.1.9 orgfx_cliprect
Description: Sets the clipping rect. No pixels will be drawn outside of this rect (useful for
restricting draws to a specific area of the render target). orgfx_bind_rendertarget will reset the
clipping rect to the size of the surface.
inline void orgfx_cliprect(unsigned int x0,
                           unsigned int y0,
                           unsigned int x1,
                           unsigned int y1);
7.1.10 orgfx_srcrect
Description: Sets the source rectangle that will be used by texturing operations. This allows for
only drawing a small part of a texture. orgfx_bind_tex0 will reset this to the size of the texture.
inline void orgfx_srcrect(unsigned int x0,
                          unsigned int y0,
                          unsigned int x1,
                          unsigned int y1);
7.1.11 orgfx_set_pixel
Description: Set a pixel on coordinate x,y to color. This is done in software by direct memory
writes. This operation is not affected by the clipping rect!
inline void orgfx_set_pixel(int x,
                            int y,
                            unsigned int color);
7.1.12 orgfx_memcpy
Description: Copies memory from the processor to the video memory. Size is in 32-bit words.
This function is intended to work with the output array of the sprite converter utility to load
images into memory. Remember to bind a texture as the render target first!
void orgfx_memcpy(unsigned int mem[],
                  unsigned int size);
7.1.13 orgfx_set_color
Description: Sets the current drawing color (for flat coloring).
inline void orgfx_set_color(unsigned int color);
7.1.14 orgfx_set_colors
Description: Sets all the current drawing colors (for gradient coloring).
inline void orgfx_set_colors(unsigned int color0,
                             unsigned int color1,
                             unsigned int color2);
7.1.15 orgfx_rect
Description: Draws a rect from (x0,y0) to (x1,y1) and fills it with the current drawing color. If
texturing is enabled, the current texture will be drawn instead.
inline void orgfx_rect(int x0,
                       int y0,
                       int x1,
                       int y1);
7.1.16 orgfx_line
Description: Draws a line from (x0,y0) to (x1,y1) with the current drawing color. If texturing is
enabled, the first pixel of the current texture will be drawn instead.
inline void orgfx_line(int x0, int y0,
                       int x1, int y1);
7.1.17 orgfx_line3d
Description: Draws a line from (x0,y0,z0) to (x1,y1,z1) with the current drawing color. If
texturing is enabled, the first pixel of the current texture will be drawn instead.
inline void orgfx_line3d(int x0, int y0, int z0,
                         int x1, int y1, int z1);
7.1.18 orgfx_triangle
Description: Draws a filled triangle of the space spanned by (x0,y0), (x1,y1) and (x2,y2). The
order of the points is important, since triangles calculated to be counter clockwise will be discarded
(backface culling). The interpolate flag indicates if flat coloring or interpolated coloring should
be used. The interpolate flag must be enabled if interpolated alpha, texture coordinates or depth
buffer culling is desired (flat coloring can be obtained by setting all three color registers to the
same color).
inline void orgfx_triangle(int x0, int y0,
                           int x1, int y1,
                           int x2, int y2,
                           unsigned int interpolate);
7.1.19 orgfx_triangle3d
Description: This function works the same way as the triangle function, but the Z-values are set.
inline void orgfx_triangle3d(int x0, int y0, int z0,
                             int x1, int y1, int z1,
                             int x2, int y2, int z2,
                             unsigned int interpolate);
7.1.20 orgfx_curve
Description: Draws a quadratic curve between the points (x0,y0) and (x2,y2) with the control
point (x1,y1). The three points form a triangle. The inside flag determines if the inside or outside
of the curve is filled inside the triangle.
inline void orgfx_curve(int x0, int y0,
                        int x1, int y1,
                        int x2, int y2,
                        unsigned int inside);
7.1.21 orgfx_uv
Description: Sets the three texture coordinates used in textured triangle renders.
inline void orgfx_uv(unsigned int u0, unsigned int v0,
                     unsigned int u1, unsigned int v1,
                     unsigned int u2, unsigned int v2);
7.1.22 orgfx_enable_tex0
Description: Enables or disables texturing.
void orgfx_enable_tex0(unsigned int enable);
7.1.23 orgfx_bind_tex0
Description: Binds a surface as the current texture. Will reset the source rect.
void orgfx_bind_tex0(struct orgfx_surface *surface);
7.1.24 orgfx_enable_zbuffer
Description: Enables or disables reads and writes to the depth buffer. Requires that a depth
buffer is bound.
void orgfx_enable_zbuffer(unsigned int enable);
7.1.25 orgfx_bind_zbuffer
Description: Binds the depth buffer. This surface should have the same resolution as the render
target.
void orgfx_bind_zbuffer(struct orgfx_surface *surface);
7.1.26
orgfx_clear_zbuffer
Description: Clears the depth buffer.
void orgfx_clear_zbuffer();
7.1.27
orgfx_enable_alpha
Description: Enables or disables alpha blending.
void orgfx_enable_alpha(unsigned int enable);
7.1.28
orgfx_set_alpha
Description: Sets the alpha blending value.
void orgfx_set_alpha(unsigned int alpha);
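To give an idea of what a global alpha value does to a pixel, the sketch below blends a source pixel over a destination pixel in 16 bit color. It assumes an 8-bit alpha where 255 is fully opaque, RGB565 pixels and the conventional linear blend; the exact arithmetic of the ORGFX hardware is not described in this section, so this should be read as a generic illustration. The function names are hypothetical.

```c
#include <stdint.h>

/* Blend one colour channel: alpha = 255 keeps src, alpha = 0 keeps dst.
 * Assumed convention, not taken from the ORGFX hardware description. */
uint8_t blend_channel(uint8_t src, uint8_t dst, uint8_t alpha)
{
    /* +127 rounds to nearest instead of truncating. */
    return (uint8_t)((src * alpha + dst * (255 - alpha) + 127) / 255);
}

/* Blend two RGB565 pixels channel by channel (5 bits red, 6 green, 5 blue). */
uint16_t blend_rgb565(uint16_t src, uint16_t dst, uint8_t alpha)
{
    uint8_t r = blend_channel((src >> 11) & 0x1f, (dst >> 11) & 0x1f, alpha);
    uint8_t g = blend_channel((src >> 5) & 0x3f, (dst >> 5) & 0x3f, alpha);
    uint8_t b = blend_channel(src & 0x1f, dst & 0x1f, alpha);
    return (uint16_t)((r << 11) | (g << 5) | b);
}
```

With this convention an alpha of 64 lets roughly three quarters of the destination show through the drawn surface.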
7.1.29
orgfx_enable_colorkey
Description: Enables or disables colorkey.
void orgfx_enable_colorkey(unsigned int enable);
7.1.30
orgfx_set_colorkey
Description: Sets the colorkey color.
void orgfx_set_colorkey(unsigned int colorkey);
7.1.31
orgfx_enable_transform
Description: Enables or disables hardware accelerated transformation of points.
void orgfx_enable_transform(unsigned int enable);
7.1.32
orgfx_set_transformation_matrix
Description: Sets the 3 by 4 transformation matrix used in hardware.
void orgfx_set_transformation_matrix(int aa, int ab, int ac, int tx,
                                     int ba, int bb, int bc, int ty,
                                     int ca, int cb, int cc, int tz);
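The effect of a 3 by 4 transformation matrix on a point can be sketched in software as follows. The sketch assumes that each matrix row produces one output coordinate and that the last column (tx, ty, tz) is a translation, which matches the parameter names above. It uses floating point for readability, whereas the driver function takes integers (presumably fixed point values). The types vec3 and mat3x4 and the function transform_point are hypothetical.

```c
/* Hypothetical types for illustration; not the driver's own structs. */
typedef struct { float x, y, z; } vec3;

typedef struct {
    float aa, ab, ac, tx;
    float ba, bb, bc, ty;
    float ca, cb, cc, tz;
} mat3x4;

/* Compute p' = M * (x, y, z, 1)^T: a 3x3 linear part plus a translation. */
vec3 transform_point(const mat3x4 *m, vec3 p)
{
    vec3 r;
    r.x = m->aa * p.x + m->ab * p.y + m->ac * p.z + m->tx;
    r.y = m->ba * p.x + m->bb * p.y + m->bc * p.z + m->ty;
    r.z = m->ca * p.x + m->cb * p.y + m->cc * p.z + m->tz;
    return r;
}
```

An identity linear part with a non-zero last column therefore moves every point by (tx, ty, tz), while a diagonal linear part scales each axis independently.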
7.2
Extended newlib
The extended library is provided in orgfx_plus.h and orgfx_plus.c, but orgfx.c also has to be
compiled for it to work.
Instead of using surface structs directly, the extended API hides surface management by returning id tags for each surface. The screen surface (defined by id -1) is handled as a single surface,
even when double buffering is enabled.
The driver defines the number of available surfaces (not counting the screen) with a static
define. Change this if the default value is too low for your application.
There are no 3D functions in this API. For the more advanced 3D functionality (meshes, depth
buffering etc.), see the advanced API.
7.2.1
orgfxplus_init
Description: Initializes the screen with the supplied video mode and returns an id for the screen.
The only supported bpp is 16. Double buffering and depth buffering can be enabled (and the
appropriate buffers will be allocated). The depth buffer is allocated with the same size as the
screen. There is no support in the driver to allocate more than one depth buffer.
int orgfxplus_init(unsigned int width,
                   unsigned int height,
                   unsigned char bpp,
                   unsigned char doubleBuffering,
                   unsigned char zbuffer);
7.2.2
orgfxplus_init_surface
Description: Unlike the basic API, this function both initializes a surface and loads a prepared
image to it in one function call. The return value is an id that can be used to bind the surface. It
changes render target during operation, but switches back to the last render target on completion.
Since the screen(s) are already initialized by a call to init, they do not need to be loaded using this
function.
int orgfxplus_init_surface(unsigned int width,
                           unsigned int height,
                           unsigned int mem[]);
7.2.3
orgfxplus_bind_rendertarget
Description: Binds a surface as the current render target.
void orgfxplus_bind_rendertarget(int surface);
7.2.4
orgfxplus_bind_tex0
Description: Binds a surface as the current active texture.
void orgfxplus_bind_tex0(int surface);
7.2.5
orgfxplus_flip
Description: Swaps which buffer to draw on when using double buffering. Needs to be called
once before anything shows up on screen!
void orgfxplus_flip();
7.2.6
orgfxplus_clip
Description: Sets the current clipping rect. This is reset to the size of the new render target
when orgfxplus_bind_rendertarget is called.
inline void orgfxplus_clip(unsigned int x0,
                           unsigned int y0,
                           unsigned int x1,
                           unsigned int y1,
                           unsigned int enable);
7.2.7
orgfxplus_fill
Description: Draws a rectangle to the current render target with a flat color.
void orgfxplus_fill(int x0, int y0,
                    int x1, int y1,
7.2.8
orgfxplus_line
Description: Draws a line from (x0,y0) to (x1,y1) to the current render target with a flat color.
void orgfxplus_line(int x0, int y0,
                    int x1, int y1,
7.2.9
orgfxplus_triangle
Description: Draws a triangle between the points (x0,y0), (x1,y1) and (x2,y2) and fills it with a
color.
void orgfxplus_triangle(int x0, int y0,
                        int x1, int y1,
                        int x2, int y2,
7.2.10
orgfxplus_curve
Description: Draws a quadratic Bézier curve from (x0,y0) to (x2,y2) with the control point
(x1,y1). Uses flat coloring.
void orgfxplus_curve(int x0, int y0,
                     int x1, int y1,
                     int x2, int y2,
                     unsigned int inside,
7.2.11
orgfxplus_draw_surface
Description: Draws a texture to the current render target.
void orgfxplus_draw_surface(int x0, int y0,
                            unsigned int surface);
7.2.12
orgfxplus_draw_surface_section
Description: Draws the section of a texture defined by the corners (srcx0,srcy0) and (srcx1,srcy1)
to the current render target.
void orgfxplus_draw_surface_section(int x0, int y0,
                                    unsigned int srcx0,
                                    unsigned int srcy0,
                                    unsigned int srcx1,
                                    unsigned int srcy1,
                                    unsigned int surface);
7.2.13
orgfxplus_colorkey
Description: Sets the colorkey color and enables or disables the use of the colorkey.
void orgfxplus_colorkey(unsigned int colorkey,
                        unsigned int enable);
7.2.14
orgfxplus_alpha
Description: Sets the alpha value and enables or disables the use of alpha blending.
void orgfxplus_alpha(unsigned int alpha,
                     unsigned int enable);
7.3
Bitmap Fonts
Note that bitmap fonts can be generated with the bitfontmaker utility. This utility generates an
initialization function that calls the orgfx_make_bitmap_font function and returns a valid font.
7.3.1
orgfx_make_bitmap_font
Creates an orgfx_bitmap_font from an image. glyphSpacing is the space in pixels between two glyphs
in the string, and spaceWidth is the width of the space character.
orgfx_bitmap_font orgfx_make_bitmap_font(orgfx_tileset *glyphs,
                                         unsigned int glyphSpacing,
                                         unsigned int spaceWidth);
7.3.2
orgfx_put_text
Puts the text "str" on the screen with the specified "font" at position (x0,y0).
void orgfx_put_text(orgfx_font *font,
                    int x0, int y0,
                    const wchar_t *str);
Note the use of wide strings (which enables the use of special characters such as åäö). Example
usage:
orgfx_put_text(&font, x0, y0,
               L"Some example text");
7.4
Vector Fonts
Note that vector fonts can be generated with the fonter utility. This utility generates an initialization function that calls the orgfx_make_vector_font and orgfx_init_vector_font functions and
returns a valid font.
7.4.1
orgfx_make_vector_font
Creates an orgfx_vector_font from a series of glyphs.
orgfx_vector_font orgfx_make_vector_font(Glyph *glyphlist,
                                         int size,
                                         Glyph **glyphindexlist,
                                         int glyphindexlistsize);
7.4.2
orgfx_init_vector_font
Initializes the font for use. Needs to be called to set the index list.
int orgfx_init_vector_font(orgfx_vector_font font);
7.4.3
orgfx_put_vector_char
Prints one glyph from the font with the current transformation matrix. If the glyph is not supported
in the font the function will return without doing anything.
void orgfx_put_vector_char(orgfx_vector_font *font, wchar_t text);
7.4.4
orgfx_put_vector_text
Prints a string of characters using a vector font. This function sets the transformation matrix from
the offset, scale and rotation parameters, then makes a series of calls to orgfx_put_vector_char.
void orgfx_put_vector_text(orgfx_vector_font *font,
                           orgfx_point3 offset,
                           orgfx_point3 scale,
                           orgfx_point3 rotation,
                           const wchar_t *str,
7.5
3D API
There are two major parts of the 3D API, one is the transformation matrix interface and the other
is the 3D mesh interface.
7.5.1
Transformations
By setting the transformation matrix the ORGFX core can perform hardware accelerated transformations for every point sent to it, causing significantly less overhead than if this was done in
software.
The relevant functions are listed below:
orgfx_matrix orgfx3d_identity(void);
orgfx_matrix orgfx3d_rotateX(orgfx_matrix mat, float rad);
orgfx_matrix orgfx3d_rotateY(orgfx_matrix mat, float rad);
orgfx_matrix orgfx3d_rotateZ(orgfx_matrix mat, float rad);
orgfx_matrix orgfx3d_scale(orgfx_matrix mat, orgfx_point3 s);
orgfx_matrix orgfx3d_translate(orgfx_matrix mat, orgfx_point3 t);
inline void orgfx3d_set_matrix(orgfx_matrix mat);
7.5.2
orgfx3d_make_mesh
Initializes a mesh with the necessary arrays generated by the meshmaker utility.
orgfx_mesh orgfx3d_make_mesh(orgfx_face *faces,
                             unsigned int nFaces,
                             orgfx_point3 *verts,
                             unsigned int nVerts,
                             orgfx_point2 *uvs,
                             unsigned int nUvs);
7.5.3
orgfx3d_mesh_texture_size
This should be called only once for each mesh that will be using texture coordinates. Since the
ORGFX device uses pixel coordinates the UV coordinates must be updated with the size of the
used texture.
void orgfx3d_mesh_texture_size(orgfx_mesh *mesh,
                               unsigned int width,
                               unsigned int height);
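The conversion from UV coordinates to pixel coordinates can be illustrated with the sketch below. It assumes that the mesh stores normalized UV coordinates in the range 0 to 1 and that the update is a plain multiplication by the texture size; neither detail is confirmed by the text above. The type uv_point and the function uv_to_pixels are hypothetical stand-ins, not driver code.

```c
/* Hypothetical 2D point type standing in for the driver's point struct. */
typedef struct { float x, y; } uv_point;

/* Scale a normalized UV coordinate (assumed range 0..1) to the pixel
 * coordinates of a texture that is width x height pixels large. */
uv_point uv_to_pixels(uv_point uv, unsigned int width, unsigned int height)
{
    uv_point r;
    r.x = uv.x * (float)width;
    r.y = uv.y * (float)height;
    return r;
}
```

If the stored coordinates are rescaled in place like this, a second call would scale them again, which is consistent with the advice to call the function only once per mesh.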
7.5.4
orgfx3d_draw_mesh
This function draws the mesh to screen, using the supplied translation, rotation and scale vectors
to set the transformation matrix. If filled is set to zero, the mesh will be drawn as a colored
wireframe. If filled is set to one and textured to zero, the mesh will be drawn with interpolated
colors (the mesh format currently does not support materials). If filled is set to one and textured
is also set to one, the mesh will be textured using interpolated uv texture coordinates.
void orgfx3d_draw_mesh(orgfx_mesh *mesh,
                       orgfx_point3 translation,
                       orgfx_point3 rotation,
                       orgfx_point3 scale,
                       int filled, int textured);
7.6
Linux
The current version of the core does not have a Linux driver.
7.7
Software emulation
The entire device has a software implementation to make it easier to write applications for the
device. The orgfx_sw.c file replaces the orgfx.c and orgfx_plus.c files, and renders pixels as
they would be rendered by the graphics accelerator, but on a PC. The software implementation
uses SDL as the backend.
7.8
Utilities
7.8.1
Sprite maker utility
A small application that converts an image into a header file that can be included in the project
when compiled. The application generates an array of color values that can be loaded as a sprite.
The application supports reading common image file formats such as bmp, png and jpg
(for a full list, see the supported file formats of the SDL_image library). 8-, 16- and 32-bit output is
supported, and can be selected by passing a command line argument to the program (by default,
the output is adjusted for 16 bit color mode).
The resulting output header file, which is named after the input, can be included in a program
using the extended bare metal driver. The easiest way to use the sprite is to use the generated
initialize function defined in the header file.
7.8.2
Bitmap font maker utility
Another application generates the data structures necessary to load bitmap fonts with very little
effort. It takes an image and a grid spacing as input, and automatically generates offsets for all
the glyphs in the font. The font generated by the program has 256 characters arranged according
to the ASCII charset, as seen in figure 5 and 6.
Figure 5: The ASCII table. Each number from 0 to 127 refers to a character. The numbers 0 to
31 cannot be printed.
Figure 6: The extended ASCII table. Each number from 128 to 255 refers to a character, mostly
special characters not included in the basic table.
Figure 7: A font rendered by the software implementation of the ORGFX. Bézier curves are single
colored while the triangles are interpolated between current color and black.
The application supports reading common image file formats such as bmp, png and jpg
(for a full list, see the supported file formats of the SDL_image library). 8-, 16- and 32-bit output is
supported, and can be selected by passing a command line argument to the program (by default,
the output is adjusted for 16 bit color mode). Both vertical and horizontal grid spacing are set to
32 pixels by default, but this can be changed through command line arguments.
The resulting output header file, which is named after the input, can be included in a program
using the bare metal and font driver. The easiest way to use the bitmap font is to use the generated
initialize function defined in the header file.
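How glyph offsets follow from a grid spacing can be sketched as below. The sketch assumes that the font image holds the characters in row-major order, starting with character code 0 in the top left corner; the actual layout expected by the utility is not specified here. The type glyph_rect and the function glyph_cell are hypothetical.

```c
/* Hypothetical rectangle type; not part of the ORGFX driver. */
typedef struct { unsigned int x0, y0, x1, y1; } glyph_rect;

/* Return the grid cell of character code c in a font image, assuming a
 * row-major layout that starts with code 0 in the top left corner. */
glyph_rect glyph_cell(unsigned char c, unsigned int image_width,
                      unsigned int grid_w, unsigned int grid_h)
{
    /* Number of glyphs per row; fall back to 1 to avoid dividing by zero. */
    unsigned int cols = (grid_w != 0) ? image_width / grid_w : 0;
    if (cols == 0)
        cols = 1;

    glyph_rect r;
    r.x0 = (c % cols) * grid_w;
    r.y0 = (c / cols) * grid_h;
    r.x1 = r.x0 + grid_w;
    r.y1 = r.y0 + grid_h;
    return r;
}
```

With the default 32 pixel spacing and a 512 pixel wide image there are 16 glyphs per row, so the letter 'A' (code 65) lands in column 1 of row 4.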
7.8.3
Mesh maker utility
The mesh maker utility loads 3D objects and generates a header file that can be used by the
advanced 3D API. Currently the utility only supports Wavefront .obj files that contain only
triangles (three-sided polygons). Any polygons with more than three vertices will be discarded, so
all polygons in the model must be converted to triangles prior to running the utility.
The application supports loading texture coordinates for each vertex, allowing for textured
meshes.
The resulting output header file, which is named after the input, can be included in a program
using the bare metal 3D API. The easiest way to use the mesh is to use the generated initialize
function defined in the header file.
7.8.4
Vector font maker utility
The font maker is an application that converts a .TTF file to a format that the graphics accelerator
can handle. It outputs a .h file that can be included in a project to enable the graphics
accelerator's vector font capabilities. The converter finds all explicit vector points in a TTF file,
then calculates the implicit points and checks where the glyph contours end. The points are
then sent to a Delaunay triangulation function based on the work of V. Domiter and B. Zalik and
implemented by M. Green and T. Åhlén (footnote 1). The generated header file contains
two lists for each glyph, one to store Bézier writes and one to store triangle writes. The rendered
result can be seen in figure 7.
8
Programming examples
1 http://code.google.com/p/poly2tri/
The following piece of code shows how to use the extended interface for a bare metal implementation on the ORPSoCv2 platform. Bahamut_cc.png.h is a 186 by 248 pixel image with a pinkish
background (rgb code ff00ff, or f81f in 16 bit). The header file is generated by the sprite maker
utility at 16 bit color depth.
#include "orgfx_plus.h"
#include "Bahamut_cc.png.h"

int main(void)
{
    int i;
    // Initialize screen to 640x480-16@60
    // No double buffering, no depth buffer
    int screen = orgfxplus_init(640, 480, 16, 0, 0);
    // Initialize dragon sprite
    int bahamut_sprite =
        orgfxplus_init_surface(186, 248, Bahamut_cc);
    // Activate colorkeying
    orgfxplus_colorkey(0xf81f, 1);
    // Clear screen, white color
    orgfxplus_fill(0, 0, 640, 480, 0xffff);
    // Draw a few lines with different colors
    orgfxplus_line(200, 100, 10, 10, 0xf000);
    orgfxplus_line(200, 100, 351, 31, 0x0ff0);
    orgfxplus_line(200, 100, 121, 231, 0x00f0);
    orgfxplus_line(200, 100, 321, 231, 0xf00f);
    // Draw the dragon at different alpha settings
    orgfxplus_alpha(64, 1);
    orgfxplus_draw_surface(100, 100, bahamut_sprite);
    while (1);
}
More example programs are supplied with the implementation in the sw/examples directory.
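The 16 bit colorkey value f81f used in the example follows from the 24 bit color ff00ff if the 16 bit mode packs pixels as 5 bits red, 6 bits green and 5 bits blue (RGB565). The sketch below shows that conversion; the function name rgb888_to_rgb565 is hypothetical and not part of the driver.

```c
#include <stdint.h>

/* Pack 8-bit R, G, B into RGB565 by dropping the low bits of each channel:
 * red keeps 5 bits, green 6 bits, blue 5 bits. */
uint16_t rgb888_to_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}
```

Calling rgb888_to_rgb565(0xff, 0x00, 0xff) yields 0xf81f, the value passed to orgfxplus_colorkey above.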
B
Appendix B, Enhanced VGA/LCD Specification
VGA/LCD Core
v2.0
Specifications
Author: Richard Herveille
rherveille@opencores.org
Document rev. 1.2
March 20, 2003
OpenCores
Enhanced VGA/LCD Core Datasheet
3/20/2003
Revision History
Rev.   Date       Author
0.1    10/04/01   Richard Herveille
0.1a   20/04/01   Richard Herveille
0.2    21/05/01   Richard Herveille
0.3    28/05/01   Richard Herveille
0.4    03/06/01   Richard Herveille
0.4a   04/06/01   Richard Herveille
0.5    15/07/01   Richard Herveille
0.6    31/07/01   Richard Herveille
0.7    10/19/01   Richard Herveille
0.8    28/01/02   Richard Herveille
1.0    28/03/02   Richard Herveille
1.1    20/04/02   Richard Herveille
1.2    18/03/03   Richard Herveille
www.opencores.org
Description
First Draft
Changed proposal to specifications
Added Appendix A
Extended Register Specifications
First official release
Added OpenCores logo
Changed Chapter 1, Introduction
Finished Chapter 2, IO ports
Finished Chapter 3, Registers
Extended Chapter 4, Operation
Changed Chapter 5, Architecture
Added Appendix B
Fixed some inconsistencies.
Changed all references to address related subjects
(core fix & documentation fix).
Added Appendix C
Fixed some minor typing errors in the document
(credits: Rudolph Usselmann)
Added Color Lookup Table bank switching.
Added embedded CLUT section.
Revised horizontal & vertical timing section.
Added Power-on-Reset description.
Changed CBSE & VBSE bits functionality.
Added Bank Switch Section.
Added VGA & CLUT section to Appendix B.
Changed introduction page.
Major VGA/LCD Core changes; core v2.0.
Changed Manual to reflect core changes.
Removed all references to external CLUT
v2.0 core has CLUT internally.
Fixed some typos.
Added 32bpp mode.
Added Bandwidth Issues section.
Expanded Bandwidth Issues section.
Added Hardware Cursor sections.
Added Table of Contents.
Added Appendix-D.
Changed Architecture section.
Changed Operation section.
Changed introduction page.
Changed table headers.
Added OpenCores logo to page header.
Revised entire document.
Changed VGA timing section.
Added support for WISHBONE revB.3
Synchronous Registered Feedback Cycles.
Rev 1.2 Preliminary
Table of contents
INTRODUCTION
IO PORTS
2.1 CORE PARAMETERS
2.2 WISHBONE SYSCON INTERFACE CONNECTIONS
2.3 WISHBONE SLAVE INTERFACE CONNECTIONS
2.4 WISHBONE MASTER INTERFACE CONNECTIONS
2.5 VGA PORT CONNECTIONS
REGISTERS
3.1 REGISTERS LIST
3.2 ACCESSING RESERVED ADDRESS LOCATIONS
3.3 CONTROL REGISTER [CTRL]
3.4 STATUS REGISTER [STAT]
3.5 HORIZONTAL TIMING REGISTER [HTIM]
3.6 VERTICAL TIMING REGISTER [VTIM]
3.7 HORIZONTAL AND VERTICAL LENGTH REGISTER [HVLEN]
3.8 VIDEO BASE ADDRESS [VBARA] [VBARB]
3.9 HARDWARE CURSOR BASE ADDRESS [C0BAR] [C1BAR]
3.10 HARDWARE CURSOR (X,Y) REGISTER [C0XY] [C1XY]
3.11 HARDWARE CURSOR COLOR REGISTERS [C0CR] [C1CR]
3.12 8BPP PSEUDO COLOR LOOKUP TABLE [PCLT]
OPERATION
4.1 VIDEO TIMING
4.1.1 HORIZONTAL VIDEO TIMING
4.1.2 VERTICAL VIDEO TIMING
4.1.3 COMBINED VIDEO FRAME TIMING
4.2 PIXEL COLOR GENERATION
4.2.1 COLOR PROCESSOR INTERNALS
4.2.2 ADDRESS GENERATOR
4.2.3 DATA BUFFER
4.2.4 COLORIZER
4.2.5 COLOR LOOKUP TABLE
4.3 HARDWARE CURSORS
4.3.1 INTRODUCTION
4.3.2 CURSOR PATTERNS
4.3.3 TURNING OFF 3D SUPPORT.
4.3.4 CURSOR PROCESSOR INTERNALS
4.3.5 ADDRESS GENERATOR
4.3.6 CURSOR BUFFER
4.3.7 CURSOR0/CURSOR1 PROCESSOR
4.4 BANK SWITCHING
4.4.1 INTRODUCTION
4.4.2 HOST NOTES
4.4.3 SEQUENCE
4.5 BANDWIDTH ISSUES
4.5.1 INTRODUCTION
4.5.2 CALCULATIONS
4.5.3 EXAMPLES
ARCHITECTURE
5.1 COLOR LOOKUP TABLE
5.2 CURSOR BASE REGISTERS
5.2 CURSOR BUFFERS
5.3 CURSOR PROCESSOR
5.4 COLOR PROCESSOR
5.5 LINE FIFO
5.6 VIDEO MEMORY BASE REGISTERS
5.7 VIDEO TIMING GENERATOR
5.8 WISHBONE MASTER INTERFACE
5.9 WISHBONE SLAVE INTERFACE
VGA MODES
A.1 VERTICAL TIMING INFORMATION COMMON VGA MODES
A.2 HORIZONTAL TIMING INFORMATION COMMON VGA MODES
TARGET DEPENDENT IMPLEMENTATIONS
CORE STRUCTURE
DESIGN NOTES
D.1 INTRODUCTION
D.2 VGA_CURPROC
1
Introduction
Features
• CRT and LCD display support
• Separate VSYNC/HSYNC and combined CSYNC synchronization signals
• Composite BLANK signal
• User programmable video timing
• User programmable video resolutions
• User programmable video control signals polarization levels
• 32bpp, 24bpp and 16bpp color modes
• 8bpp grayscale and 8bpp pseudo-color modes
• Supports video- and/or color-lookup-table bank switching during vertical retrace
• Support for up to two hardware cursors
• Per cursor user selectable resolutions, 32x32 pixels and 64x64 pixels
• Alpha blending support for 3D cursors
• Triple display support
• 32bit WISHBONE RevB.3 compliant Slave and Master interfaces
• Operation from a wide range of input clock frequencies
• Static synchronous design
• Full synthesizability
General Description
The OpenCores Enhanced VGA/LCD Controller Core provides VGA capabilities for embedded systems. It supports both CRT and LCD displays with user programmable resolutions and video timings, thus providing compatibility with almost all available LCD and CRT displays.
The core supports a number of color modes, including 32bpp, 24bpp, 16bpp, 8bpp grayscale, and 8bpp pseudo color. The video memory is located outside the primary core, thus providing the most flexible memory solution possible. It can be located on-chip or off-chip, shared with the system’s main memory (VGA on demand) or be dedicated to the VGA system. The color lookup table is located inside the core, to reduce memory bandwidth requirements and to provide higher throughput. Image data is fetched automatically via the WISHBONE Master interface, making this an ideal “program-and-forget” video solution. More demanding video applications, like streaming video or video games, can benefit from the video-bank-switching function. Flicker and cluttered images are reduced by automatically switching between video-memory pages and/or color lookup tables on each vertical retrace.
The optional hardware cursors provide additional flexibility through two 32x32 16bpp or 64x64 4bpp hardware generated cursors. The two cursors can be displayed at the same time. Typically, one is for the GUI and one for user applications. Cursor patterns are stored in an off-screen portion of the video memory or, if accessible by the core, in the main memory and are automatically loaded into internal buffers to reduce memory bandwidth requirements. Moving the cursors on the screen is as simple as changing a single register.
The core can interrupt the host on each horizontal and/or vertical sync pulse. The horizontal, vertical, and composite synchronization polarization levels, as well as the blanking polarization level are programmable by software.
[Figure: Core overview]
2
IO ports
2.1 Core Parameters
Parameter         Type     Default  Description
ARST_LVL          Bit      1’b0     Asynchronous reset level
LINE_FIFO_AWIDTH  Integer  7        Line FIFO size
2.1.1 ARST_LVL
The asynchronous reset level can be set to either active high (1’b1) or active low
(1’b0).
2.1.2 LINE_FIFO_AWIDTH
The line FIFO size can be altered by changing the number of address bits the FIFO
logic should use. The line FIFO depth (number of entries) can be calculated as
follows:
entries = 2^{LINE_FIFO_AWIDTH}
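The relation between the address width and the FIFO depth is a plain power of two, which can be written as a shift. The helper below is only an illustration of the formula; the function name line_fifo_entries is hypothetical and does not appear in the core.

```c
/* Number of line FIFO entries for a given address width:
 * entries = 2^awidth. Only valid for awidth < 32. */
unsigned int line_fifo_entries(unsigned int awidth)
{
    return 1u << awidth;
}
```

With the default LINE_FIFO_AWIDTH of 7 this gives 128 entries.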
2.2 WISHBONE Syscon Interface Connections
Port       Width  Direction  Description
wb_clk_i   1      Input      Master clock input
wb_rst_i   1      Input      Synchronous active high reset
rst_i      1      Input      Asynchronous reset
wb_inta_o  1      Output     Interrupt request signal
2.2.1 wb_clk_i
All internal WISHBONE logic is registered to the rising edge of the [wb_clk_i] clock
input. The frequency range over which the core can operate depends on the
technology used and the pixel clock needed; [wb_clk_i] may not be slower than the
pixel clock [clk_p_i].
2.2.2 wb_rst_i
The active high synchronous reset input [wb_rst_i] forces the core to restart. All
internal registers are preset and all state-machines are set to an initial state.
2.2.3 rst_i
The asynchronous reset input [rst_i] forces the core to restart. All internal registers are
preset and all state-machines are set to an initial state. The reset level, either active
high or active low, is set by the ARST_LVL parameter.
rst_i is not a WISHBONE-compatible signal. It is primarily provided for FPGA
implementations. Using [rst_i] instead of [wb_rst_i] can result in lower cell usage and
higher performance, because most FPGAs provide a dedicated asynchronous reset
path. Use either [rst_i] or [wb_rst_i]. Hardcode the unused reset input to a negated
state.
The core requires a power-on reset, allowing all internal registers to propagate to a
known state. The power-on reset must be held asserted until all clocks are stable.
When all clocks are stable the reset signal must remain asserted for at least 3 clock
cycles of the slowest available clock [clk_p_i].
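The minimum time the reset has to stay asserted once the clocks are stable follows directly from the three-cycle rule. The sketch below computes it in nanoseconds from the frequency of the slowest clock; the function name min_reset_ns is hypothetical and the rounding up to whole nanoseconds is a choice made for this example.

```c
#include <stdint.h>

/* Minimum reset hold time in nanoseconds after all clocks are stable:
 * three cycles of the slowest clock, rounded up.
 * slowest_clk_hz must be non-zero. */
uint64_t min_reset_ns(uint64_t slowest_clk_hz)
{
    const uint64_t cycles = 3;
    const uint64_t ns_per_s = 1000000000ull;
    return (cycles * ns_per_s + slowest_clk_hz - 1) / slowest_clk_hz;
}
```

For a 25 MHz pixel clock this gives 120 ns.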
2.2.4 wb_inta_o
The interrupt request output is asserted when the core needs service from the host
system.
2.3 WISHBONE Slave Interface Connections
Port       Width  Direction  Description
wbs_adr_i  12     Input      Lower address bits
wbs_dat_i  32     Input      Slave Data bus input
wbs_dat_o  32     Output     Slave Data bus output
wbs_sel_i  4      Input      Byte select signals
wbs_we_i   1      Input      Write enable input
wbs_stb_i  1      Input      Strobe signal/Core select input
wbs_cyc_i  1      Input      Valid bus cycle input
wbs_ack_o  1      Output     Bus cycle acknowledge output
wbs_err_o  1      Output     Bus cycle error output
2.3.1 wbs_adr_i
The address array input [wbs_adr_i] is used to pass a binary coded address to the core.
The most significant bit is at the higher number of the array.
2.3.2 wbs_dat_i
The data array input [wbs_dat_i] is used to pass binary data from the current
WISHBONE Master to the core. All data transfers are 32bit wide.
2.3.3 wbs_dat_o
The data array output [wbs_dat_o] is used to pass binary data from the core to the
current WISHBONE Master. All data transfers are 32bit wide.
2.3.4 wbs_sel_i
The byte select array input [wbs_sel_i] indicates where valid data is placed on the
[wbs_dat_i] input array during writes to the core, and where it is expected on the
[wbs_dat_o] output array during reads from the core. The core requires all accesses to
be 32bit wide [wbs_sel_i(3:0) = ‘1111’b].
www.opencores.org
Rev 1.2 Preliminary
3 of 40
OpenCores
3/20/2003
2.3.5 wbs_we_i
The write enable input [wbs_we_i] indicates whether the current bus cycle is a read
or a write cycle. The signal is asserted during write cycles and negated during read
cycles.
2.3.6 wbs_stb_i
The strobe input [wbs_stb_i] is asserted when the core is being addressed. The core
only responds to WISHBONE cycles when [wbs_stb_i] is asserted, except for the
[wb_rst_i] and [rst_i] reset signals, which always receive a response.
2.3.7 wbs_cyc_i
When asserted, the cycle input [wbs_cyc_i] indicates that a valid bus cycle is in
progress. The logical AND function of [wbs_cyc_i] and [wbs_stb_i] indicates a valid
transfer cycle to/from the core.
2.3.8 wbs_ack_o
When asserted, the acknowledge output [wbs_ack_o] indicates the normal termination
of a valid bus cycle.
2.3.9 wbs_err_o
When asserted, the error output [wbs_err_o] indicates an abnormal termination of a
bus cycle. The [wbs_err_o] output signal is asserted when the host tries to access the
controller’s internal registers with anything other than a full 32-bit access; i.e. when
[wbs_sel_i(3:0)] is not equal to ‘1111’b.
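The slave-side rules in sections 2.3.4 to 2.3.9 can be summarized in a small behavioral model. This is only an illustrative sketch: the function name and the string return values are hypothetical, and bus timing (for example, how many clock cycles pass before the response) is not modeled.

```python
def slave_response(cyc: int, stb: int, sel: int) -> str:
    """Classify one access on the slave interface (behavioral sketch only)."""
    if not (cyc and stb):
        return "idle"   # no valid transfer cycle addressed to the core
    if (sel & 0xF) != 0xF:
        return "err"    # not a full 32-bit access: abnormal termination
    return "ack"        # normal termination of a valid bus cycle
```

For example, a transfer with all four byte selects set is acknowledged, while a 16-bit access (sel = 0011b) ends with an error.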
2.4 WISHBONE Master Interface Connections
Port        Width  Direction  Description
wbm_adr_o   32     Output     Address bus output
wbm_dat_i   32     Input      Data bus input
wbm_sel_o   4      Output     Byte select signals
wbm_we_o    1      Output     Write enable output
wbm_stb_o   1      Output     Strobe signal
wbm_cyc_o   1      Output     Valid bus cycle output
wbm_cti_o   3      Output     Cycle type identifier output
wbm_bte_o   2      Output     Burst type extensions output
wbm_ack_i   1      Input      Bus cycle acknowledge input
wbm_err_i   1      Input      Bus cycle error input
2.4.1 wbm_adr_o
The address array output [wbm_adr_o] is used to pass a binary coded address from
the core to the external video memory. The most significant bit is at the higher
number of the array.
2.4.2 wbm_dat_i
The data array input [wbm_dat_i] is used to pass binary data from the external video
memory to the core. All data transfers are 32bit wide.
2.4.3 wbm_sel_o
The byte select array output [wbm_sel_o] indicates where valid data is expected on
the [wbm_dat_i] input array. The core supports 32-bit wide accesses only
[wbm_sel_o(3:0) = ‘1111’b].
2.4.4 wbm_we_o
The write enable output [wbm_we_o] indicates whether the current bus cycle is a read
or a write cycle. The core only reads from the external memory; therefore,
[wbm_we_o] is always negated (‘0’).
2.4.5 wbm_stb_o
The strobe output [wbm_stb_o] is asserted when the core wants to read from the
external video memory.
2.4.6 wbm_cyc_o
The cycle output [wbm_cyc_o] is asserted when the core wants to read from the
external video memory.
2.4.7 wbm_cti_o
The Wishbone revB.3 cycle type identifier output [wbm_cti_o] gives compliant slaves
additional information about the current cycle. The VGA core supports the Registered
Feedback Cycles introduced in the Wishbone revB.3 specs. The core supports
‘Classic’ and ‘Incrementing Burst’ transfers. The table below shows the values
[wbm_cti_o] can take; any other value should be considered a core error.
wbm_cti_o  Meaning
000b       Wishbone Classic (i.e. revB.2) transfer
010b       Incrementing burst transfer
111b       End-of-Burst
2.4.8 wbm_bte_o
The Wishbone revB.3 burst type extension output [wbm_bte_o] gives compliant
slaves additional information about the requested burst. The vga core only supports
linear incrementing bursts. Therefore [wbm_bte_o] is always 2’b00.
2.4.9 wbm_ack_i
When asserted, the acknowledge input [wbm_ack_i] indicates the normal termination
of a valid bus cycle.
2.4.10 wbm_err_i
When asserted, the error input [wbm_err_i] indicates an abnormal termination of a
bus cycle. When the [wbm_err_i] signal is asserted, the core stops the current transfer.
After [wbm_err_i] has been asserted, the state of the core is undefined.
2.5 VGA Port Connections
Port         Width  Direction  Description
clk_p_i      1      Input      Pixel Clock
hsync_pad_o  1      Output     Horizontal Synchronization Pulse
vsync_pad_o  1      Output     Vertical Synchronization Pulse
csync_pad_o  1      Output     Composite Synchronization Pulse
blank_pad_o  1      Output     Blank signal
r_pad_o      8      Output     Red Color Data
g_pad_o      8      Output     Green Color Data
b_pad_o      8      Output     Blue Color Data
2.5.1 clk_p_i
All internal video logic is registered to the rising edge of the [clk_p_i] clock input.
The frequency range over which the core can operate depends on the technology used
and the pixel clock needed; [clk_p_i] may not be faster than the WISHBONE clock
[wb_clk_i].
2.5.2 hsync_pad_o
The horizontal synchronization pulse is asserted when the raster scan ray needs to
return to the start position (the left side of the screen).
2.5.3 vsync_pad_o
The vertical synchronization pulse is asserted when the raster scan ray needs to return
to the vertical start position (the top of the screen).
2.5.5 csync_pad_o
The composite synchronization pulse is a combined horizontal and vertical
synchronization signal.
2.5.6 blank_pad_o
The blank output is asserted when no image is projected onto the screen, i.e. during the
back porch, the synchronization pulses, and the front porch.
2.5.7 r_pad_o, g_pad_o, b_pad_o
Red, green, and blue pixel data: the RGB lines contain invalid data while the BLANK
signal [blank_pad_o] is asserted.
3 Registers
3.1 Registers List
Name    wbs_adr_i[11:0]  Width  Access  Description
CTRL    0x000            32     R/W     Control Register
STAT    0x004            32     R/W     Status Register
HTIM    0x008            32     R/W     Horizontal Timing Register
VTIM    0x00C            32     R/W     Vertical Timing Register
HVLEN   0x010            32     R/W     Horizontal and Vertical Length Register
VBARa   0x014            32     R/W     Video Memory Base Address Register A
VBARb   0x018            32     R/W     Video Memory Base Address Register B
        0x01C-0x02C      32     R/W     reserved
C0XY    0x030            32     R/W     Cursor0 X,Y Register
C0BAR   0x034            32     R/W     Cursor0 Base Address Register
        0x038-0x03C      32     R/W     reserved
C0CR    0x040-0x05C      32     R/W     Cursor0 Color Registers
        0x060-0x06C      32     R/W     reserved
C1XY    0x070            32     R/W     Cursor1 X,Y Register
C1BAR   0x074            32     R/W     Cursor1 Base Address Register
        0x078-0x07C      32     R/W     reserved
C1CR    0x080-0x09C      32     R/W     Cursor1 Color Registers
        0x0A0-0x7FC      32     R/W     reserved
PCLT    0x800-0xFFC      32     R/W     8bpp Pseudo Color Lookup Table
3.2 Accessing Reserved Address Locations
Reserved memory locations must not be accessed.
No error is generated when these addresses are accessed; all transfers are terminated
normally. Write accesses are ignored, and read accesses return all zeros.
3.3 Control Register [CTRL]
Bit #   Access  Description
31:26   R/W     reserved
25      R/W     HC1R, Hardware Cursor1 Resolution
                0: 32x32 pixel mode
                1: 64x64 pixel mode
24      R/W     HC1E, Hardware Cursor1 Enable
                0: Hardware Cursor1 disabled
                1: Hardware Cursor1 enabled
23:22   R/W     reserved
21      R/W     HC0R, Hardware Cursor0 Resolution
                0: 32x32 pixel mode
                1: 64x64 pixel mode
20      R/W     HC0E, Hardware Cursor0 Enable
                0: Hardware Cursor0 disabled
                1: Hardware Cursor0 enabled
19:16   R/W     reserved
15      R/W     BL, Blanking Polarization Level
                0: Positive
                1: Negative
14      R/W     CSL, Composite Synchronization Pulse Polarization Level
                0: Positive
                1: Negative
13      R/W     VSL, Vertical Synchronization Pulse Polarization Level
                0: Positive
                1: Negative
12      R/W     HSL, Horizontal Synchronization Pulse Polarization Level
                0: Positive
                1: Negative
11      R/W     PC, 8-bit Pseudo Color
                0: 8-bit grayscale
                1: 8-bit pseudo color
10,9    R/W     CD, Color Depth
                11b: 32 bits per pixel
                10b: 24 bits per pixel
                01b: 16 bits per pixel
                00b: 8 bits per pixel
8,7     R/W     VBL, Video memory Burst Length
                11b: 8 cycles
                10b: 4 cycles
                01b: 2 cycles
                00b: 1 cycle
6       R/W     CBSWE, CLUT Bank Switching Enable
                0: Color lookup table bank switching disabled
                1: Color lookup table bank switching enabled
5       R/W     VBSWE, Video Bank Switching Enable
                0: Video memory bank switching disabled
                1: Video memory bank switching enabled
4       R/W     CBSIE, CLUT Bank Switch Interrupt Enable
                0: Color lookup table bank switching interrupt disabled
                1: Color lookup table bank switching interrupt enabled
3       R/W     VBSIE, Video Bank Switch Interrupt Enable
                0: Video memory bank switching interrupt disabled
                1: Video memory bank switching interrupt enabled
2       R/W     HIE, HSync Interrupt Enable
                0: Horizontal synchronization pulse interrupt disabled
                1: Horizontal synchronization pulse interrupt enabled
1       R/W     VIE, VSync Interrupt Enable
                0: Vertical synchronization pulse interrupt disabled
                1: Vertical synchronization pulse interrupt enabled
0       R/W     VEN, Video Enable
                0: Video system disabled
                1: Video system enabled
Reset Value: 0x00000000
3.3.1 BL
The Blanking Polarization Level defines the voltage level of the blank output
[blank_pad_o] when the blank signal is asserted. When BL is cleared (‘0’),
[blank_pad_o] is at a high voltage level when the blank signal is asserted and at a low
voltage level when the blank signal is negated (i.e. blank is active high). When BL is
set (‘1’), [blank_pad_o] is at a low voltage level when the blank signal is asserted and
at a high voltage level when the blank signal is negated (i.e. blank is active low).
3.3.2 CBSIE
When the CLUT Bank Switch Interrupt Enable bit is set (‘1’) and a bank switch is
requested, the host is interrupted. The Bank Switch interrupt is independent of the
CLUT Bank Switch Enable bit setting. Setting this bit while the CLUT Bank Switch
Interrupt Pending (CBSINT) flag is set generates an interrupt. Clearing this bit while
CBSINT is set disables the interrupt request, but does not clear the interrupt pending
flag.
3.3.3 CBSWE
When the CLUT Bank Switch Enable bit is set (‘1’) and a complete video frame has
been read into the line buffer, the core switches between the two available color
lookup tables located at the memory addresses that are set in the CLUT Memory Base
Address register. The Active CLUT Memory Page (ACMP) flag reflects the current
active color lookup table. The core automatically clears this bit after the bank switch.
Software should set this bit each time a bank switch is desired.
3.3.4 CD
The Color Depth bits define the number of bits per pixel (bpp): 8, 16, 24, or 32 bits
per pixel.
CD    Color Depth
00b   8bpp
01b   16bpp
10b   24bpp
11b   32bpp
3.3.5 CSL
The Composite Sync Polarization Level defines the voltage level of the composite
synchronization output [csync_pad_o] when the composite sync signal is asserted.
When CSL is cleared (‘0’), [csync_pad_o] is at a high voltage level when the
composite sync signal is asserted and at a low voltage level when the composite sync
signal is negated (i.e. csync is active high). When CSL is set (‘1’), [csync_pad_o] is at
a low voltage level when the composite sync signal is asserted and at a high voltage
level when the composite sync signal is negated (i.e. csync is active low).
3.3.6 HC0E
When the Hardware Cursor0 Enable bit is set (‘1’), the first hardware cursor will be
displayed. When it is cleared (‘0’), the hardware cursor will be removed.
To avoid corrupted images, displaying and removing the hardware cursor is
synchronous to the vertical retrace; i.e. the cursor will be displayed/removed in the
next video frame. All related registers should be set to their corresponding values
before enabling the cursor.
3.3.7 HC1E
When the Hardware Cursor1 Enable bit is set (‘1’), the second hardware cursor will
be displayed. When it is cleared (‘0’), the hardware cursor will be removed.
To avoid corrupted images, displaying and removing the hardware cursor is
synchronous to the vertical retrace; i.e. the cursor will be displayed/removed in the
next video frame. All related registers should be set to their corresponding values
before enabling the cursor.
3.3.8 HC0R
The Hardware Cursor0 Resolution bit sets the pattern size and the color depth for the
first hardware cursor. When HC0R is set (‘1’), hardware cursor0 is set for a resolution
of 64x64x4bpp. When HC0R is cleared (‘0’), hardware cursor0 is set for a resolution
of 32x32x16bpp. The bit must not be changed while the cursor is being displayed. To
change the cursor’s Resolution bit, first turn off the cursor by clearing the Hardware
Cursor0 Enable bit, then change the cursor’s Resolution bit value, (re)write the
cursor’s Base Address register to load the new cursor pattern, and finally re-enable the
cursor by setting the Hardware Cursor0 Enable bit. To avoid displaying a corrupted
cursor, wait for a vertical sync interrupt after clearing the Hardware Cursor0 Enable
bit.
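The resolution-change sequence described above can be sketched as follows. This is a minimal illustration, not driver code: `regs` is a hypothetical dictionary standing in for the memory-mapped registers, `wait_for_vsync` is a hypothetical callback that blocks until the vertical sync interrupt, and the offsets and bit positions are taken from the Registers List and the Control Register table.

```python
CTRL = 0x000     # Control Register offset
C0BAR = 0x034    # Cursor0 Base Address Register offset
HC0E = 1 << 20   # Hardware Cursor0 Enable bit in CTRL
HC0R = 1 << 21   # Hardware Cursor0 Resolution bit in CTRL

def set_cursor0_resolution(regs, wait_for_vsync, use_64x64, pattern_addr):
    """Change the cursor0 resolution following the documented order."""
    regs[CTRL] &= ~HC0E            # 1. turn the cursor off
    wait_for_vsync()               # 2. wait so the removal has taken effect
    if use_64x64:                  # 3. change the Resolution bit
        regs[CTRL] |= HC0R
    else:
        regs[CTRL] &= ~HC0R
    regs[C0BAR] = pattern_addr     # 4. (re)write the base address
    regs[CTRL] |= HC0E             # 5. re-enable the cursor
```

The same order applies to the second cursor with the HC1E, HC1R, and C1BAR equivalents.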
3.3.9 HC1R
The Hardware Cursor1 Resolution bit sets the pattern size and the color depth for the
second hardware cursor. When HC1R is set (‘1’), hardware cursor1 is set for a
resolution of 64x64x4bpp. When HC1R is cleared (‘0’), hardware cursor1 is set for a
resolution of 32x32x16bpp. The bit must not be changed while the cursor is being
displayed. To change the cursor’s Resolution bit, first turn off the cursor by clearing the
Hardware Cursor1 Enable bit, then change the cursor’s Resolution bit value, (re)write
the cursor’s Base Address register to load the new cursor pattern, and finally re-enable
the cursor by setting the Hardware Cursor1 Enable bit. To avoid displaying a corrupted
cursor, wait for a vertical sync interrupt after clearing the Hardware Cursor1 Enable
bit.
3.3.10 HIE
When the Horizontal Interrupt Enable bit is set (‘1’) and a horizontal interrupt is
pending, the host system is interrupted. Setting this bit while the Horizontal Interrupt
Pending (HINT) flag is set generates an interrupt. Clearing this bit while HINT is set
disables the interrupt request but does not clear the interrupt pending flag.
3.3.11 HSL
The Horizontal Sync Polarization Level defines the voltage level of the horizontal
synchronization output [hsync_pad_o] when the horizontal sync signal is asserted.
When HSL is cleared (‘0’), [hsync_pad_o] is at a high voltage level when the
horizontal sync signal is asserted and at a low voltage level when the horizontal sync
signal is negated (i.e. hsync is active high). When HSL is set (‘1’), [hsync_pad_o] is
at a low voltage level when the horizontal sync signal is asserted and at a high voltage
level when the horizontal sync signal is negated (i.e. hsync is active low).
3.3.12 PC
When in 8bpp mode, the pixel data can be used as black and white information (256
grayscales) or as an index to a color lookup table (pseudo color mode). When the PC
bit is set (‘1’), the core operates in pseudo color mode and the pixel data is used to
read the color data from the CLUT. When the PC bit is cleared (‘0’), the pixel-data is
placed on the red, green, and blue outputs, effectively producing a black and white
image with 256 different grayscales.
3.3.13 VBSIE
When the Video Bank Switch Interrupt Enable bit is set (‘1’) and a bank switch is
requested, the host is interrupted. The Bank Switch interrupt is independent of the
Video Bank Switch Enable bit setting. Setting this bit while the Video Bank Switch
Interrupt Pending (VBSINT) flag is set generates an interrupt. Clearing this bit while
VBSINT is set disables the interrupt request but does not clear the interrupt pending
flag.
3.3.14 VBSWE
When the Video Bank Switch Enable bit is set (‘1’) and a complete video frame has
been read into the line buffer, the core switches between the two available video pages
located at the memory addresses set in the Video Memory Base Address (VBAR)
registers. The Active Video Memory Page (AVMP) flag reflects the current active
video page. The core automatically clears this bit after the bank switch. Software
should set this bit each time a bank switch is desired.
3.3.15 VBL
The Video Burst Length bits define the number of transfers during a single block read
access to the video memory: 1 (single access), 2, 4, or 8 accesses per block read. The
core will perform multiple consecutive block reads; the total number of accesses
during a read is therefore always a multiple (i.e. one or more) of the Video Burst
Length.
VBL   Burst length
00b   1 transfer
01b   2 transfers
10b   4 transfers
11b   8 transfers
3.3.16 VEN
The video circuit is disabled when the Video Enable bit is cleared (‘0’). The video
circuit is enabled when the Video Enable bit is set (‘1’). This bit must be cleared
before changing any register contents. After (re)programming all registers, this bit
may be set.
3.3.17 VIE
When the Vertical Interrupt Enable bit is set (‘1’) and a vertical interrupt is pending,
the host system is interrupted. Setting this bit while the Vertical Interrupt Pending
(VINT) flag is set generates an interrupt. Clearing this bit while VINT is set disables
the interrupt request but does not clear the interrupt pending flag.
3.3.18 VSL
The Vertical Sync Polarization Level defines the voltage level of the vertical
synchronization output [vsync_pad_o] when the vertical sync signal is asserted. When
VSL is cleared (‘0’), [vsync_pad_o] is at a high voltage level when the vertical sync
signal is asserted and at a low voltage level when the vertical sync signal is negated
(i.e. vsync is active high). When VSL is set (‘1’), [vsync_pad_o] is at a low voltage
level when the vertical sync signal is asserted and at a high voltage level when the
vertical sync signal is negated (i.e. vsync is active low).
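To make the Control Register layout concrete, the following sketch packs and unpacks CTRL values from named fields. The helper names `pack_ctrl` and `unpack_ctrl` are hypothetical; the bit positions are taken from the Control Register table in section 3.3.

```python
CTRL_FIELDS = {
    # name: (least significant bit, width in bits)
    "VEN": (0, 1), "VIE": (1, 1), "HIE": (2, 1), "VBSIE": (3, 1),
    "CBSIE": (4, 1), "VBSWE": (5, 1), "CBSWE": (6, 1), "VBL": (7, 2),
    "CD": (9, 2), "PC": (11, 1), "HSL": (12, 1), "VSL": (13, 1),
    "CSL": (14, 1), "BL": (15, 1), "HC0E": (20, 1), "HC0R": (21, 1),
    "HC1E": (24, 1), "HC1R": (25, 1),
}

def pack_ctrl(**fields: int) -> int:
    """Build a 32-bit CTRL value; unnamed fields and reserved bits stay 0."""
    value = 0
    for name, field_value in fields.items():
        lsb, width = CTRL_FIELDS[name]
        if not 0 <= field_value < (1 << width):
            raise ValueError(f"{name} out of range: {field_value}")
        value |= field_value << lsb
    return value

def unpack_ctrl(value: int) -> dict:
    """Split a CTRL value back into its named fields."""
    return {name: (value >> lsb) & ((1 << width) - 1)
            for name, (lsb, width) in CTRL_FIELDS.items()}
```

For instance, `pack_ctrl(VEN=1, CD=0b11, VBL=0b11)` describes an enabled 32bpp display using 8-transfer bursts and evaluates to 0x781.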
3.4 Status Register [STAT]
Bit #   Access  Description
31:25   R       reserved
24      R       HC1A, Hardware cursor1 available
23:21   R       reserved
20      R       HC0A, Hardware cursor0 available
19:18   R       reserved
17      R       ACMP, Active CLUT Memory Page
16      R       AVMP, Active Video Memory Page
15:8    R       reserved
7       R/W     CBSINT, CLUT Bank Switch Interrupt Pending
6       R/W     VBSINT, Video Bank Switch Interrupt Pending
5       R/W     HINT, Horizontal Interrupt Pending
4       R/W     VINT, Vertical Interrupt Pending
3:2     R/W     reserved
1       R/W     LUINT, Line FIFO Under-Run Interrupt Pending
0       R/W     SINT, System Error Interrupt Pending
Reset Value: 0x00000000 ~ 0x00110000
3.4.1 ACMP
The Active CLUT Memory Page flag is cleared (‘0’) when the active color lookup
table is CLUT0; it is set (‘1’) when the active color lookup table is CLUT1. This flag
is cleared when the Video Enable bit is cleared. Refer to the CLUT Base Address
register for more information on CLUT0 and CLUT1.
3.4.2 AVMP
The Active Video Memory Page flag is cleared (‘0’) when the active memory page is
located at Video Base Address A (VBARa); it is set (‘1’) when the active memory
page is located at Video Base Address B (VBARb). This flag is cleared when the
Video Enable bit is cleared.
3.4.3 CBSINT
The CLUT Bank Switch Interrupt Pending flag is set (‘1’) when all video data from
the current active memory page has been translated into pixel colors by the currently
active color lookup table. When the CBSIE bit is set (‘1’) and CBSINT is asserted, the
host system is interrupted. Software must clear the interrupt by writing a (‘0’) to this
bit.
3.4.4 HC0A
The Hardware Cursor0 Available bit is a hard coded flag that is set (‘1’) when
Hardware Cursor0 is available and cleared (‘0’) when Hardware Cursor0 is not
available.
3.4.5 HC1A
The Hardware Cursor1 Available bit is a hard coded flag that is set (‘1’) when
Hardware Cursor1 is available and cleared (‘0’) when Hardware Cursor1 is not
available.
3.4.6 HINT
The Horizontal Interrupt Pending flag is set (‘1’) when the horizontal synchronization
pulse [hsync_pad_o] is asserted. When the HIE bit is set (‘1’) and HINT is asserted,
the host system is interrupted. Software must clear the interrupt by writing a (‘0’) to
this bit.
3.4.7 LUINT
The Line FIFO Under-Run Interrupt Pending flag is set (‘1’) when pixels are read
from the Line FIFO while it is empty. This can be caused by a locked bus, reading
from an illegal video memory address, or too few entries in the FIFO. When LUINT is
asserted, the host system is interrupted. Software must clear the interrupt by writing a
(‘0’) to this bit.
The Line FIFO Under-Run Interrupt is a non-maskable interrupt.
3.4.8 SINT
The System Error Interrupt Pending flag is set (‘1’) when [wbm_err_i] is asserted
during a read from the video memory. When SINT is asserted, the host system is
interrupted. Software must clear the interrupt by writing a (‘0’) to this bit.
The System Error Interrupt is a non-maskable interrupt.
3.4.9 VBSINT
The Video Bank Switch Interrupt Pending flag is set (‘1’) when all video data from
the current active memory page has been read. When the VBSIE bit is set (‘1’) and
VBSINT is asserted, the host system is interrupted. Software must clear the interrupt
by writing a (‘0’) to this bit.
3.4.10 VINT
The Vertical Interrupt Pending flag is set (‘1’) when the vertical synchronization pulse
[vsync_pad_o] is asserted. When the VIE bit is set (‘1’) and VINT is asserted, the host
system is interrupted. Software must clear the interrupt by writing a (‘0’) to this bit.
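The relation between the pending flags in STAT and the enable bits in CTRL can be sketched as below. This is a hedged model of when the host is interrupted, built only from the descriptions in sections 3.3 and 3.4; the function name is hypothetical, and the actual logic driving [wb_inta_o] may differ in detail.

```python
# (STAT pending bit, CTRL enable bit) for each maskable interrupt source
MASKABLE = [
    (7, 4),  # CBSINT gated by CBSIE
    (6, 3),  # VBSINT gated by VBSIE
    (5, 2),  # HINT gated by HIE
    (4, 1),  # VINT gated by VIE
]
NON_MASKABLE = 0b11  # LUINT (bit 1) and SINT (bit 0) always interrupt

def interrupt_requested(stat: int, ctrl: int) -> bool:
    """Return True when the flags in STAT would interrupt the host."""
    if stat & NON_MASKABLE:
        return True
    return any(((stat >> pending) & 1) and ((ctrl >> enable) & 1)
               for pending, enable in MASKABLE)
```

A pending VINT with VIE cleared therefore does not interrupt the host, but the flag stays set until software clears it.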
3.5 Horizontal Timing Register [HTIM]
Bit #   Access  Description
31:24   R/W     Thsync, Horizontal synchronization pulse width
23:16   R/W     Thgdel, Horizontal gate delay time
15:0    R/W     Thgate, Horizontal gate time
3.5.1 Thsync
The horizontal synchronization pulse width, measured in pixels, minus 1.
Example: Thsync = 5 gives an hsync length of 6 pixels.
3.5.2 Thgdel
The horizontal gate delay width, measured in pixels, minus 1.
Example: Thgdel = 12 gives a gate delay of 13 pixels.
3.5.3 Thgate
The horizontal gate width, measured in pixels, minus 1.
Example: Thgate = 799 gives a gate length of 800 pixels.
The horizontal gate width is dependent on the programmed Video memory Burst
Length [VBL] and the Color Depth [CD]. It must be divisible by the burst length and
the number of pixels per memory access; see the table below for more information.
CD    (Thgate + 1) divisible by:
00b   4 * VBL
01b   2 * VBL
10b   4/3 * VBL
11b   1 * VBL
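The divisibility rule in the table above can be checked with a short helper. This sketch assumes that "VBL" in the table means the burst length in transfers (1, 2, 4, or 8) rather than the raw 2-bit field value; the function name is hypothetical.

```python
from fractions import Fraction

# Pixels carried by one 32-bit memory access, indexed by the CD field
PIXELS_PER_ACCESS = {0b00: Fraction(4), 0b01: Fraction(2),
                     0b10: Fraction(4, 3), 0b11: Fraction(1)}
# Transfers per burst, indexed by the VBL field
BURST_LENGTH = {0b00: 1, 0b01: 2, 0b10: 4, 0b11: 8}

def thgate_is_valid(thgate: int, cd: int, vbl: int) -> bool:
    """Check that (Thgate + 1) is a whole number of bursts."""
    pixels_per_burst = PIXELS_PER_ACCESS[cd] * BURST_LENGTH[vbl]
    return ((thgate + 1) / pixels_per_burst).denominator == 1
```

With Thgate = 799 (800 visible pixels), all four color depths pass for an 8-transfer burst; in 24bpp mode, for example, 800 pixels correspond to exactly 75 bursts.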
3.6 Vertical Timing Register [VTIM]
Bit #   Access  Description
31:24   R/W     Tvsync, vertical synchronization pulse width
23:16   R/W     Tvgdel, vertical gate delay time
15:0    R/W     Tvgate, vertical gate time
3.6.1 Tvsync
The vertical synchronization pulse width, measured in horizontal lines, minus 1.
Example: Tvsync = 5 gives a vsync length of 6 lines.
3.6.2 Tvgdel
The vertical gate delay time, measured in horizontal lines, minus 1.
Example: Tvgdel = 2 gives a gate delay of 3 lines.
3.6.3 Tvgate
The vertical gate width, measured in horizontal lines, minus 1.
Example: Tvgate = 479 gives a gate length of 480 lines.
3.7 Horizontal and Vertical Length Register [HVLEN]
Bit #   Access  Description
31:16   R/W     Thlen, horizontal length
15:0    R/W     Tvlen, vertical length
3.7.1 Thlen
The total horizontal line time, measured in pixels, minus 1.
Example: Thlen = 1023 gives a line length of 1024 pixels.
3.7.2 Tvlen
The total vertical frame time, measured in horizontal lines, minus 1.
Example: Tvlen = 599 gives a frame length of 600 lines.
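Because every timing field stores its duration minus one, it is easy to be off by one when programming HTIM, VTIM, and HVLEN. The sketch below packs the three registers from plain pixel and line counts; the function names are hypothetical, and the field positions follow the register tables in sections 3.5 to 3.7.

```python
def pack_timing(sync: int, gate_delay: int, gate: int) -> int:
    """Pack HTIM or VTIM from durations given in pixels (or lines).

    Layout: sync - 1 in bits 31:24, gate delay - 1 in bits 23:16,
    gate - 1 in bits 15:0.
    """
    fields = (sync - 1, gate_delay - 1, gate - 1)
    limits = (0xFF, 0xFF, 0xFFFF)
    for value, limit in zip(fields, limits):
        if not 0 <= value <= limit:
            raise ValueError(f"timing value out of range: {value}")
    return (fields[0] << 24) | (fields[1] << 16) | fields[2]

def pack_hvlen(line_pixels: int, frame_lines: int) -> int:
    """Pack HVLEN: Thlen in bits 31:16, Tvlen in bits 15:0, each minus 1."""
    thlen, tvlen = line_pixels - 1, frame_lines - 1
    if not (0 <= thlen <= 0xFFFF and 0 <= tvlen <= 0xFFFF):
        raise ValueError("length out of range")
    return (thlen << 16) | tvlen
```

Using the example values from the text, `pack_timing(6, 13, 800)` yields the HTIM value 0x050C031F and `pack_hvlen(1024, 600)` yields the HVLEN value 0x03FF0257.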
3.8 Video Base Address [VBARa] [VBARb]
Bit #   Access  Description
31:2    R/W     VBA, Video Base Address
1:0     R       Always zero
3.8.1 Video Base Address
The Video Base Address register defines the starting point of the video memory. The
image is stored in consecutive memory locations, starting at this address. The byte
memory location of a pixel can be calculated as follows:
Adr = ((Y * Thgate) + X) * bytes_per_pixel;
The core supports memories with burst capabilities. Burst transfers of 1, 2, 4, and 8
accesses are supported. The lower address bits must reflect the value entered in the
Video Memory Burst Length bits as shown in the table below, where an ‘x’ represents
a don’t care value.
VBL   VBAR[4:0]*
00b   xxx00b
01b   xx000b
10b   x0000b
11b   00000b
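The address formula and the alignment table above can be sketched as two small helpers. Two assumptions are made here and should be checked against the core: the line width is passed in as an explicit pixel count (the formula in the text writes Thgate at this position), and the base address is added to the computed offset, which the formula leaves implicit. The function names are hypothetical.

```python
def pixel_address(base: int, x: int, y: int,
                  line_width: int, bytes_per_pixel: int) -> int:
    """Byte address of pixel (x, y) in a linearly stored image."""
    return base + (y * line_width + x) * bytes_per_pixel

def base_is_aligned(base: int, vbl: int) -> bool:
    """Check the VBAR low bits against the burst length field VBL.

    VBL = 00b needs 2 zero bits, 01b needs 3, 10b needs 4, 11b needs 5.
    """
    mask = (4 << vbl) - 1
    return (base & mask) == 0
```

For example, a base address of 0x10 is acceptable for 4-transfer bursts (VBL = 10b) but not for 8-transfer bursts (VBL = 11b), which require the five lowest bits to be zero.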
3.9 Hardware Cursor Base Address [C0BAR] [C1BAR]
Bit #   Access  Description
31:10   R/W     CBA, Cursor Base Address
9:0     R       Always zero
3.9.1 Cursor Base Address
The Cursor Base Address register defines the starting point of the cursor pattern to
use. The cursor pattern is stored in consecutive memory locations, starting at this
address.
3.10 Hardware Cursor (X,Y) Register [C0XY] [C1XY]
Bit #   Access  Description
31:16   R/W     CY, Cursor Y location
15:0    R/W     CX, Cursor X location
3.10.1 CY
The vertical position of the cursor’s upper left pixel, relative to the upper left corner
of the image. CY is always positive, i.e. a larger value moves the cursor down the
screen and a smaller value moves the cursor up the screen.
3.10.2 CX
The horizontal position of the cursor’s upper left pixel, relative to the upper left corner
of the image. CX is always positive, i.e. a larger value moves the cursor to the right of
the screen and a smaller value moves the cursor to the left of the screen.
3.11 Hardware Cursor Color Registers [C0CR] [C1CR]
Bit #   Access  Description
31:16   R/W     Color data (odd numbered color register)
15:0    R/W     Color data (even numbered color register)
3.11.1 Cursor Color Register
The Cursor Color registers define the cursor colors for 64x64x4bpp cursor mode,
which is enabled when the Hardware Cursor Resolution bit is set (‘1’). In this mode
each cursor pixel uses 4bits. The 4bits are used in a lookup table fashion to select a
single color register from a total of 16. The 16 color registers are mapped to 8
addresses, where the 16LSBs store an even-numbered color register (i.e. 0, 2, 4, etc)
and the 16MSBs store an odd-numbered color register (i.e. 1, 3, 5, etc).
Address   Address   Bit 31:16           Bit 15:0
Cursor0   Cursor1
0x028     0x058     Color Register 1    Color Register 0
0x02C     0x05C     Color Register 3    Color Register 2
0x030     0x060     Color Register 5    Color Register 4
0x034     0x064     Color Register 7    Color Register 6
0x038     0x068     Color Register 9    Color Register 8
0x03C     0x06C     Color Register 11   Color Register 10
0x040     0x070     Color Register 13   Color Register 12
0x044     0x074     Color Register 15   Color Register 14
Reset Value: undefined
These registers are available only when the dedicated hardware cursor is
implemented, i.e. C0CR is available when hardware cursor0 is available, and C1CR is
available when hardware cursor1 is available. Whether or not a hardware cursor is
implemented can be checked via the Status register. When a hardware cursor is not
implemented the memory locations are reserved and the rules for accessing reserved
memory locations apply.
Note: The contents of these registers are undefined after a reset.
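The mapping of 16 color registers onto 8 word addresses can be sketched as follows. The address of the first color word is passed in as a parameter rather than hard-coded; the helper names are hypothetical.

```python
def color_register_location(first_address: int, index: int):
    """Return (word address, bit shift) for cursor color register `index`.

    Even-numbered registers sit in bits 15:0, odd-numbered ones in
    bits 31:16 of the same 32-bit word.
    """
    if not 0 <= index <= 15:
        raise ValueError("color register index must be 0..15")
    address = first_address + 4 * (index // 2)
    shift = 16 if index % 2 else 0
    return address, shift

def write_color(word: int, shift: int, color: int) -> int:
    """Replace one 16-bit color inside a 32-bit register word."""
    cleared = word & ~(0xFFFF << shift) & 0xFFFFFFFF
    return cleared | ((color & 0xFFFF) << shift)
```

Since the core only accepts 32-bit accesses, updating a single color means reading the word, replacing one half with `write_color`, and writing the whole word back.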
3.12 8bpp Pseudo Color Lookup Table [PCLT]
3.12.1 Color Lookup Table
The color lookup table is mapped into the core’s address range. It can be accessed
(read and write) via the WISHBONE Slave interface, starting at address 0x800. See
section 4.2.5 Color Lookup Table for more information.
4 Operation
4.1 Video Timing
4.1.1 Horizontal Video Timing
[Figure: horizontal video timing diagram showing Thsync, Thgdel, Thgate, and Thlen]
4.1.1.1 Thsync
The Horizontal Synchronization Time is the duration of the horizontal
synchronization pulse, measured in pixel clock ticks.
4.1.1.2 Thgdel
The Horizontal Gate Delay Time is the duration of the time between the end of the
horizontal synchronization pulse and the start of the horizontal gate, measured in pixel
clock ticks. The image can be shifted left/right over the screen by modifying Thgdel.
In video timing diagrams, this is mostly referred to as the back porch.
4.1.1.3 Thgate
The Horizontal Gate Time is the duration of the visible area of a video line, measured
in pixel clock ticks. In video timing diagrams, this is mostly referred to as the active
time.
4.1.1.4 Thlen
The Horizontal Length Time is the duration of a complete video line, from the start of
the horizontal synchronization pulse till the start of the next horizontal
synchronization pulse, measured in pixel clock ticks.
4.1.2 Vertical Video Timing
[Figure: vertical video timing diagram showing Tvsync, Tvgdel, Tvgate, and Tvlen]
4.1.2.1 Tvsync
The Vertical Synchronization Time is the duration of the vertical synchronization
pulse, measured in horizontal lines.
4.1.2.2 Tvgdel
The Vertical Gate Delay Time is the duration of the time between the end of the
vertical synchronization pulse and the start of the vertical gate, measured in horizontal
lines. The image can be shifted up/down the screen by modifying Tvgdel. In video
timing diagrams, this is mostly referred to as the back porch.
4.1.2.3 Tvgate
The Vertical Gate Time is the duration of the visible area of a video frame, measured
in horizontal lines. In video timing diagrams, this is mostly referred to as the active
time.
4.1.2.4 Tvlen
The Vertical Length Time is the duration of a complete video frame, from the start of
the vertical synchronization pulse till the start of the next vertical synchronization
pulse, measured in horizontal lines.
4.1.3 Combined Video Frame Timing
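[Figure: combined video frame timing. Horizontal: Thsync, Thgdel, and Thgate within Thlen (total horizontal image size). Vertical: Tvsync, Tvgdel, and Tvgate within Tvlen (total vertical image size). The visible area starts at pixel (0,0).]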
The video frame is composed of Tvlen video lines, each Thlen pixels long. The
logical AND function of the horizontal gate and the vertical gate defines the visible
area; the rest of the image is blanked.
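The AND of the two gates can be illustrated with a small model. This sketch assumes that the horizontal and vertical counters start at zero at the start of the respective synchronization pulse, as the definitions of Thlen and Tvlen suggest, and that all arguments are real durations in pixels or lines (not the minus-one register values). The function name is hypothetical.

```python
def is_visible(x: int, y: int,
               thsync: int, thgdel: int, thgate: int,
               tvsync: int, tvgdel: int, tvgate: int) -> bool:
    """Return True when position (x, y) in the raster is inside the visible area."""
    h_start = thsync + thgdel          # first visible pixel in a line
    v_start = tvsync + tvgdel          # first visible line in a frame
    h_gate = h_start <= x < h_start + thgate
    v_gate = v_start <= y < v_start + tvgate
    return h_gate and v_gate           # visible only when both gates are open
```

With the example timings used in section 3 (6 + 13 + 800 pixels, 6 + 3 + 480 lines), raster position (19, 9) is the first visible pixel, i.e. pixel (0,0) of the image.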
4.2 Pixel Color Generation
4.2.1 Color Processor Internals
[Figure: Color Processor internals, showing the Address Generator (ADR_O), the Data Buffer (DAT_I), the Colorizer block, and the CLUT, with RGB output to the Cursor Processor or Line FIFO]
The Color Processor, together with the WISHBONE Master interface and the Line
FIFO, handles the pixel color generation. The internal structure of the Color
Processor, including parts of the WISHBONE Master interface, is shown in the figure
above.
4.2.2 Address Generator
The address generator is part of the WISHBONE Master interface. It generates the
video memory addresses, performs video memory bank switching, and keeps track of
the number of pixels to read. When all pixels are read, the video memory bank is
switched, the video memory offset (i.e. the pixel counter) is reset and - when enabled
- the bank switch interrupt is generated. The bank switch interrupt is only dependent
on the amount of pixels read, i.e. it has no fixed timing relation to the horizontal or
vertical synchronization pulses.
4.2.3 Data Buffer
The data buffer temporarily stores the data read from the video memory. It can
contain 16 32-bit entries. The system tries to keep the data buffer at least half full. The
data is read from the video memory by a consecutive address burst; i.e. [wbm_cab_o]
is asserted. The burst length is determined by the Video memory Burst Length [VBL]
bits in the control registers. It is possible that multiple burst accesses are executed
within a single access cycle.
All data is stored consecutively, and all available bits are used independent of color
depth. In 8bpp mode, a 32-bit word stores 4 pixels; in 16bpp mode it stores 2 pixels,
in 24bpp mode 1 1/3 pixels, and in 32bpp 1 pixel.
4.2.4 Colorizer
The colorizer translates the data stored in the data buffer into colors (see the examples
below).
The table below shows the Data Buffer contents used in the examples. Only 8 out of
the 16 possible entries are shown. The buffer is read from the top to the bottom, i.e.
0x01234567 is the first data read, and 0x89abcdef is the second etc.
Data Buffer contents
0x01234567
0x89abcdef
0x01234567
0x89abcdef
0x01234567
0x89abcdef
0x01234567
0x89abcdef
4.2.4.1 32bpp example.
In 32-bits-per-pixel mode, the lower 24 bits carry the pixel data. The upper 8 bits are
ignored, they can be used for Z-buffer, alpha channel, stencil buffer, or similar
purposes.
The table below shows the RGB values generated from the sample data in the Data
Buffer. Only the first 4 pixels are shown.
Color Data   R      G      B
0x01234567   0x23   0x45   0x67
0x89abcdef   0xab   0xcd   0xef
0x01234567   0x23   0x45   0x67
0x89abcdef   0xab   0xcd   0xef
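The 32bpp mapping above amounts to dropping the upper byte and splitting the remaining 24 bits. A minimal sketch (the function name is hypothetical):

```python
def unpack_32bpp(word: int):
    """Return (R, G, B) from one 32-bit word; bits 31:24 are ignored."""
    return (word >> 16) & 0xFF, (word >> 8) & 0xFF, word & 0xFF
```

Applied to the sample data, 0x01234567 gives (0x23, 0x45, 0x67) and 0x89abcdef gives (0xab, 0xcd, 0xef), matching the table.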
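4.2.4.2 24bpp example.
In 24-bits-per-pixel mode, pixels are packed across word boundaries. With Da, Db, and
Dc denoting three consecutive 32-bit words, the RGB values are generated in the following
sequence: Da(31:8), Da(7:0)Db(31:16), Db(15:0)Dc(31:24), Dc(23:0).
The table below shows the RGB values generated from the sample data in the Data
Buffer.
Color Data    R      G      B
0x012345      0x01   0x23   0x45
0x6789ab      0x67   0x89   0xab
0xcdef01      0xcd   0xef   0x01
0x234567      0x23   0x45   0x67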
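The 24bpp unpacking sequence can be reproduced in software with a bit accumulator. The sketch below is a behavioral illustration only and assumes nothing about how the colorizer is implemented in hardware; `unpack_24bpp` is a hypothetical name.

```python
def unpack_24bpp(words):
    """Yield (r, g, b) tuples from 32-bit words holding packed 24-bit pixels."""
    acc = 0    # bit accumulator, oldest bits in the most significant positions
    nbits = 0  # number of valid bits currently held in acc
    for word in words:
        acc = (acc << 32) | (word & 0xFFFFFFFF)
        nbits += 32
        while nbits >= 24:
            nbits -= 24
            pixel = (acc >> nbits) & 0xFFFFFF
            acc &= (1 << nbits) - 1  # drop the bits that were just consumed
            yield (pixel >> 16) & 0xFF, (pixel >> 8) & 0xFF, pixel & 0xFF
```

Feeding it the first three words of the sample Data Buffer (0x01234567, 0x89abcdef, 0x01234567) yields the four pixels 0x012345, 0x6789ab, 0xcdef01, and 0x234567.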
4.2.4.3 TrippleDisplay mode
The system is capable of driving up to three different displays at the same time. In
TrippleDisplay mode the system is set up for 24bpp mode, but each of the three color
channels carries the grayscale information for a single display.
4.2.4.4 16bpp example.
In 16-bits-per-pixel mode, the upper 16 bits carry the data for the first pixel and the
lower 16 bits carry the data for the second pixel. The 24-bit RGB data is extracted
from the 16-bit color data as follows:
R(7:0) = color_data(15:11), 000b
G(7:0) = color_data(10:5), 00b
B(7:0) = color_data(4:0), 000b
Color Data    R      G      B
0x0123        0x00   0x24   0x18
0x4567        0x40   0xac   0x38
0x89ab        0x88   0x34   0x58
0xcdef        0xc8   0xbc   0x78
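The 16bpp extraction rule is a pair of shifts and masks per channel. The following sketch mirrors the three assignments above; the function names are hypothetical and the code only models the arithmetic, not the hardware.

```python
def expand_rgb565(color):
    """Expand 16-bit 5-6-5 color data to 8-bit R, G, B, padding with zero bits."""
    r = ((color >> 11) & 0x1F) << 3  # color_data(15:11), 000b
    g = ((color >> 5) & 0x3F) << 2   # color_data(10:5), 00b
    b = (color & 0x1F) << 3          # color_data(4:0), 000b
    return r, g, b


def unpack_16bpp(word):
    """Return the two pixels of a 32-bit word, upper half first."""
    return expand_rgb565((word >> 16) & 0xFFFF), expand_rgb565(word & 0xFFFF)
```

Because the low bits are padded with zeros, full-scale 16-bit white (0xffff) maps to (0xf8, 0xfc, 0xf8) rather than (0xff, 0xff, 0xff).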
4.2.4.5 8bpp grayscale example.
In 8-bits-per-pixel grayscale mode, the color data for each of the three colors are
equal. The information stored in one byte is sent to all three colors, effectively
producing a black-and-white image with 256 grayscales.
Color Data    R      G      B
0x01          0x01   0x01   0x01
0x23          0x23   0x23   0x23
0x45          0x45   0x45   0x45
0x67          0x67   0x67   0x67
4.2.4.6 8bpp pseudo-color example.
In 8-bits-per-pixel pseudo-color mode, the color data represents an offset in the
internal color lookup table (CLUT). The CLUT contains the RGB color information.
This way it is possible to generate an image with 256 different colors with minimal
memory requirements.
R = clut_data_out(23:16)
G = clut_data_out(15:8)
B = clut_data_out(7:0)
The table below shows the CLUT addresses for the first 4 pixels.
Color Data    CLUT offset
0x01          0x01
0x23          0x23
0x45          0x45
0x67          0x67
4.2.5 Color Lookup Table
The color lookup table (or CLUT) is a 512x24 bit single-clock synchronous static
random access memory divided into two separate CLUTs, of 256x24 bit each. Either
one of them is accessed by the colorizer, depending on the Active CLUT Memory
Page [ACMP] flag in the Status register. When the ACMP flag is cleared (‘0’),
CLUT0 is accessed. When the ACMP flag is set (‘1’), CLUT1 is accessed.
The CLUT memory is mapped into the core’s address range. It can be externally
accessed (read and write) via the WISHBONE Slave interface, starting at address
0x800. CLUT0 is located at memory range 0x800 – 0xBFC, CLUT1 at 0xC00 –
0xFFC. All external accesses to the CLUT are 32-bit, but the CLUT itself is only 24
bit wide. The top-most bits[31:24] are ignored for write accesses and are always zero
for read accesses.
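The address map and the ACMP page selection can be captured in a small software model. This is a sketch under stated assumptions: the base address 0x800, the 256 entries per table, and the 24-bit width come from the text, while the class `ClutModel` and its method names are hypothetical and say nothing about the real register interface timing.

```python
CLUT_BASE = 0x800    # start of CLUT0 in the core's address range (from the text)
CLUT_ENTRIES = 256   # entries per lookup table


def clut_entry_address(page, index):
    """Byte address of entry `index` in CLUT0 (page 0) or CLUT1 (page 1)."""
    if page not in (0, 1) or not 0 <= index < CLUT_ENTRIES:
        raise ValueError("page must be 0 or 1 and index in 0..255")
    return CLUT_BASE + (page * CLUT_ENTRIES + index) * 4


class ClutModel:
    """Behavioral model: 512 x 24 bit memory, accessed as 32-bit words."""

    def __init__(self):
        self.mem = [0] * (2 * CLUT_ENTRIES)

    def write(self, address, value):
        # Bits 31:24 are ignored on writes.
        self.mem[(address - CLUT_BASE) // 4] = value & 0x00FFFFFF

    def read(self, address):
        # Bits 31:24 always read back as zero.
        return self.mem[(address - CLUT_BASE) // 4]

    def lookup(self, acmp, pixel):
        """Colorizer view: ACMP selects the page, the 8bpp pixel is the offset."""
        data = self.mem[acmp * CLUT_ENTRIES + (pixel & 0xFF)]
        return (data >> 16) & 0xFF, (data >> 8) & 0xFF, data & 0xFF
```

A host would typically fill the inactive page through `write` and let the bank switching mechanism flip ACMP afterwards.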
4.3 Hardware Cursors
4.3.1 Introduction
The Enhanced VGA/LCD Core provides up to two hardware cursors. Whether a cursor is
implemented, and which one, is up to the system designer. The core takes
two definition-parameters (VGA_HWC0 and VGA_HWC1) as input. The define
statements are located in the “vga_defines.v” file. If both definition parameters are
undefined, no logic is generated for the hardware cursors. If a definition parameter is
defined, logic for the corresponding cursor is generated.
Cursor0 is normally used to provide the arrow pointer in GUI applications and
operating systems. Cursor1 has no pre-assigned purpose; it can be used to provide
some form of user cursor in a pop-up window.
Off-screen memory in the frame buffer or, if accessible by the core, system memory is
used to provide the locations where the patterns for both cursors are stored. This
allows each cursor to be displayed and used without altering the main display image
stored in the frame buffer. The hardware takes care of selecting between the cursor
and the image. The Cursor Base Address register determines the cursor’s pattern
location. Each cursor may have multiple patterns stored in memory, making it
possible to change each cursor’s appearance by switching from one pattern to another
by simply changing the appropriate Base Address register.
4.3.2 Cursor Patterns
The amount of memory allocated for each cursor pattern is 16Kbit. The cursor
resolutions are user-selectable, either 32x32 pixels and 16bpp color depth, or 64x64
pixels and 4bpp color depth. The cursor pattern is stored in consecutive memory
locations, starting at the address set by the cursor’s Base Address register. Each
address location contains data for multiple cursor pixels: 2 pixels in 32x32 pixel mode
and 8 pixels in 64x64 pixel mode.
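Both cursor resolutions fill the same 16Kbit pattern memory, which a short calculation confirms. The sketch below assumes 32-bit address locations, consistent with the pixels-per-location figures above; `cursor_pattern_layout` is a hypothetical helper name.

```python
# (width, height, bits per pixel) for the two user-selectable cursor modes
CURSOR_MODES = {"32x32": (32, 32, 16), "64x64": (64, 64, 4)}


def cursor_pattern_layout(mode, word_bits=32):
    """Return pattern size in bits, in words, and pixels per word for a mode."""
    width, height, bpp = CURSOR_MODES[mode]
    total_bits = width * height * bpp
    return {
        "bits": total_bits,
        "words": total_bits // word_bits,
        "pixels_per_word": word_bits // bpp,
    }
```

Either mode needs 512 words, so one buffer size serves both resolutions.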
4.3.2.1 32x32 Pixel Mode
In 32x32 pixel mode, each pixel has a 16-bit color depth, divided into a selection bit
and 15-bit cursor colors (32Kcolors) or an 8-bit alpha channel. The MSB selects
between cursor color mode and alpha channel mode. The alpha channel is used to
generate transparent pixels or 3D effects (see Pattern Color Data).
4.3.2.2 64x64 Pixel Mode
In 64x64 pixel mode, each pixel has a 4-bit color depth. The 4 bits are used in a
lookup table fashion to select a Color register from the available 16 Cursor Color
registers. Each Color register contains a 16-bit value that has the same features as the
cursor pattern data in the 32x32 pixel mode, i.e. 1 selection bit and 15 color bits or an
8-bit alpha channel.
4.3.2.3 Cursor Pixel Data
Each cursor pixel is represented by a 16-bit color value. The MSB selects between
color mode and Alpha/Transparency mode.
bit 15   bit 14:8      bit 7:0       Mode
0        Color Data (bits 14:0)      Color mode
1        always zero   Alpha Data    Alpha / Transparency mode
In color mode the LSBs represent a 15-bit RGB value, resulting in 32K colors. The
32K cursor colors are generated by equally distributing the color information over the
RGB components, i.e. 5 bits for red, 5 bits for green, and 5 bits for blue. Internally the
5-bit R, G, and B values are extended to 8 bits by setting the lower 3 bits of each color
to zero.
In Alpha/Transparency mode, the LSBs are divided into two sections. The first
section (bits 14:8) is reserved and should always be read and written as zero. The
second section (bits 7:0) represents an 8-bit alpha value. The alpha value is a cross-fader
setting between the image pixel value and the black level (RGB = 0). Alpha is
normally defined as a value between 0 and 1, where 0 = ‘00’hex and 1 = ‘FF’hex.
Setting the Alpha value to 0 results in the black level being displayed. Setting the
Alpha value to 1 results in the image pixel being displayed. Any value between 0 and
1 results in a linear mix between the image pixel value and black. This can be used to
add the effect of shadow to a cursor, thus creating 3D cursors.
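The two pixel modes can be illustrated with a small decoder. This is a sketch under assumptions that the text does not confirm: red is taken to occupy bits 14:10, green bits 9:5, and blue bits 4:0, and the linear mix is computed as an integer division by 255, whereas the hardware may round differently. The name `cursor_pixel` is hypothetical.

```python
def cursor_pixel(cursor_value, image_rgb):
    """Return the displayed (r, g, b) for one 16-bit cursor pixel value."""
    if cursor_value & 0x8000:
        # Alpha / Transparency mode: cross-fade the image pixel towards black.
        alpha = cursor_value & 0xFF
        return tuple((channel * alpha) // 255 for channel in image_rgb)
    # Color mode: 5-5-5 RGB (assumed bit order), lower 3 bits of each set to zero.
    r = ((cursor_value >> 10) & 0x1F) << 3
    g = ((cursor_value >> 5) & 0x1F) << 3
    b = (cursor_value & 0x1F) << 3
    return r, g, b
```

A value of 0x80FF leaves the image pixel unchanged (fully transparent cursor pixel), 0x8000 forces black, and intermediate alpha values darken the image to produce the shadow.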
The image below shows how to create the 3D cursor from the Redwood scheme.
[Figure: 3D cursor pattern, spanning (0,0) to (31,31) in 32x32 pixel mode or (0,0) to
(63,63) in 64x64 pixel mode. Labeled regions: White, Gray, Black, Transparent
(Alpha = 1), and Transparent (0 < Alpha < 1).]
4.3.3 Turning off 3D support.
The alpha-blending logic requires a considerable amount of logic resources. Therefore,
the ability to turn off the 3D support has been provided. When 3D support is turned off, the
cursor processor ignores the alpha data and generates a transparent pixel. Instead of a
shadow effect, the image pixel is displayed. This behavior guarantees that the 3D and
non-3D cursors are displayed correctly.
3D cursor support is enabled when VGA_HWC_3D is defined.
3D cursor support is disabled when VGA_HWC_3D is undefined.
4.3.4 Cursor Processor Internals
[Figure: Cursor Processor. Blocks: Address Generator (ADR O, DAT I), two Cursor
Buffers, Cursor1 Processor, and Cursor0 Processor. RGB data enters from the Color
Processor and leaves to the Line FIFO.]
The Cursor Processor handles the hardware cursors together with the WISHBONE
Master interface. The internal structure of the Cursor Processor, including parts of the
WISHBONE Master interface is shown in the figure above. If a cursor is not
implemented, it is a pass-through function. The above schematic still applies, but no
logic is generated for that cursor.
4.3.5 Address Generator
The address generator is part of the WISHBONE Master interface. When copying a
cursor into one of the cursor buffers, it generates the memory addresses and writes the
data read into the buffers. The cursor processors issue a cursor read request to the
address generator when their corresponding Cursor Base address [C0BAR][C1BAR]
is written to. When the WISHBONE Master finishes reading the current video frame
it honors one cursor read request. The cursor data is read in one continuous stream
before the start of the next frame.
If both cursors need to be reloaded, one is reloaded before the next frame. Its cursor
read request is negated. The second cursor read request is not honored; it remains
asserted. When the WISHBONE Master finishes reading the new frame, it honors the
second cursor read request.
Cursor0 has a higher priority than Cursor1. When both cursors need to be reloaded,
Cursor0 is reloaded first. This implies that continuously reloading Cursor0 results in
Cursor1 never being reloaded. However, this situation should never occur during
normal operation.
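The reload rules (one request honored per frame, Cursor0 first) can be expressed as a tiny arbiter model. This is an illustrative sketch; the class `CursorReloadArbiter` and its methods are hypothetical and only mimic the behavior described above.

```python
class CursorReloadArbiter:
    """Models the cursor read requests seen by the address generator."""

    def __init__(self):
        self.pending = [False, False]  # read requests for Cursor0 and Cursor1

    def write_base_address(self, cursor):
        """Writing C0BAR or C1BAR asserts that cursor's read request."""
        self.pending[cursor] = True

    def end_of_frame(self):
        """Honor at most one request; return the reloaded cursor or None."""
        for cursor in (0, 1):  # Cursor0 has the higher priority
            if self.pending[cursor]:
                self.pending[cursor] = False
                return cursor
        return None
```

Calling `write_base_address(0)` before every `end_of_frame()` reproduces the starvation case: the request for Cursor1 stays pending indefinitely.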
4.3.6 Cursor Buffer
The cursor buffers are 512x32 bit synchronous static random access memories. The
address generator writes a copy of the cursor pattern into the cursor buffer whenever
the cursor base address [C0BAR][C1BAR] is written to.
4.3.7 Cursor0/Cursor1 Processor
The two cursor processors are the intelligent part of the cursor system. Each cursor
processor handles 1 cursor. It keeps track of the raster-scan position, determines
whether or not the cursor pattern should be updated, whether or not the cursor should
be displayed, and generates the cursor colors including the alpha mixing.
4.4 Bank switching
4.4.1 Introduction
The bank switching system is implemented as a double buffering scheme, also known
as a Ping-Pong system. The core reads pixel information from one memory bank
while the second bank is being filled. When the second bank has been filled, the host
sets the Video Bank Switch Enable bit [VBSE] and/or the Color Lookup Table Bank
Switch Enable bit [CBSE]. The core continues reading the current bank until the entire
frame has been read. It then switches to the second bank and starts reading the new
frame. The core automatically resets the VBSE and CBSE bits to avoid accidentally
switching to the previous bank. A Video Bank Switch Interrupt is generated when the
core switches between the two video memory banks, and a CLUT Bank Switch
Interrupt is generated when the core switches between the two Color Lookup Tables.
4.4.2 Host notes
The host should not set the VBSE or CBSE bits until all frame information has been
written to the video memory. The host system should wait for the Bank Switch
Interrupt before filling the previous memory bank.
4.4.3 Sequence
1) Fill video bank0.
3) Set VBSE, CBSE, BSIE.
4) Wait for interrupt.
6) Set VBSE, CBSE.
7) Wait for interrupt.
9) Set VBSE, CBSE.
10) Go to step 4.
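The core side of this handshake can be sketched as a small state model, which helps when reasoning about when the host may refill a bank. The model below is an assumption-laden illustration: `BankSwitchModel` is a hypothetical name, only the video bank is modeled (the CLUT bank with CBSE would behave the same way), and the fill steps performed by the host are left out.

```python
class BankSwitchModel:
    """Behavioral model of the video bank switch at the end of a frame."""

    def __init__(self):
        self.active_bank = 0   # bank currently read by the core
        self.vbse = False      # Video Bank Switch Enable, set by the host
        self.bsie = False      # Bank Switch Interrupt Enable, set by the host
        self.interrupts = 0    # number of bank switch interrupts raised

    def end_of_frame(self):
        """Switch banks if requested, clear VBSE, and raise the interrupt."""
        if self.vbse:
            self.active_bank ^= 1
            self.vbse = False  # cleared by the core to avoid switching back
            if self.bsie:
                self.interrupts += 1
        return self.active_bank
```

The host sets `vbse` only after the inactive bank is completely written, then waits for the interrupt before touching the bank that was active until then.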
4.5 Bandwidth Issues
4.5.1 Introduction
Video displays are real-time devices. The video data stream needs to be generated
uninterrupted, or images will be corrupted. The VGA_LCD core provides some
flexibility through the use of internal FIFOs, including the large dual-clocked Line
FIFO. Still, the average bandwidth required by the video must be met.
4.5.2 Calculations
The required video bandwidth can be calculated using the following formula:
\[ BW_{video} = H_{pix} \cdot V_{lin} \cdot F_{refr} \quad \text{(pps)} \]
H_{pix} = number of visible horizontal pixels (Thgate)
V_{lin} = number of visible vertical lines (Tvgate)
F_{refr} = refresh rate (Hz)
For example, a standard VGA display with 640*480 visible pixels and a refresh rate
of 60Hz requires a bandwidth of BW = 640 * 480 * 60 = 18.4 Mpixels_per_sec
(Mpps). An SVGA display with 1024*768 pixels and a 75Hz refresh rate requires
59Mpps. Note that this number is the average rate of visible pixels; the pixel-clock
frequency is higher, because it also covers the blanking intervals (25.175MHz for
640*480 @60Hz, see Appendix A).
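The video bandwidth formula is a plain product and is easy to evaluate for other modes. A minimal sketch, with the hypothetical function name `video_bandwidth_pps`:

```python
def video_bandwidth_pps(h_pix, v_lin, f_refr_hz):
    """Visible pixels per second: Hpix * Vlin * Frefr."""
    return h_pix * v_lin * f_refr_hz
```

For the two examples in the text this gives 18,432,000 pps (about 18.4 Mpps) and 58,982,400 pps (about 59 Mpps).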
The required host bus bandwidth is dependent on the required number of bits per
pixel, as shown in the next formula:
\[ BW_{required} = BW_{video} \cdot N_{bits\_per\_pixel} \quad \text{(bps)} \]
Using the previous examples we can calculate the following table:
Color depth    640*480 @60Hz    1024*768 @75Hz
32bpp          590Mbps          1.9Gbps
24bpp          442Mbps          1.4Gbps
16bpp          295Mbps          944Mbps
8bpp           147Mbps          472Mbps
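The table entries follow directly from the two formulas above, so they can be regenerated for any set of modes. The sketch below uses hypothetical helper names and returns raw bits per second, leaving rounding to the caller.

```python
def required_bandwidth_bps(h_pix, v_lin, f_refr_hz, bits_per_pixel):
    """Host bus bandwidth needed to read the visible pixels, in bits/s."""
    return h_pix * v_lin * f_refr_hz * bits_per_pixel


def bandwidth_table(modes, depths=(32, 24, 16, 8)):
    """Map each color depth to the required bandwidth of every mode."""
    return {
        bpp: [required_bandwidth_bps(h, v, f, bpp) for (h, v, f) in modes]
        for bpp in depths
    }
```

For instance, `bandwidth_table([(640, 480, 60), (1024, 768, 75)])[16]` holds the two 16bpp entries, 294,912,000 and 943,718,400 bits per second.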
The host bus occupation is dependent on the total host bus bandwidth, the initial
memory latency, the memory access/acknowledge latency, and the programmed video
burst length. It can be calculated as follows:
\[ O_{bus} = \frac{BW_{required}}{BW_{bus}} \cdot 100\% \]
BW_{bus} = host bus bandwidth (Mbps)
Or, in more detail:
\[ O_{bus} = \frac{BW_{required}}{F_{bus} \cdot N_{bus}} \cdot \frac{Mlat_{initial} + VBL \cdot Mlat_{acc}}{VBL} \cdot 100\% \]
F_{bus} = host bus frequency (Hz)
N_{bus} = host bus width (bits)
Mlat_{initial} = initial video memory latency (clk cycles)
Mlat_{acc} = video memory access latency (clk cycles)
VBL = Video Burst Length
4.5.3 Examples
4.5.3.1 Example 1
Assume the following system: 200MHz, 32-bit host system using SDRAMs as video
memory, running at half the bus frequency, displaying a 1024*768 image @75Hz
24bpp.
Fbus = 200MHz
Nbus = 32-bit
BWrequired = 1.4Gbps
Mlat(initial) = 6 (2* CAS-latency of 3)
Mlat(acc) = 2 (single cycle bursts at half the bus frequency)
Video_burst_length = 4
Total host bus occupation = 77.4%
4.5.3.2 Example 2
Assume a system with an average memory bandwidth of 250MBps displaying an
800*600 image @60Hz 16bpp.
BWrequired = 461Mbps
BWbus = 2Gbps
Total host bus occupation = 23%
4.5.3.3 Example 3
Assume the following system: 30MHz, 32-bit host system using SRAMs as video
memory, displaying a 320*240 image @60Hz 8bpp.
Fbus = 30MHz
Nbus = 32-bit
BWrequired = 37Mbps
Mlat(initial) = 1 (access selector)
Mlat(acc) = 2 (address setup)
Video_burst_length = 8
Total host bus occupation = 8.2%
Note that these numbers are for reading only. The video memory needs to be filled in
order to be able to display something. Depending on the application, filling the video
memory can require a considerable amount of bandwidth too.
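The three examples can be recomputed from the occupation formulas. This is a sketch with hypothetical function names; it evaluates the formulas with the exact pixel counts rather than the rounded bandwidth figures, so results agree with the quoted percentages to within rounding.

```python
def simple_bus_occupation_percent(bw_required_bps, bw_bus_bps):
    """Occupation when the average bus bandwidth is known directly."""
    return bw_required_bps / bw_bus_bps * 100.0


def bus_occupation_percent(bw_required_bps, f_bus_hz, n_bus_bits,
                           mlat_initial, mlat_acc, vbl):
    """Occupation including initial and per-access memory latency."""
    cycles_per_word = (mlat_initial + vbl * mlat_acc) / vbl
    return bw_required_bps / (f_bus_hz * n_bus_bits) * cycles_per_word * 100.0
```

Example 1 evaluates to about 77.4%, Example 2 to about 23%, and Example 3 to about 8.2%. Increasing the burst length reduces the share of the initial latency per transferred word.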
5 Architecture
[Figure: Core architecture block diagram. Blocks: WISHBONE Slave interface (from
host), Control Register, Status Register, Timing Registers, Video Memory Base
Registers, Cursor Base Registers, Cursor (x,y) Registers, Video Timing Generator,
WISHBONE Master interface (to video memory), Color Lookup Table, Cursor Buffers,
Color Processor, Cursor Processor, and Line FIFO. Signals: DAC clock, LCD clock,
wb_inta_o, HSYNC, VSYNC, CSYNC, BLANK, R(7:0), G(7:0), B(7:0).]
5.1 Color Lookup Table
The Color Lookup Table (or CLUT) is a 512x24 bit single clock synchronous static
random access memory divided into two separate CLUTs of 256x24 bit each. Each
color lookup table contains a 24-bit RGB value for each entry. The color processor
uses 8bpp pseudo color data as an address input to the color lookup table. The output
from the color lookup table is the RGB data for the current pixel.
5.2 Cursor Base Registers
The Cursor Base registers contain the starting address of the current cursors. Each
cursor is 32x32 pixels large. Each pixel is always in 16bpp color mode. Therefore,
512 address locations are required to store a single cursor. A cursor is stored
consecutively, starting at pixel (0,0) representing the upper left corner of the cursor,
then continuing to pixels (0,1) through (0,31), followed by (1,0) through (1,31), etc. A cursor can be located
anywhere in memory as long as the memory is accessible by the VGA_LCD core and
it starts at a cursor boundary, i.e. the lower 10 address bits must be zero.
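The boundary rule translates into a simple mask test that driver code could apply before writing a Cursor Base register. This is a minimal sketch; the 10-bit figure is taken from the text as stated, and both function names are hypothetical.

```python
CURSOR_ALIGN_BITS = 10  # the lower 10 address bits must be zero (from the text)
CURSOR_ALIGN_MASK = (1 << CURSOR_ALIGN_BITS) - 1


def is_valid_cursor_base(address):
    """True if the address lies on a cursor boundary."""
    return address >= 0 and (address & CURSOR_ALIGN_MASK) == 0


def align_cursor_base(address):
    """Round an address up to the next cursor boundary."""
    return (address + CURSOR_ALIGN_MASK) & ~CURSOR_ALIGN_MASK
```

For example, `align_cursor_base(0x401)` returns 0x800, the next address that satisfies the rule.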
5.2 Cursor Buffers
The cursor buffers are 512x32 bit single clock synchronous static memories. Each
buffer contains a copy of the current cursor pattern. The core reads the cursor patterns
from the external memory and stores them in the cursor buffers, thus avoiding having
to read it every frame. The core copies a cursor pattern whenever the Cursor Base
Address register is written to. Because the buffer holds a copy, the displayed cursor can
differ from the pattern currently stored in the external memory. After the pattern in
memory has changed, simply rewriting the same address to the Cursor Base Address
register is enough to read the new cursor data and display the new cursor.
5.3 Cursor Processor
The cursor processor translates the stored cursor pattern into a visible cursor. It
manages the cursor location and determines the pixel information for the current pixel
- being image or cursor - including cursor transparency and alpha blending.
5.4 Color Processor
The Color Processor translates the received pixel data to RGB color information.
When in 32-bit and 24-bit color mode, this is a pass-through function. In 16-bit color
mode this is a linear translation: 5-bit Red, 6-bit Green, and 5-bit Blue. When in 8-bit
grayscale mode the same data is placed on the red, green, and blue color outputs,
effectively generating a black-and-white image. When in 8-bit pseudo color mode the
received pixel data is sent through the internal color lookup table.
5.5 Line FIFO
The dual-clocked Line FIFO ensures a continuous data stream towards the VGA or
LCD display and ensures a correct transfer of the data from the WISHBONE clock
domain to the VGA clock domain.
5.6 Video Memory Base Registers
The Video Memory Base registers contain the starting addresses of the external video
memory banks.
5.7 Video Timing Generator
The Video Timing Generator generates the horizontal synchronization pulse
[hsync_pad_o], the vertical synchronization pulse [vsync_pad_o], the corresponding
interrupt signals [HINT] and [VINT], the composite synchronization pulse
[csync_pad_o], the blanking signal [blank_pad_o] and the read request to the Line
FIFO.
5.8 Wishbone Master Interface
The WISHBONE Master interface manages all accesses to the external memory. It
consists of a number of interacting state machines. The color processor and the cursor
processor issue requests to the WISHBONE Master. The WISHBONE Master
interface then generates the memory addresses for the image and the cursors.
5.9 Wishbone Slave Interface
The WISHBONE Slave interface manages all accesses to user readable/writeable
registers.
Appendix A
VGA Modes
This appendix describes some common VGA modes.
A.1 Vertical Timing Information Common VGA Modes
Mode  Resolution  Refresh  Line width  Sync pulse   Back porch   Active time   Front porch  Frame total
                  rate     usec        usec   lin   usec   lin   usec    lin   usec   lin   usec    lin
QVGA  320x240     60 Hz    -           -      -     -      -     -       -     -      -     -       -
VGA   640x480     60 Hz    31.78       63     2     953    30    15382   484   285    9     16683   525
VGA   640x480     72 Hz    26.41       79     3     686    26    12782   484   184    7     13735   520
SVGA  800x600     56 Hz    28.44       56     1     568    20    17177   604   -      -1*   17775   625
SVGA  800x600     60 Hz    26.40       106    4     554    21    15945   604   -      -1*   16579   628
SVGA  800x600     72 Hz    20.80       125    6     436    21    12563   604   728    35    13853   666
• The Active Time includes 4 overscan borderlines. Some timing tables include
these in the back and front porch.
• When the Active Time is increased, it passes the rising edge of the vsync signal,
hence the –1 Front Porch.
A.2 Horizontal Timing Information Common VGA Modes
Mode  Resolution  Refresh  Pixel clock  Sync pulse    Back porch  Active time  Front porch  Line total
                  rate     MHz          usec    pix   pix         pix          pix          pix
QVGA  320x240     60 Hz    -            -       -     -           -            -            -
VGA   640x480     60 Hz    25.175       3.81    96    45          646          13           800
VGA   640x480     72 Hz    31.5         1.27    40    125         646          21           832
SVGA  800x600     56 Hz    36           2       72    125         806          21           1024
SVGA  800x600     60 Hz    40           3.2     128   85          806          37           1056
SVGA  800x600     72 Hz    50           2.4     120   61          806          53           1040
• The Active Time includes 6 overscan borderlines. Some timing tables include
these in the back and front porch.
Partially taken from Jere Makela, Software Design for a Video Conversion
Equipment. Master’s Thesis, Helsinki University of Technology.
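The horizontal timing figures are internally consistent, which can be verified mechanically: the four segments add up to the line total, and the line total divided by the pixel clock gives the line width listed in A.1. The sketch below hard-codes the five fully specified rows of the table; the tuple layout and function names are hypothetical.

```python
# (mode, pixel clock MHz, sync, back porch, active, front porch, line total) in pixels
HORIZONTAL_MODES = [
    ("VGA 640x480 @60Hz", 25.175, 96, 45, 646, 13, 800),
    ("VGA 640x480 @72Hz", 31.5, 40, 125, 646, 21, 832),
    ("SVGA 800x600 @56Hz", 36.0, 72, 125, 806, 21, 1024),
    ("SVGA 800x600 @60Hz", 40.0, 128, 85, 806, 37, 1056),
    ("SVGA 800x600 @72Hz", 50.0, 120, 61, 806, 53, 1040),
]


def line_total(sync, back, active, front):
    """Total pixels per line: sync pulse + back porch + active time + front porch."""
    return sync + back + active + front


def line_time_us(total_pixels, pixel_clock_mhz):
    """Duration of one line in microseconds."""
    return total_pixels / pixel_clock_mhz
```

For the first row, 800 pixels at 25.175 MHz last about 31.78 microseconds, matching the Line width column of the vertical timing table.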
Appendix B
Target Dependent Implementations
The parts of the system that could be target dependent for FPGA implementations and
are absolutely target dependent for ASIC implementations are the dual clocked RAM
block for the Line FIFO as well as the single clock RAM blocks for the color lookup
table and the cursor buffers.
The RAM blocks are instantiated by the generic_spram.v and generic_dpram.v files.
These files contain an FPGA-synthesizable model that has been tested with
Exemplar’s LeonardoSpectrum and Synplicity’s Synplify for Altera (FLEX, ACEX,
APEX) and Xilinx devices (Virtex, Virtex-E, Spartan-II). They also contain modules
for some ASIC technologies.
The technology is set by a define statement in the vga_defines.v file.
`define VENDOR_FPGA             // use FPGA (Xilinx and Altera) synthesizable model
`define VENDOR_ARTISAN          // use Artisan memories
`define VENDOR_VIRTUALSILICON   // use VirtualSilicon memories
.
.
.
Check the generic_spram.v and generic_dpram.v files for more information.
Appendix C
Core Structure
[Figure: core structure. Modules and their source files:]
Name                 File
VGA                  vga_enh_top.v
WISHBONE             vga_wb_slave.v
CLUT                 vga_csm_pb.v
Line FIFO            vga_fifo_dc.v
clut_mem             generic_spram.v
fifo_dc_mem          generic_dpram.v
WISHBONE             vga_wb_master.v
Pixel Generator      vga_pgen.v
CLUT switch Fifo     vga_fifo.v
Timing Generator     vga_tgen.v
Data Fifo            vga_fifo.v
RGB Fifo             vga_fifo.v
Color Processor      vga_colproc.v
Cursor Processors    vga_curproc.v
Horizontal Timing    vga_vtim.v
Vertical Timing      vga_vtim.v
SyncPulseCounter     ro_cnt.v
GateDelayCounter     ro_cnt.v
GateCounter          ro_cnt.v
LengthCounter        ro_cnt.v
counter (4x)         ud_cnt.v
Appendix D
Design Notes
D.1 Introduction
This section contains flow and timing diagrams of the core’s internal blocks. The
diagrams are provided for reference only. They are intended to provide a better
understanding of the internal signal flow. They are not intended to serve as a detailed
step-through discussion of the core’s internals.
D.2 vga_curproc
This section shows the signal flow inside the cursor processor blocks. The letters in
the data busses are intended to ease the data flow overview. They represent signals
that are somehow related to each other and have a common timing spec, for example
cbuf_a-A represents address-A into the cursor-buffer, cbuf_q-A is the cursor buffer’s
output at address-A.
[Figure: vga_curproc timing diagram. Signal groups: inbox signals, cursor buffer access
signals, cursor 64x64 pixels signals, cursor 32x32 pixels signals, image data, RGB
generation, and wreq generation. Signals shown: clk, idat_wreq, didat_wreq, ddidat_wreq,
xcnt, inbox_x, xdone, ycnt, inbox_y, inbox, dinbox, ddinbox, cbuf_a, cbuf_q, cc_adr,
cc_dat_i, r, g, b, alpha, dr, dg, db, dalpha, idat, didat, ddidat, dddidat, RGB, store1,
store2, and wreq. The letters A, B, C, D, Y, and Z mark related data values.]