SIMD Column-Parallel Polygon Rendering


This thesis describes the design and implementation of a polygonal rendering system for a large one dimensional single instruction multiple data (SIMD) array of processors. Polygonal rendering enjoys a traditionally high level of available parallelism, but efficient realization of this parallelism requires communication. On large systems, such as the 1024--processor Princeton Engine studied here, communication overhead limits performance.

Special attention is paid to study of the analytical and experimental scalability of the design alternatives. Sort--last communication is determined to be the only design to offer linear performance improvements in the number of processors. We demonstrate that sort--first communication incurs increasingly high redundancy of calculations with increasing numbers of processors, while sort--middle communication has performance linear in the number of primitives and independent of the number of processors. Performance is thus communication--limited on large sort--first and sort--middle systems.

The system described achieves interactive rendering rates of up to 30 frames per second at NTSC resolution. Scalable solutions such as sort--last proved too expensive to achieve real--time performance. Thus, to maintain interactivity a sort--middle render was implemented. A large set of communication optimizations are analyzed and implemented under the processor--managed communication architecture.

Finally a novel combination of sort--middle and sort--last communication is proposed and analyzed. The algorithm efficiently combines the performance of sort--middle architectures for small numbers of processors and polygons with the scalability of sort--last architectures for large numbers of processors to create a system with speedup linear in the number of processors. The communication structure is substantially more efficient than traditional sort--last architectures both in time and in memory.


Take me back!