
This page summarizes the pros and cons of the ray tracing method with respect to modern GPUs.
Most of this text was inspired by [1].

GPUs are very powerful

GPUs are very powerful in computations that can be mapped to SIMD algorithms. For such workloads they provide better performance than today's CPUs, but on the other hand GPU programs are very limited in functionality, complexity, and memory access.


Flexibility

Ray tracing allows us to trace individual or unstructured groups of rays. This provides for efficient computation of just the required information, e.g. for sampling narrow glossy highlights, for filling holes in image-based rendering, and for importance sampling of illumination. Ultimately, this flexibility is required if we want to achieve interactive global illumination simulations based on ray tracing.

Occlusion Culling and Logarithmic Complexity

Ray tracing enables efficient rendering of complex scenes through its built-in occlusion culling as well as its logarithmic complexity in the number of scene primitives. Using a simple search data structure, it can quickly locate the relevant geometry in a scene and stop its front-to-back processing as soon as visibility has been determined. This approach of processing geometry on demand stands in strong contrast to the "send all geometry and discard at the end" approach taken by current triangle rasterization hardware.
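The on-demand, front-to-back processing described above can be sketched as follows. The `Cell` class and its `intersect` method are illustrative stand-ins for the leaves of a real spatial index, not part of any actual API:

```python
# Minimal sketch of front-to-back processing with early ray termination.
# Cell stands in for a leaf of a spatial index (e.g. a kd-tree node),
# with the cells already ordered along the ray.

class Cell:
    def __init__(self, hit_t=None):
        self.hit_t = hit_t      # distance of a hit in this cell, if any
        self.visited = False

    def intersect(self, ray):
        self.visited = True     # geometry in this cell was actually tested
        return self.hit_t

def first_hit(cells, ray):
    """Visit cells front to back; stop as soon as visibility is determined."""
    for cell in cells:
        hit = cell.intersect(ray)
        if hit is not None:
            return hit          # geometry behind this point is never touched
    return None
```

Because the loop stops at the first confirmed hit, occluded geometry further along the ray is never even loaded, which is exactly the built-in occlusion culling described above.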

Efficient Shading

With ray tracing, samples are only shaded after visibility has been determined. Given the trend toward more and more realistic and complex shading, this avoids redundant computations for invisible geometry.
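A toy comparison of the two shading orders (all names here are illustrative, not from any real renderer): ray tracing resolves visibility first and shades once per pixel, while a naive rasterizer shades every fragment that lands on the pixel:

```python
# Toy model of one pixel covered by several primitives. Each fragment is a
# (depth, material) pair; shade() is deliberately treated as the expensive step.

def render_pixel_raytraced(fragments, shade):
    """Resolve visibility first, then shade only the surviving sample."""
    if not fragments:
        return None
    depth, material = min(fragments)   # nearest hit wins before any shading
    return shade(material)             # expensive shading runs exactly once

def render_pixel_rasterized(fragments, shade):
    """Naive rasterizer: shade every fragment, then keep the nearest one."""
    shaded = [(depth, shade(material)) for depth, material in fragments]
    return min(shaded)[1] if shaded else None
```

Both functions produce the same final color; the difference is that the ray-traced version invokes the shader only for the visible sample.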

Simpler Shader Programming

Programming shaders that create special lighting and appearance effects has been at the core of realistic rendering. While writing shaders (e.g. for the RenderMan standard) is fairly straightforward, adapting these shaders to the pipeline model of rasterization has been very difficult. Since ray tracing is not limited to this pipeline model, it can make direct use of shaders.


Correctness

By default ray tracing computes physically correct reflections, refractions, and shading. In case the correct results are not required or are too costly to compute, ray tracing can easily make use of the same approximations used to generate these effects in rasterization-based approaches, such as reflection or environment maps. This is contrary to rasterization, where approximations are the only option and it is difficult to even come close to realistic effects.

Parallel Scalability

Ray tracing is known for being trivially parallel as long as high enough bandwidth to the scene data is provided. Given the exponential growth of available hardware resources, ray tracing should be better able to utilize them than rasterization, which has been difficult to scale efficiently. However, the initial resources required for a hardware ray tracing engine are higher than those for a rasterization engine.
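The trivial parallelism can be sketched on a CPU: because rays are independent, the image splits into tiles that are traced with no synchronization between workers. Here `trace_tile` is a dummy stand-in that only returns the pixel coordinates it would have traced:

```python
from concurrent.futures import ThreadPoolExecutor

# Rays do not interact, so tiles of the image can be traced concurrently.
# trace_tile() is a placeholder for real per-pixel ray tracing work.

WIDTH = 4

def trace_tile(rows):
    y0, y1 = rows
    return [(x, y) for y in range(y0, y1) for x in range(WIDTH)]

def render(height, tile_h=2, workers=4):
    tiles = [(y, min(y + tile_h, height)) for y in range(0, height, tile_h)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        tile_results = pool.map(trace_tile, tiles)  # embarrassingly parallel
    return [px for tile in tile_results for px in tile]
```

The only shared resource the workers need is read access to the scene data, which is why the section's caveat about scene bandwidth is the real scaling limit.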


Coherence is the key to efficient rendering. Due to the low coherence between rays in traditional recursive ray tracing implementations, performance has been rather low.

Random memory access

The simple recursive approach tends to produce randomized memory accesses. Primary rays exhibit quite a good degree of coherence, but secondary rays are nearly random in general scenes. Random memory access is very limited on today's GPUs, so more complex algorithms and data structures are required to reduce this problem.

Coherent algorithms are complex

Current GPU architectures limit the number of instructions per program kernel, so GPU programs must be lightweight enough to execute in a single pass. If more complex computations are required, the computation must be broken into several kernels that are executed consecutively as a multi-pass algorithm. This increases bandwidth requirements and adds overhead, because an initial and a final stage must be executed in every kernel: the program state must be saved and restored between every two kernels. The data are written to video memory (VRAM), which loads the bus between the GPU and VRAM.
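The save/restore overhead of multi-pass execution can be modelled roughly as below; the dict stands in for a state texture in VRAM, the traffic counter for the extra bus bandwidth each pass costs, and all names are hypothetical:

```python
# Each "kernel" is one bounded-length pass: it must first restore the state
# written by the previous pass and finally save its own state back to the
# (simulated) VRAM, so every extra pass adds load/store traffic.

def run_passes(kernels, state):
    traffic = 0
    for kernel in kernels:
        restored = dict(state)        # initial stage: read state texture
        traffic += len(restored)
        state = kernel(restored)      # the actual per-pass computation
        traffic += len(state)         # final stage: write state texture
    return state, traffic

# Two toy passes of a ray tracer split across kernels: first find the hit
# distance, then shade it.
intersect_pass = lambda s: {**s, "t": 1.0}
shade_pass     = lambda s: {**s, "color": 0.5}
```

Note how the traffic grows with the number of passes even though the useful computation is unchanged, which is exactly the overhead described above.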

Algorithms are limited in general functionality

Because of the SIMD architecture, some operations common on CPUs are not available on GPUs. For example, basic algorithms such as sorting cannot be implemented efficiently because of the restrictions on random memory access. These algorithms are possible, but only as massive multi-pass solutions that are unusable due to their performance overhead. There is a need to find SIMD replacements for these basic algorithms and apply them effectively on the GPU.
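One such SIMD-friendly replacement is bitonic merge sort: its compare-and-swap pattern depends only on the array indices, never on the data, so all comparisons of one stage can run in lockstep, and a GPU would execute each stage as a single rendering pass. A CPU sketch (input length must be a power of two):

```python
# Bitonic sort: the sequence of compare-exchanges is fixed in advance,
# which is what makes it suitable for SIMD hardware; a GPU implementation
# would run each (k, j) stage below as one pass over the data.

def bitonic_sort(values):
    a = list(values)
    n = len(a)                       # must be a power of two
    k = 2
    while k <= n:                    # size of the bitonic sequences
        j = k // 2
        while j >= 1:                # compare-exchange distance
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

The cost is O(n log² n) comparisons instead of O(n log n), the price paid for a data-independent schedule.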

Advanced accelerating structures are complex

Great effort has been invested in developing sophisticated data structures for accelerating ray-scene traversal, and many advanced data structures and algorithms have been presented. On today's GPUs most of them are unusable because of GPU restrictions. The challenge is to find a data structure that is simple enough yet effective enough to accelerate rays on GPUs. This will always be a trade-off between program or data-structure complexity and brute-force ray tracing.
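One candidate for such a simple-but-effective structure is a uniform grid walked with a 3D-DDA traversal: the loop is short, branch-light, and needs no stack. A 2D sketch with unit-sized cells (the function and parameter names are ours, not from any particular implementation):

```python
import math

# Walk a ray through a uniform grid of unit cells, visiting cells strictly
# front to back. t_max_* is the ray parameter at the next cell boundary on
# that axis; t_delta_* is the parameter distance between two boundaries.

def grid_walk(origin, direction, nx, ny):
    x, y = int(origin[0]), int(origin[1])
    step_x = 1 if direction[0] > 0 else -1
    step_y = 1 if direction[1] > 0 else -1
    t_max_x = ((x + (step_x > 0)) - origin[0]) / direction[0] if direction[0] else math.inf
    t_max_y = ((y + (step_y > 0)) - origin[1]) / direction[1] if direction[1] else math.inf
    t_delta_x = abs(1.0 / direction[0]) if direction[0] else math.inf
    t_delta_y = abs(1.0 / direction[1]) if direction[1] else math.inf
    cells = []
    while 0 <= x < nx and 0 <= y < ny:
        cells.append((x, y))            # a real tracer would test geometry here
        if t_max_x < t_max_y:           # the nearer boundary decides the axis
            x += step_x
            t_max_x += t_delta_x
        else:
            y += step_y
            t_max_y += t_delta_y
    return cells
```

The trade-off mentioned above shows up directly: a grid adapts poorly to uneven scenes, but its traversal state is just two counters and two ray parameters per axis, which fits GPU restrictions well.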

GPUs are not designed for raytracing

Current rasterization-specialized GPUs are not designed for complex ray tracing. The task is to reformulate ray tracing to fit the programmability and features of existing GPUs. This will probably require some assumptions about scene types, or limits on general ray tracing functionality.

Dynamic scene updates are problematic

To compute rays on the GPU, all required scene data must be present in fast VRAM. It is not efficient to send the scene data to the GPU every frame, so when displaying dynamic scenes some incremental updating of the data structure on the GPU must be developed. This increases the complexity of the algorithms.
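Incremental updating can be sketched on a uniform grid: when an object moves, only the cells it leaves and enters change, so only those cells need to be re-uploaded to VRAM. The `cell_of` binning function and the grid layout here are illustrative assumptions:

```python
# grid maps a cell coordinate to the set of object ids stored in that cell.
# Moving an object touches at most two cells; everything else already in
# VRAM can stay as it is, instead of rebuilding the structure each frame.

def cell_of(pos):
    return (int(pos[0]), int(pos[1]))   # unit-sized cells assumed

def move_object(grid, obj_id, old_pos, new_pos):
    """Return the set of cells that changed and must be re-sent to the GPU."""
    old_cell, new_cell = cell_of(old_pos), cell_of(new_pos)
    if old_cell == new_cell:
        return set()                    # no upload needed this frame
    grid.get(old_cell, set()).discard(obj_id)
    grid.setdefault(new_cell, set()).add(obj_id)
    return {old_cell, new_cell}
```

A real implementation would batch the dirty cells of all moved objects and upload them in one transfer per frame.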

Technical problems

Modern GPUs are evolving very fast (in terms of transistor counts and performance, faster than Moore's law). It can happen that the hardware supports new features early, while driver or software API support is delayed or incorrect. This is another caveat for GPU programmers.

[1] Ingo Wald and Philipp Slusallek. State-of-the-Art in Interactive Ray Tracing. In State of the Art Reports, EUROGRAPHICS 2001, pp. 21-42, Manchester, United Kingdom, September 3-7, 2001.
Homepage of Inferno Project, (c) Antonin Hildebrand 2004