Advances in High-Performance GPU Ray Tracing for Physics-Based Simulation
Christiaan Gribble & Lee A. Butler
GPU Technology Conference
21 March 2013
Introductions
• Christiaan Gribble, SURVICE Engineering, christiaan.gribble@survice.com
• Alexis Naveros, SURVICE Engineering, alexis.naveros@survice.com
• Lee A. Butler, US Army Research Laboratory, lee.a.butler6.civ@mail.mil
• Mark Butkiewicz, SURVICE Engineering, mark.butkiewicz@survice.com
SURVICE Engineering
• Support DoD community
• Focus on combat systems
– Safety
– Survivability
– Effectiveness
• 400+ employees
• 10 locations nationally
US Army Research Laboratory
• US Army RDECOM
– Corporate laboratory
– 2000 civilian employees
• Directorates
– SLAD
– Army Research Office
– Many others
• Still in the Top 500 list
Agenda
• Application domains
• Technical motivation
• Rayforce GPU ray tracing engine
• Cognition-Driven Simulation
• Visual Simulation Laboratory
Application domains
• Ballistic penetration
• Radio frequency propagation
• Thermal radiative transport
• High-energy particle transport
Technical motivation
Optical rendering
Non-optical rendering
Technical motivation
Interval computation
Interval generation
• Difficult or impossible
– Negative epsilon hacks
– Missed/repeated hits
• Performance impacts
– Traversal restart
– Operational overhead
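The problem can be illustrated with a minimal Python sketch (not Rayforce code). It generates hits by repeated first-hit queries, restarting the traversal after each hit. The ray is reduced to a sorted list of surface t-values, and `first_hit`, `hits_by_restart`, and `offset` are hypothetical names chosen for this example. A positive offset skips surfaces that lie closer together than the offset, while a zero or negative offset reports the same surface again and again.

```python
def first_hit(surfaces, t_min):
    """Nearest surface with t >= t_min, or None.

    Stands in for one complete first-hit traversal of the scene.
    """
    for t in surfaces:              # surfaces are sorted by t
        if t >= t_min:
            return t
    return None


def hits_by_restart(surfaces, offset, max_queries=16):
    """Collect hits by restarting a first-hit query after every hit."""
    hits, t_min = [], 0.0
    for _ in range(max_queries):    # cap guards against endless repeats
        t = first_hit(surfaces, t_min)
        if t is None:
            break
        hits.append(t)
        t_min = t + offset          # restart just past (or before) the hit
    return hits


surfaces = [1.0, 1.00001, 2.0]

# Positive epsilon: the surface at 1.00001 is missed.
print(hits_by_restart(surfaces, 1e-3))

# Negative epsilon: the surface at 1.0 is hit repeatedly.
print(hits_by_restart(surfaces, -1e-3, max_queries=4))
```

Every restart also pays for a new traversal from the ray origin, which is the performance impact listed above.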
Rayforce
• Programmable ray tracing engine
• Designed for NVIDIA GPUs
• High performance
– Modern techniques
– Novel acceleration structure
– Multiple traversal algorithms
State-of-the-art ray tracing
• Leverages modern techniques
– Ray packets
– Frustum tracing
• Exploits hardware features
– SIMD processing (v2.1)
– Architecture-specific optimizations
Proven techniques bolster high performance
Acceleration structure
• kd-tree
• Binary Space Partitioning tree
• Regular grid
• Bounding Volume Hierarchy
Graph-based spatial indexing
• Efficient
– Uses memory very carefully
– Improves cache performance
– Reduces memory bandwidth
• Flexible
– Several traversal algorithms
– Minimal overhead
– User-configurable pipelines
• Scalable
– Handles complex scenes
– Performance depends only on complexity along a ray
Traversal algorithms
• First-hit
– Nearest intersected primitive?
– Visibility/bounce rays
• Any-hit
– Is any primitive intersected?
– Shadow/ambient occlusion rays
• Multi-hit
– Which primitives are intersected?
– Transparency & non-optical rendering
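The three query types differ only in what they return and when they may stop. The following is a minimal Python sketch, not the Rayforce API: each candidate primitive is reduced to the t-value at which the ray hits it (`None` for a miss), and the function names and the `t_max` parameter are assumptions made for this example.

```python
def first_hit(candidates):
    """Nearest intersection along the ray, or None if nothing is hit."""
    best = None
    for t in candidates:
        if t is not None and (best is None or t < best):
            best = t
    return best


def any_hit(candidates, t_max):
    """True if some primitive is hit before t_max (e.g. the light distance)."""
    for t in candidates:
        if t is not None and t < t_max:
            return True             # one blocker is enough, so stop here
    return False


def multi_hit(candidates):
    """Every intersection, ordered by t-value along the ray."""
    return sorted(t for t in candidates if t is not None)


candidates = [4.0, None, 1.5, 3.0]
print(first_hit(candidates))
print(any_hit(candidates, t_max=2.0))
print(multi_hit(candidates))
```

First-hit must examine every candidate to be sure of the nearest one, any-hit can stop at the first blocker, and multi-hit keeps everything in order.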
Performance – tests
Coherent workloads
• vis
– first-hit visibility
– N · V shading
• x-ray
– all multi-hit intersections
– alpha blending
Incoherent workloads
• ao
– first-hit visibility
– 32 AO rays/intersection
• kajiya
– first-hit visibility
– shadows + 2 diffuse bounces
Performance – scenes
• ktank: 1M tris
• conference: 282K tris
• san miguel: 10M tris
Images rendered at 1024x768 pixels on an NVIDIA GeForce GTX 690
Performance – results
[Bar chart: throughput in Mrps (axis 0 to 1000) for the vis and x-ray tests (coherent workloads) and the ao and kajiya tests (incoherent workloads)]
Just for Fun …
• 1920x1080 vs 1024x768
• Single hit
• No color, Lambertian only
[Bar chart: throughput in Mrps (axis 0 to 1400) for the vis test]
Multi-hit traversal
• Which primitives are intersected?
– One or more, & possibly all
– Ordered by t-value along ray
• Core operation in Rayforce
– Avoids negative epsilon hacks
– Alleviates traversal restart
• Critical to interval generation
– Handles bad geometry gracefully
– Enables early exit
• Applications
– Physically based simulation
– Order-independent transparency
– …
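Once hits arrive ordered by t-value, interval generation reduces to pairing them. The sketch below is a simplified Python illustration, not Rayforce's method. It assumes closed, non-overlapping solids, so that hits alternate between entry and exit. The function names are hypothetical, and raising an error on an odd hit count is only a placeholder for whatever handling of bad geometry an application needs.

```python
def intervals_from_hits(ordered_ts):
    """Pair t-ordered hits into (entry, exit) intervals along the ray."""
    if len(ordered_ts) % 2:
        # Placeholder policy: an odd count means the geometry is not
        # watertight along this ray.
        raise ValueError("odd number of hits along the ray")
    return [(ordered_ts[i], ordered_ts[i + 1])
            for i in range(0, len(ordered_ts), 2)]


def line_of_sight_thickness(intervals):
    """Total distance the ray travels inside solid material."""
    return sum(t_out - t_in for t_in, t_out in intervals)


hits = [1.0, 1.5, 3.0, 5.0]          # output of a multi-hit query
intervals = intervals_from_hits(hits)
print(intervals)
print(line_of_sight_thickness(intervals))
```

A quantity such as line-of-sight thickness is the kind of result a ballistic penetration or radiative transport code would consume.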
Naïve multi-hit
function TRAVERSE(root, ray)
    // Find all hits
    INITIALIZE(hitList)
    node ← root
    while VALID(node) do
        if !EMPTY(node) then
            for tri in node do
                if INTERSECT(tri, ray) then
                    hitData ← (t-value, u, v, …)
                    ADD(hitList, hitData)
                end if
            end for
        end if
        node ← NEXT(node)
    end while
    // Process desired hits
    for hitData in hitList do
        if !USERHIT(ray, hitData) then
            goto fini
        end if
    end for
label fini:
    USEREND(ray)
end function
Simple & effective, but potentially slow
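A runnable Python sketch of the naïve approach follows. It is a simplification under stated assumptions: nodes are given as a plain list in traversal order, each primitive is a `(prim_id, t)` pair with `t` set to `None` on a miss (standing in for INTERSECT), and the function returns the number of intersection tests so the cost is visible. All names are hypothetical.

```python
def naive_multi_hit(nodes, user_hit):
    """Collect every hit along the ray, then report them in t-order.

    nodes:    list of nodes in traversal order; each node is a list of
              (prim_id, t) pairs, with t = None when the ray misses.
    user_hit: callback(t, prim_id) -> bool; returning False stops reporting.
    Returns the number of intersection tests performed.
    """
    tests = 0
    hit_list = []
    for node in nodes:                      # find all hits
        for prim_id, t in node:
            tests += 1                      # stands in for INTERSECT(tri, ray)
            if t is not None:
                hit_list.append((t, prim_id))
    hit_list.sort()                         # order by t-value along the ray
    for t, prim_id in hit_list:             # process desired hits
        if not user_hit(t, prim_id):
            break                           # traversal cost is already paid
    return tests


nodes = [[(0, 2.0), (1, None), (2, 1.0)], [(3, 3.5), (4, 3.0)], [(5, 9.0)]]
seen = []

def stop_after_two(t, prim_id):
    seen.append(prim_id)
    return len(seen) < 2

tests = naive_multi_hit(nodes, stop_after_two)
print(tests, seen)
```

Even though the callback only wants the two nearest hits, every primitive in every node is tested before the first hit is reported.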
Rayforce multi-hit
function TRAVERSE(root, ray)
    node ← root
    while VALID(node) do
        if !EMPTY(node) then
            SET(flags, INIT)
            while TRUE do
                // Find some hits
                INITIALIZE(hitList)
                for tri in node do
                    if !DONE(hitMask, tri) then
                        if INTERSECT(tri, ray) then
                            hitData ← (t-value, u, v, …)
                            if ADD(hitList, hitData) then
                                SET(flags, REPEAT)
                            end if
                        end if
                    end if
                end for
                if GET(flags) == (INIT & REPEAT) then
                    INITIALIZE(hitMask)
                    UNSET(flags, INIT)
                end if
                // Early exit
                for hitData in hitList do
                    if !USERHIT(ray, hitData) then
                        goto fini
                    end if
                    if GET(flags) == REPEAT then
                        DONE(hitMask, hitData, TRUE)
                    end if
                end for
                if GET(flags) != REPEAT then
                    break
                end if
                UNSET(flags, REPEAT)
            end while
        end if
        node ← NEXT(node)
    end while
label fini:
    // Per-ray cleanup
    USEREND(ray)
end function
Gains efficiency with early exit
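The per-node scheme can be sketched in runnable Python as follows. This is one plausible reading of the pseudocode, not the Rayforce implementation. It assumes that ADD reports overflow of a fixed-capacity, t-sorted hit list, that nodes are visited front to back along the ray, and that each hit lies inside the node that owns the primitive. Primitives are `(prim_id, t)` pairs with `t` set to `None` on a miss. A Python set stands in for hitMask, and `capacity` is a hypothetical parameter.

```python
import bisect


def per_node_multi_hit(nodes, user_hit, capacity=4):
    """Report hits node by node, in t-order, stopping as soon as asked.

    Returns the number of intersection tests performed.
    """
    tests = 0
    for node in nodes:                       # assumed front-to-back order
        done = set()                         # stands in for hitMask
        while True:
            hit_list, repeat = [], False     # find some hits
            for prim_id, t in node:
                if prim_id in done:
                    continue                 # already reported in an earlier pass
                tests += 1                   # stands in for INTERSECT(tri, ray)
                if t is None:
                    continue
                bisect.insort(hit_list, (t, prim_id))
                if len(hit_list) > capacity:
                    hit_list.pop()           # drop the farthest hit for now
                    repeat = True            # and revisit this node later
            for t, prim_id in hit_list:
                if not user_hit(t, prim_id):
                    return tests             # early exit: skip remaining nodes
                if repeat:
                    done.add(prim_id)
            if not repeat:
                break
    return tests


nodes = [[(0, 2.0), (1, None), (2, 1.0)], [(3, 3.5), (4, 3.0)], [(5, 9.0)]]
seen = []

def stop_after_two(t, prim_id):
    seen.append(prim_id)
    return len(seen) < 2

tests = per_node_multi_hit(nodes, stop_after_two)
print(tests, seen)
```

On this toy input the callback stops after the two nearest hits, so only the three primitives in the first node are tested. A small `capacity` trades extra passes over a node for a bounded hit list, which matters when per-ray storage is scarce.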
Early Exit Buys Performance
[Bar chart over the ktank, conf, and san miguel scenes; vertical axis 0 to 250; gain labels +39.05%, +104.01%, +91.00%]
Rayforce multi-hit outperforms naïve algorithm by 1.8x
Rayforce
• Battle-tested techniques
• Novel acceleration structure
• Multi-hit ray traversal
• Hand-tuned for CUDA
Demonstrated high-performance GPU ray tracing: first-hit, any-hit, multi-hit
Demonstration: Quadro 3000M, 240 Fermi CUDA Cores @ 900 MHz
Public LGPL v2.0 release of Rayforce now available!
Cognition-Driven Simulation
• Perform visualization during simulation
– As a by-product of computation
– As computation progresses
• Key advantages
– Enables exploration & steering
– Drives understanding & confidence
– User cognition must be managed: too fast → details missed; too slow → user disengages
• Managed computation
– Focus on most interesting features
– Avoid uninteresting parts of parameter space
Visual Simulation Laboratory
• A cross-platform, open-source application framework
– Qt, OpenSceneGraph, & other technologies
• The foundation used for several CDS simulation applications
Public LGPL v2.0 release of VSL now available!
Get the software
Rayforce
Rayforce Website:
http://rayforce.net
Source code:
http://sourceforge.net/projects/rayforce
VSL
VSL Website:
http://vissimlab.org
Source code:
http://sourceforge.net/projects/vissimlab
