Advances in High-Performance GPU Ray Tracing for Physics-Based Simulation
Christiaan Gribble & Lee A. Butler
GPU Technology Conference
21 March 2013

Introductions
• Christiaan Gribble, SURVICE Engineering, christiaan.gribble@survice.com
• Alexis Naveros, SURVICE Engineering, alexis.naveros@survice.com
• Lee A. Butler, US Army Research Laboratory, lee.a.butler6.civ@mail.mil
• Mark Butkiewicz, SURVICE Engineering, mark.butkiewicz@survice.com

SURVICE Engineering
• Support DoD community
• Focus on combat systems
  – Safety
  – Survivability
  – Effectiveness
• 400+ employees
• 10 locations nationally

US Army Research Laboratory
• US Army RDECOM
  – Corporate laboratory
  – 2000 civilian employees
• Directorates
  – SLAD
  – Army Research Office
  – Many others
• Still in the Top 500 list

Agenda
• Application domains
• Technical motivation
• Rayforce GPU ray tracing engine
• Cognition-Driven Simulation
• Visual Simulation Laboratory

Application domains
• Ballistic penetration
• Radio frequency propagation
• Thermal radiative transport
• High-energy particle transport

Technical motivation
• Optical rendering
• Non-optical rendering

Technical motivation
Interval computation
Interval generation
• Difficult or impossible
  – Negative epsilon hacks
  – Missed/repeated hits
• Performance impacts
  – Traversal restart
  – Operational overhead

Rayforce
• Programmable ray tracing engine
• Designed for NVIDIA GPUs
• High performance
  – Modern techniques
  – Novel acceleration structure
  – Multiple traversal algorithms

State-of-the-art ray tracing
• Leverages modern techniques
  – Ray packets
  – Frustum tracing
• Exploits hardware features
  – SIMD processing (v2.1)
  – Architecture-specific optimizations
Proven techniques bolster high performance

Acceleration structure
• kd-tree
• Binary Space Partitioning tree
• Regular grid
• Bounding Volume Hierarchy
Graph-based spatial indexing

Graph-based spatial indexing
• Efficient
  – Uses memory very carefully
  – Improves cache performance
  – Reduces memory bandwidth
• Flexible
  – Several traversal algorithms
  – Minimal overhead
  – User-configurable pipelines
• Scalable
  – Handles complex scenes
  – Performance depends only on complexity along a ray

Traversal algorithms
• First-hit
  – Nearest intersected primitive?
  – Visibility/bounce rays
• Any-hit
  – Is any primitive intersected?
  – Shadow/ambient occlusion rays
• Multi-hit
  – Which primitives are intersected?
  – Transparency & non-optical rendering

Performance – tests
Coherent workloads
• vis
  – first-hit visibility
  – N · V shading
• x-ray
  – all multi-hit intersections
  – alpha blending
Incoherent workloads
• ao
  – first-hit visibility
  – 32 AO rays/intersection
• kajiya
  – first-hit visibility
  – shadows + 2 diffuse bounces

Performance – scenes
• ktank: 1M tris
• conference: 282K tris
• san miguel: 10M tris
Images rendered at 1024x768 pixels on an NVIDIA GeForce GTX 690

Performance – results
[Bar chart: throughput in Mrps (axis 0 to 1000) for the coherent workloads (vis, x-ray) and the incoherent workloads (ao, kajiya); bar values are not recoverable from the transcript.]

Just for Fun …
• 1920x1080 vs 1024x768
• Single hit
• No color, Lambertian only
[Bar chart: vis throughput in Mrps (axis 0 to 1400); bar values are not recoverable from the transcript.]

Multi-hit traversal
• Which primitives are intersected?
  – One or more, & possibly all
  – Ordered by t-value along ray
• Core operation in Rayforce
  – Avoids negative epsilon hacks
  – Alleviates traversal restart
• Critical to interval generation
  – Handles bad geometry gracefully
  – Enables early exit
• Applications
  – Physically based simulation
  – Order-independent transparency
  – …

Naïve multi-hit

    function TRAVERSE(root, ray)
        // Find all hits
        INITIALIZE(hitList)
        node ← root
        while VALID(node) do
            if !EMPTY(node) then
                for tri in node do
                    if INTERSECT(tri, ray) then
                        hitData ← (t-value, u, v, …)
                        ADD(hitList, hitData)
                    end if
                end for
            end if
            node ← NEXT(node)
        end while
        // Process desired hits
        for hitData in hitList do
            if !USERHIT(ray, hitData) then
                goto fini
            end if
        end for
        label fini:
        USEREND(ray)
    end function

Simple & effective, but potentially slow

Rayforce multi-hit

    function TRAVERSE(root, ray)
        node ← root
        while VALID(node) do
            if !EMPTY(node) then
                SET(flags, INIT)
                while TRUE do
                    // Find some hits
                    INITIALIZE(hitList)
                    for tri in node do
                        if !DONE(hitMask, tri) then
                            if INTERSECT(tri, ray) then
                                hitData ← (t-value, u, v, …)
                                if ADD(hitList, hitData) then
                                    SET(flags, REPEAT)
                                end if
                            end if
                        end if
                    end for
                    // Early exit
                    if GET(flags) == (INIT & REPEAT) then
                        INITIALIZE(hitMask)
                        UNSET(flags, INIT)
                    end if
                    for hitData in hitList do
                        if !USERHIT(ray, hitData) then
                            goto fini
                        end if
                        if GET(flags) == REPEAT then
                            DONE(hitMask, hitData, TRUE)
                        end if
                    end for
                    if GET(flags) != REPEAT then
                        break
                    end if
                    UNSET(flags, REPEAT)
                end while
            end if
            node ← NEXT(node)
        end while
        // Per-ray cleanup
        label fini:
        USEREND(ray)
    end function

Gains efficiency with early exit

Early Exit Buys Performance
[Bar chart: axis 0 to 250, scenes ktank, conf, and san miguel, with gain labels +39.05%, +104.01%, and +91.00%; the assignment of labels to scenes is not recoverable from the transcript.]
Rayforce multi-hit outperforms naïve algorithm by 1.8x

Rayforce
• Battle-tested, modern techniques
• Novel acceleration structure
• Multi-hit ray traversal
• Hand-tuned for CUDA
Demonstrated high performance GPU ray tracing: first-hit, any-hit, multi-hit
Demonstration: Quadro 3000M, 240 Fermi CUDA Cores @ 900 MHz
Public LGPL v2.0 release of Rayforce now available!
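
To make the multi-hit traversal with early exit from the pseudocode above more concrete, here is a minimal Python sketch. It is not Rayforce code, and several parts are assumptions made for illustration: spheres stand in for triangles, `nodes` is a hypothetical front-to-back list of primitive lists standing in for the acceleration structure, a Python set plays the role of hitMask, and the reading that ADD reports overflow of a fixed-capacity hit list (which then triggers the REPEAT pass) is an interpretation of the slides rather than something they state. The names `hit_sphere`, `traverse`, `user_hit`, and `capacity` are hypothetical.

```python
import math


def hit_sphere(origin, direction, center, radius):
    """Return the nearest t >= 0 where a unit-length ray hits the sphere, else None."""
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(d * x for d, x in zip(direction, oc))
    c = sum(x * x for x in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    for t in (-b - root, -b + root):
        if t >= 0.0:
            return t
    return None


def traverse(nodes, origin, direction, user_hit, capacity=2):
    """Report hits node by node, nearest first; stop when user_hit returns False.

    nodes: lists of (prim_id, center, radius), assumed ordered front to back.
    capacity: assumed fixed size of the per-pass hit list.
    """
    for node in nodes:
        done = set()  # primitives already reported from this node (hitMask stand-in)
        while True:
            hits, overflow = [], False
            for prim_id, center, radius in node:
                if prim_id in done:
                    continue
                t = hit_sphere(origin, direction, center, radius)
                if t is None:
                    continue
                hits.append((t, prim_id))
                hits.sort()
                if len(hits) > capacity:
                    hits.pop()       # drop the farthest hit for now
                    overflow = True  # another pass over this node is needed
            for t, prim_id in hits:
                if not user_hit(prim_id, t):
                    return           # early exit: skip all remaining work
                done.add(prim_id)
            if not overflow:
                break


# Hypothetical scene: four spheres on the x axis and one far off the ray.
nodes = [
    [(0, (2.0, 0.0, 0.0), 0.5), (1, (4.0, 0.0, 0.0), 0.5), (2, (6.0, 0.0, 0.0), 0.5)],
    [(3, (9.0, 0.0, 0.0), 0.5), (4, (0.0, 5.0, 0.0), 0.5)],
]
origin, direction = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)

# Collect every hit along the ray.
all_hits = []
traverse(nodes, origin, direction, lambda pid, t: all_hits.append((pid, t)) or True)

# Ask for only the first two hits; traversal stops before the second node.
first_two = []


def stop_after_two(pid, t):
    first_two.append(pid)
    return len(first_two) < 2


traverse(nodes, origin, direction, stop_after_two)
print(all_hits, first_two)
```

In this sketch the callback decides when enough hits have been seen, so a ray that only needs its first few intersections never touches later nodes, which is the property the slides credit for the gain over the naïve collect-everything approach.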
Cognition-Driven Simulation
• Perform visualization during simulation
  – As a by-product of computation
  – As computation progresses
• Key advantages
  – Enables exploration & steering
  – Drives understanding & confidence
  – User cognition must be managed:
    • Too fast: details missed
    • Too slow: disengage
• Managed computation
  – Focus on most interesting features
  – Avoid uninteresting parts of parameter space

Visual Simulation Laboratory
• A cross-platform, open-source application framework
  – Qt, OpenSceneGraph, & other technologies
• The foundation used for several CDS simulation applications
Public LGPL v2.0 release of VSL now available!

Get the software
Rayforce
  Website: http://rayforce.net
  Source code: http://sourceforge.net/projects/rayforce
VSL
  Website: http://vissimlab.org
  Source code: http://sourceforge.net/projects/vissimlab