Dynamic Resolution Rendering
Doug Binks, Intel
doug.binks@intel.com

Leigh Davies, Josh Doss, Matt Fife, Philipp Gerasimov, Axel Mamode, Steve Mccalla, Phil Taylor, Jeff Williams, and many more in VCSE
www.intel.com/software/graphics

Resolution selection is one of the defining aspects of PC gaming.

[Slide graphic: VIDEO SETTINGS menu showing "Select Resolution: <1280x720>"]

Introduction

• Dynamically adjust the resolution of the 3D scene to meet performance and quality goals
• Render the Graphical User Interface at screen resolution
• Performance results and demos will be shown for the 2nd generation Intel® Core™ processor family (codenamed Sandy Bridge) with Intel® GMA HD3000 graphics.

Motivation I: The User Interface is important

[Screenshot: Trion Worlds RIFT: Planes of Telara, an MMO with Scaleform GFx UI]

• In-game GUIs are very important to MMOs, RPGs, RTSs.
• Menus can get quite populated (e.g. multiplayer game browsers)
• Good to have a high resolution to include lots of content
• Natural resolution of screen preferred for clear text

Many thanks to Landyn Pethrus and <Wicked Mojo> of Black Dragonflight for this screenshot from Activision Blizzard's World of Warcraft showing how important the GUI can be to game play.
Motivation II: Performance

• Many games are GPU bound
• Titles are becoming increasingly pixel bound:
  – Post process effects
  – Complex shading
  – Deferred lighting
• Leads to resolution dependency
• Large variance in PC capabilities & monitor resolutions
• Few users adjust settings

Motivation III: Quality

• Quality includes frame rate and responsiveness
• Users' perception and weighting of quality metrics varies with usage:
  – Moving / zooming on Google Maps / iPad favors responsiveness over image quality whilst handling input
• PC games may have uncertain performance dependencies
  – Dynamic Resolution can help with runtime performance adjustments
• Can use anti-aliasing techniques and super-sampling to improve quality when performance is sufficient

Motivation IV: Power

• Mobile systems are becoming predominant.
• User could switch from plugged in to battery, or run low on power
• With Dynamic Resolution, can lower resolution on the fly and limit FPS to reduce power consumption.
  – 0.5x resolution on the demo cuts Sandy Bridge (SNB) package (CPU + GPU + System Agent) overall power to ~0.7x normal with vsync enabled.
• Performance settings may throttle frequency, so need to lower demands to match new levels.

Sub-Sampling - Basic principle

• Render scene to smaller resolution render targets
• Upscale scene to back buffer
• Render GUI at full resolution
• Basic upscale adds 1 texture sample + 1 FB write per back buffer pixel, approx. 1ms at 1280x720 on SNB, but cost can be amortized with other operations.
• Several games use this technique on consoles: Halo 3 for example.
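As a rough illustration of the sub-sampling principle above, the sketch below works out how large the 3D scene area is for a given linear resolution scale, and what fraction of the back buffer's pixels the scene pass then has to shade. The `Extent` type and both function names are hypothetical and are not taken from the talk's sample code; rounding to the nearest pixel is also an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper type; not part of the talk's sample code.
struct Extent {
    int width;
    int height;
};

// Size of the region the 3D scene is rendered into for a linear resolution
// scale (0.5f means half the width and half the height of the back buffer).
// Rounding to the nearest pixel is an illustrative choice.
inline Extent scaledExtent(Extent backBuffer, float scale) {
    assert(scale > 0.0f);
    Extent scene;
    scene.width  = std::max(1, static_cast<int>(std::lround(backBuffer.width  * scale)));
    scene.height = std::max(1, static_cast<int>(std::lround(backBuffer.height * scale)));
    return scene;
}

// Fraction of back-buffer pixels that the scene pass shades. Both axes scale,
// so this is roughly the square of the linear scale.
inline double pixelFraction(Extent backBuffer, Extent scene) {
    return (static_cast<double>(scene.width) * scene.height) /
           (static_cast<double>(backBuffer.width) * backBuffer.height);
}
```

For a 1280x720 back buffer, a scale of 0.5 gives a 640x360 scene area, a quarter of the pixels, while a scale of 0.71 gives roughly half of them. This is why a modest drop in linear resolution can make a large difference to a pixel-bound frame.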
[Diagram: Conventional: Render 3D → Render GUI → 3D + GUI. Sub-sampling: Render 3D → Upscale 3D → Render GUI → 3D + GUI.]

Use the Viewports to vary resolution

• For example:
  – Render Target of 1920x1080
  – Viewport origin (0,0), size (1280,720)
• No need to change anything else when rendering to this
  – Except for reads from a dynamic RT bound as an input, where we need to scale the texture coordinates

[Diagram: viewport containing the 3D scene inside a larger render target]

Dynamic Resolution

• Dynamically vary resolution by rendering into a larger render target and constraining rendered area using viewport.

[Diagram: Sub-sampling and super-sampling: Render 3D → Scale 3D → Render GUI → 3D + GUI]

Sample Renderer

• Simple forward renderer with motion blur post process
• More driver and vertex dependent than many games
  – complex geometry scene with no LOD
  – no zoning / scene database system
  – no optimization of draw call order
• So your performance results will probably be better!

Basic Upscale Filtering Comparisons

• Point filtering
• Bilinear filtering
• Bicubic filtering
  – Christian Sigg, Martin Hadwiger, "Fast Third Order Filtering", GPU Gems 2. Addison-Wesley, 2005.
• Point + 'film grain' style noise
• Point + noise offset texture coordinates

[Screenshot comparisons with frame times]

• Dynamic Rendering Off: 18.0ms
• Dynamic Rendering On, resolution 100% of back buffer (no scaling), Point Filtering: 19.2ms
• GUI resolution same as before, 3D resolution 71%: 12.6ms
• PS Clear: 12.5ms
• Bilinear Filtering: 12.5ms
• Bicubic Filtering: 15.6ms
  – Bicubic only noticeably different to bilinear for large up-scaling (scale << 50%)
• Noise Filtering: 12.7ms
• Noise Offset Filtering: 12.8ms

1:1 copy without scaling or filtering. Black region not copied (PS clear would not clear this from original RT).
• No Filter / Scaling (debug): 12.8ms
• OFF: 16.2ms, ON: 10.8ms

Temporal Anti-Aliasing

• Jitter (offset) every other frame by 0.5 pixels in X and Y
  – Use translation of projection matrix
• Scale filter combines two frames
  – Use offset texture coordinates for jittered frame sampling
  – Get 2x the number of pixels
• Resulting pattern is often termed 'Quincunx'

[Diagram: time axis showing Frame, 3D Render and Final Image]

Temporal Anti-Aliasing with Dynamic Resolution

• Gives increased final resolution when using smaller dynamic resolution buffer
  – 2x pixels per unit area, so observed resolution increased
  – Not just anti-aliasing, improves observed detail
• Results in low cost AA when dynamic resolution equal or larger than screen resolution
  – Final pixel seen by player is a sum of >1.0 pixels from scene
  – Not just anti-aliasing, increased texture and shader detail

[Screenshots: 71% Resolution Point Filter; 71% Resolution Temporal AA; 100% Resolution Point Filter; 100% Resolution Temporal AA]

• Dynamic Rendering Off: 18.0ms
• Dynamic Rendering On with Temporal AA: 13.1ms
  – Use texture LOD offset of -0.5 during scene render

Intelligent Temporal Anti-Aliasing

• Scale previous color with velocity to eliminate motion artefacts (ghosting).

\[ S = 1 - \frac{1}{K} \left( \sqrt{V_n \cdot V_n} - \sqrt{V_{n-1} \cdot V_{n-1}} \right) \]

\[ C = \frac{C_n + S \, C_{n-1}}{1 + S} \]

Where C is the final color output, C_n the current and C_{n-1} the previous color render target. V_n is the current and V_{n-1} the previous velocity buffer. The constant K is typically sized to be ~1/width.

[Screenshot annotations: some ghosting in high contrast areas, can be tuned via K; improved edges; more texture & shading detail]

Super-Sampling

• Can also use super-sampling.
• Some GPUs may not have sufficient memory for large Render Targets
  – Sandy Bridge has lots of memory as it uses system memory
• Can readily adjust resolution to get best image at desired frame rate, e.g. 30 FPS

[Screenshot comparisons with frame times]

• Super Sampling disabled: 21.2ms
• Super Sampling enabled, resolution 100% of back buffer (not increased yet): 21.7ms
  – Enabling SS has a low impact on performance when using PS clear
• 132% resolution, 30 FPS target: even better edges

What filter to pick?

• D3D11_FILTER_MIN_LINEAR_MAG_MIP_POINT
  – Sub-Sampling upscaled via point sampling
  – Super-Sampling needs linear filter to average pixels
• Temporal Anti-Aliasing is a low cost, good quality option
  – Use D3D11_SAMPLER_DESC MipLODBias of -0.5f during 3D Scene Pass
• Add noise and/or noise offset sampling for sub-sampling if this suits art style
• Could also additionally use MSAA (more difficult for deferred renderers)

Further filter tricks

• Improved filtering:
  – Morphological Anti-Aliasing (MLAA)
  – Temporal Anti-Aliasing with distance to pixel center weighting and motion compensation
• Weight filtering method on location on screen:
  – Better filter used where mouse / target reticule is
  – Better filtering on region where main character is
  – Use depth of field to assist filter choice

Complex Resolution Schemes

• Can render different passes at different resolutions
  – Geometry pass at high resolution, lighting & post process at another (sometimes called Subpixel Reconstruction Anti-Aliasing)
  – Render particles to an even lower resolution buffer (already used in several games)
• Can also use hybrid resolutions within a pass
  – Hyunwoo Ki, "Multi-Resolution Deferred Shading", Game Programming Gems 8. Boston: Charles River Media. 2010.

Performance Results

• Can achieve a wide range of performance / quality using this technique.
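The slides above pick a resolution to hold a frame rate target (for example 132% resolution at a 30 FPS target). One way this could be automated is sketched below, assuming the frame is pixel bound so that GPU time grows with the square of the linear resolution scale. The `ResolutionController` type, its fields and the damping term are hypothetical illustrations, not the control scheme used in the talk's sample.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical controller; names and example values are illustrative only.
struct ResolutionController {
    float targetMs;  // frame time budget, e.g. 33.3f for 30 FPS
    float minScale;  // lowest linear scale allowed, e.g. 0.5f
    float maxScale;  // highest linear scale; above 1.0f means super-sampling
    float damping;   // fraction of the correction applied per update, in (0, 1]

    // Returns the linear resolution scale to use for the next frame.
    float update(float currentScale, float gpuFrameMs) const {
        assert(currentScale > 0.0f && gpuFrameMs > 0.0f);
        // Assumption: the frame is pixel bound, so time is proportional to
        // scale squared and the ideal scale changes by sqrt(target / measured).
        const float ideal = currentScale * std::sqrt(targetMs / gpuFrameMs);
        // Move only part of the way towards the ideal to limit oscillation.
        const float next = currentScale + damping * (ideal - currentScale);
        return std::min(maxScale, std::max(minScale, next));
    }
};
```

With a 33.3 ms budget and no damping, a frame that takes twice the budget at 100% resolution yields a new scale of about 0.71, that is, roughly half the pixels. A real controller would likely also need hysteresis and a filtered GPU timing, since vertex-bound or CPU-bound frames break the pixel-bound assumption.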
[Chart: Dynamic Resolution Performance @1280x720. Y axis: FPS (0 to 300). X axis: % of resolution of back buffer (1280x720), from 40 to 200; pixel area is the square of this value. Labels: Conventional @ 100%, Point, Point + PS Clear, TAA + PS Clear, Point + PS Clear + SS Enabled, TAA + PS Clear + SS Enabled, Theoretical Maximum.]

Dynamic Resolution RT as Texture

• Care needs to be taken using a dynamic RT as texture
  – Need to adjust RT texture coordinates based on current viewport
  – Need to guard against dependent reads to area outside rendered RT
• Clearing Render Targets with pixel shader is faster than full RT clear when using <~0.8x resolution scaling at 1280x720 (but keep normal Depth Stencil clear)
• May not need clear in many cases

[Screenshot: motion blur (MB) right / bottom error due to lack of clamp, with the clear colour leak marked]

Clamp all dependent reads:

    // clamp velocity to texture size
    float2 clampedPos = min( input.Tex0.xy - velocity, g_PSSubSampleRTCurrRatio.xy );
    // recompute velocity so that Tex0 - velocity lands on the clamped position
    velocity = -( clampedPos - input.Tex0.xy );

Potential issues / improvements

• Shadow map rendering resolution may also need to be scaled
• Object LOD, culling and shading LOD can be linked to resolution, taking care to limit popping
• If performance is too low, may still require resolution switch

Resolution Control Approaches

• Frame time based
  – Adjust resolution to hit performance target
  – Use combination of overall frame time and GPU timings
    • Ignore overall frame time if it's high, could be full-screen to windowed switch etc.
• Frame complexity based
  – Use metric for anticipated FPS
• Frame time + Camera motion
  – 30 FPS when camera moving slowly, cut scenes etc.
    • Rapidly moving objects may need geometry based motion blur techniques; many games are already happy with this FPS
  – 60 FPS during rapid camera movement

The Future / Speculation

• Render some content at higher resolution within a given pass
  – Not easy for general 3D scene, problems:
    • Transparencies
    • Post Processing

Conclusion

• Dynamic resolution gives you the tools to improve overall quality with minimal user intervention

Call to Action

• Come to the Intel booth and check it out!
• Add dynamic resolution to your game
• Investigate using Temporal Anti-Aliasing
• Get in touch, ask me questions!
  – doug.binks@intel.com

Questions?

www.intel.com/software/gdc

• Efficient scaling in a Task-Based Game Engine: Programming, Room 302 South, Wed 11:30-11:30
• "MAXIS-mize" graphics performance with Intel® GPA 4.0! A DarkSpore™ case study: Programming, Room 309 South, Wed 1:30-2:30
• Monetizing Games on Devices: Intel's AppUp: Business, Room 302 South, Wed 4:30-5:30
• This is your brain on game development: Business, Room 132 North, Thu 9:00-10:00
• Adaptive Order Independent Transparency: Programming, Room 3020 West, Thu 1:30-2:30
• Dynamic Resolution Rendering: Programming, Room 110 North, Fri 9:30-10:30
• Increase Your FPS with CPU Onload: Programming, Room 110 North, Fri 11:00-12:00
• Hotspots, Flops and uOps: Programming, Room 123 North, Fri 2:00-3:00
• PC Gaming's Global Value Propositions: Business, Room 3002 West, Fri 2:00-2:25
• Delivering Demand-Based Worlds with Intel® SSDs: Programming, Room 110 North, Fri 3:30-4:30

Legal Disclaimers

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY RELATING TO SALE AND/OR USE OF INTEL PRODUCTS, INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT, OR OTHER INTELLECTUAL PROPERTY RIGHT.
Intel products are not intended for use in medical, life saving, life sustaining, critical control or safety systems, or in nuclear facility applications. Intel Corporation may have patents or pending patent applications, trademarks, copyrights, or other intellectual property rights that relate to the presented subject matter. The furnishing of documents and other materials and information does not provide any license, express or implied, by estoppel or otherwise, to any such patents, trademarks, copyrights, or other intellectual property rights. Intel may make changes to specifications, product descriptions, and plans at any time, without notice. The Intel processor and/or chipset products referenced in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. All dates provided are subject to change without notice. All dates specified are target dates, are provided for planning purposes only and are subject to change. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. * Other names and brands may be claimed as the property of others. Copyright © 2010, Intel Corporation. All rights reserved. Optimization Notice Optimization Notice Intel® compilers, associated libraries and associated development tools may include or utilize options that optimize for instruction sets that are available in both Intel® and non-Intel microprocessors (for example SIMD instruction sets), but do not optimize equally for non-Intel microprocessors. In addition, certain compiler options for Intel compilers, including some that are not specific to Intel micro-architecture, are reserved for Intel microprocessors. 
For a detailed description of Intel compiler options, including the instruction sets and specific microprocessors they implicate, please refer to the “Intel® Compiler User and Reference Guides” under “Compiler Options." Many library routines that are part of Intel® compiler products are more highly optimized for Intel microprocessors than for other microprocessors. While the compilers and libraries in Intel ® compiler products offer optimizations for both Intel and Intel-compatible microprocessors, depending on the options you select, your code and other factors, you likely will get extra performance on Intel microprocessors. Intel® compilers, associated libraries and associated development tools may or may not optimize to the same degree for nonIntel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include Intel ® Streaming SIMD Extensions 2 (Intel® SSE2), Intel® Streaming SIMD Extensions 3 (Intel® SSE3), and Supplemental Streaming SIMD Extensions 3 (Intel® SSSE3) instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. While Intel believes our compilers and libraries are excellent choices to assist in obtaining the best performance on Intel ® and non-Intel microprocessors, Intel recommends that you evaluate other compilers and libraries to determine which best meet your requirements. We hope to win your business by striving to offer the best performance of any compiler or library; please let us know if you find we do not. 
Notice revision #20101101

Appendix I

• Visual Computing Home Page: http://software.intel.com/en-us/visual-computing/
• Threading Building Blocks: www.threadingbuildingblocks.org/
• Graphics Performance Analyzers: www.intel.com/software/GPA/

Graphics Samples Home Page

Keep up to date with samples releasing throughout the year

• Graphics Samples Page: http://software.intel.com/en-us/articles/code/
• Sandy Bridge Samples Page: http://software.intel.com/en-us/articles/sandy-bridge/