Dynamic Resolution Rendering

Doug Binks, Intel
doug.binks@intel.com
Leigh Davies, Josh Doss, Matt Fife, Philipp
Gerasimov, Axel Mamode, Steve Mccalla, Phil
Taylor, Jeff Williams, and many more in VCSE
www.intel.com/software/graphics
Resolution selection is one of the
defining aspects of PC Gaming.
VIDEO SETTINGS
Select Resolution: <1280x720>
Introduction
• Dynamically adjust the resolution of 3D scene
to meet performance and quality goals
• Render Graphical User Interface at screen
resolution
Performance results and demos will be shown for the 2nd generation Intel® Core™ processor family (codenamed Sandy Bridge) with Intel® GMA HD3000 graphics.
Motivation I: The User Interface is important
Trion Worlds RIFT: Planes of Telara, MMO with Scaleform GFx UI
• In-game GUIs are very important to MMOs, RPGs and RTSs.
• Menus can get quite populated (e.g. multiplayer game browsers)
• Good to have a high resolution to include lots of content
• Native resolution of screen preferred for clear text
Many thanks to Landyn Pethrus and <Wicked Mojo> of Black Dragonflight for this screenshot from Activision Blizzard’s World of Warcraft showing how important the GUI can be to game play.
Motivation II: Performance
• Many games are GPU bound
• Titles are becoming increasingly pixel bound:
– Post process effects
– Complex shading
– Deferred lighting
• This leads to resolution dependency
• Large variance in PC capabilities & monitor resolutions
• Few users adjust settings
Motivation III: Quality
• Quality includes frame rate and responsiveness
• Users' perception and weighting of quality metrics varies with usage:
– Moving / zooming on Google Maps / iPad favours responsiveness over image quality whilst handling input
• PC games may have uncertain performance dependencies; Dynamic Resolution can help with runtime performance adjustments
• Can use anti-aliasing techniques and super-sampling to improve quality when performance is sufficient
Motivation IV: Power
• Mobile systems are becoming predominant.
• User could switch from plugged in to battery, or run low on power
• With Dynamic Resolution, can lower resolution on the fly and limit FPS to reduce power consumption.
– 0.5x resolution on demo cuts SNB package (CPU+GPU+System Agent) overall power to ~0.7x normal with vsync enabled.
• Performance settings may throttle frequency, so need to lower demands to match new levels.
Sub-Sampling - Basic principle
• Render scene to smaller resolution render targets
• Upscale scene to back buffer
• Render GUI at full resolution
• Basic upscale adds 1 texture sample + 1 FB write per back buffer pixel, approx 1ms at 1280x720 on SNB, but cost can be amortized with other operations.
• Several games use this technique on consoles: Halo 3 for example.
[Diagram: Conventional pipeline: Render 3D → Render GUI → 3D + GUI. Sub-sampling pipeline: Render 3D → Upscale 3D → Render GUI → 3D + GUI.]
Use the Viewports to vary resolution
• For example
– Render Target of 1920x1080
– Viewport origin (0,0) size (1280,720)
• No need to change anything else when rendering to this
– Except for reads from a dynamic RT bound as an input, where we need to scale the texture coordinates
[Diagram: 3D viewport region inside a larger render target]
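As a rough illustration of the viewport approach on this slide, the C++ sketch below computes a viewport for a given resolution scale, plus the factor by which texture coordinates would need scaling when the render target is later read as a texture. The names `Viewport`, `UvScale`, `dynamicViewport` and `uvScale` are hypothetical and not from the talk; a real renderer would hand the result to its graphics API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper types for illustration only.
struct Viewport { int x, y, width, height; };
struct UvScale { float u, v; };

// Viewport covering `scale` times the back buffer in each dimension,
// clamped so it never exceeds the (larger) render target.
Viewport dynamicViewport(int backBufferW, int backBufferH,
                         int rtW, int rtH, float scale) {
    assert(scale > 0.0f && rtW > 0 && rtH > 0);
    int w = static_cast<int>(std::lround(backBufferW * scale));
    int h = static_cast<int>(std::lround(backBufferH * scale));
    w = std::min(std::max(w, 1), rtW);
    h = std::min(std::max(h, 1), rtH);
    return Viewport{0, 0, w, h};
}

// Factor to multiply [0,1] texture coordinates by so that reads from the
// render target only cover the region the viewport actually rendered.
UvScale uvScale(const Viewport& vp, int rtW, int rtH) {
    assert(rtW > 0 && rtH > 0);
    return UvScale{static_cast<float>(vp.width) / rtW,
                   static_cast<float>(vp.height) / rtH};
}
```

With a 1280x720 back buffer, a 1920x1080 render target and a scale of 1.0, this gives the slide's 1280x720 viewport and a texture coordinate factor of about 0.667 in each axis.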
Dynamic Resolution
• Dynamically vary resolution by rendering into a larger render target and constraining rendered area using viewport.
[Diagram: Sub-sampling and Super-sampling pipelines: Render 3D → Scale 3D → Render GUI → 3D + GUI]
Sample Renderer
• Simple forward renderer with motion blur post process
• More driver and vertex dependent than many games
– complex geometry scene with no LOD
– no zoning / scene database system
– no optimization of draw call order
• So your performance results will probably be better!
Basic Upscale Filtering Comparisons
• Point filtering
• Bilinear filtering
• Bicubic filtering
– Christian Sigg, Martin Hadwiger, “Fast Third Order Filtering”,
GPU Gems 2. Addison-Wesley, 2005.
• Point + ‘film grain’ style noise
• Point + noise offset texture coordinates
[Screenshot comparison, frame times]
• Dynamic Rendering Off: 18.0ms
• Dynamic Rendering On, resolution 100% of back buffer (no scaling), Point Filtering: 19.2ms
• 3D Resolution 71%, GUI resolution same as before: 12.6ms
• PS Clear: 12.5ms
• Bilinear Filtering: 12.5ms
• Bicubic Filtering: 15.6ms (bicubic only noticeably different to bilinear for large up-scaling, scale << 50%)
• Noise Filtering: 12.7ms
• Noise Offset Filtering: 12.8ms
• No Filter / Scaling (debug): 12.8ms. 1:1 copy without scaling or filtering. Black region not copied (PS clear would not clear this from original RT).
• OFF: 16.2ms, ON: 10.8ms
Temporal Anti-Aliasing
• Jitter (offset) every other frame by 0.5 pixels in X and Y
– Use translation of projection matrix
• Scale filter combines two frames
– Use offset texture coordinates for jittered frame sampling
– Get 2x the number of pixels
• Resulting pattern is often termed ‘Quincunx’
[Diagram: over time, each frame's 3D render is combined into the final image]
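The half-pixel jitter can be sketched as below. This is a minimal illustration under stated assumptions: normalized device coordinates span 2 units across the viewport, texture coordinates span 1 unit across the whole render target, and the `Jitter` struct and `temporalJitter` function are hypothetical names. How the offset is folded into the projection matrix, and its sign, depends on the engine's matrix conventions and is left out.

```cpp
#include <cassert>

// Hypothetical container: the offset to translate the projection by (in NDC)
// and the matching texture coordinate offset for sampling the jittered frame.
struct Jitter { float ndcX, ndcY, uvX, uvY; };

// Odd frames are offset by 0.5 pixels in X and Y; even frames are not.
Jitter temporalJitter(unsigned frameIndex,
                      int viewportW, int viewportH,
                      int rtW, int rtH) {
    assert(viewportW > 0 && viewportH > 0 && rtW > 0 && rtH > 0);
    const float pixels = (frameIndex & 1u) ? 0.5f : 0.0f;
    Jitter j;
    j.ndcX = pixels * 2.0f / viewportW;  // NDC covers 2 units per viewport
    j.ndcY = pixels * 2.0f / viewportH;
    j.uvX = pixels / rtW;                // one texel is 1/rtW in UV space
    j.uvY = pixels / rtH;
    return j;
}
```

For a 1280x720 viewport the odd-frame NDC offset is 1/1280 horizontally, so the translation is small enough that the two frames interleave rather than visibly shift.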
Temporal Anti-Aliasing with Dynamic Resolution
• Gives increased final resolution when using smaller dynamic resolution buffer
– 2x pixels per unit area, so observed resolution increased
– Not just anti-aliasing, improves observed detail
• Results in low cost AA when dynamic resolution equal or larger than screen resolution
– Final pixel seen by player is a sum of >1.0 pixels from scene
– Not just anti-aliasing, increased texture and shader detail
[Screenshot comparison: 71% Resolution Point Filter vs 71% Resolution Temporal AA; 100% Resolution Point Filter vs 100% Resolution Temporal AA]
• Dynamic Rendering Off: 18.0ms
• Dynamic Rendering On with Temporal AA: 13.1ms (use texture LOD offset of -0.5 during scene render)
Intelligent Temporal Anti-Aliasing
• Scale previous color with Velocity to eliminate motion
artefacts (ghosting).
$$S = \frac{1}{1 + K \, (V_n \cdot V_n + V_{n-1} \cdot V_{n-1})}$$
$$C = \frac{C_n + S \, C_{n-1}}{1 + S}$$
Where $C$ is the final color output.
$C_n$ is the current and $C_{n-1}$ the previous color render target.
$V_n$ is the current and $V_{n-1}$ the previous velocity buffer.
The constant $K$ is typically sized to be ~1/width.
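The final blend step, C = (C_n + S · C_{n-1}) / (1 + S), can be sketched on the CPU as follows. This is only an illustration: the weight S is taken as an input rather than derived from the velocity buffers, and the `Color` struct and `blendTemporal` function are hypothetical names, not part of the sample.

```cpp
#include <cassert>

// Hypothetical RGB color type for illustration.
struct Color { float r, g, b; };

// Blend the current and previous frame colors with weight s on the previous
// frame: s = 1 averages the two frames, s = 0 keeps only the current frame
// (which is what removes ghosting for fast-moving pixels).
Color blendTemporal(const Color& current, const Color& previous, float s) {
    assert(s >= 0.0f);
    const float inv = 1.0f / (1.0f + s);
    return Color{(current.r + s * previous.r) * inv,
                 (current.g + s * previous.g) * inv,
                 (current.b + s * previous.b) * inv};
}
```

Dividing by (1 + S) keeps the result normalized, so overall brightness does not change as the previous frame's contribution is scaled down.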
[Screenshot annotations: some ghosting in high contrast areas, can be tuned via K; improved edges; more texture & shading detail]
Super-Sampling
• Can also use super-sampling.
• Some GPUs may not have sufficient memory for
large Render Targets
– Sandy Bridge has lots of memory as it uses system
memory
• Can readily adjust resolution to get best image
at desired frame rate, e.g. 30FPS
[Screenshot comparison, frame times]
• Super Sampling disabled: 21.2ms
• Super Sampling enabled, resolution 100% of back buffer (not increased yet): 21.7ms. Enabling SS has a low impact on performance when using PS clear.
• 132% Resolution, 30FPS target: even better edges
What filter to pick?
• D3D11_FILTER_MIN_LINEAR_MAG_MIP_POINT
– Sub-Sampling upscaled via point sampling
– Super-Sampling needs linear filter to average pixels
• Temporal Anti-Aliasing is a low cost, good quality option
– Use D3D11_SAMPLER_DESC MipLODBias of -0.5f during 3D Scene Pass
• Add noise and/or noise offset sampling for sub-sampling if this suits the art style
• Could additionally use MSAA (more difficult for deferred renderers)
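The slides give the MipLODBias of -0.5 without a derivation. One plausible reading, which is an assumption here and not stated in the talk, is that temporal anti-aliasing doubles the number of scene pixels per unit area, so the linear sample density rises by a factor of √2 and the matching mip offset is -log2(√2) = -0.5. The hypothetical helper below generalizes that arithmetic.

```cpp
#include <cassert>
#include <cmath>

// Assumed relationship (not from the talk): if the effective pixel count per
// unit area is multiplied by `pixelMultiplier`, linear resolution scales by
// sqrt(pixelMultiplier), which corresponds to this mip LOD bias.
float lodBiasForPixelMultiplier(float pixelMultiplier) {
    assert(pixelMultiplier > 0.0f);
    return -0.5f * std::log2(pixelMultiplier);
}
```

Under this assumption a multiplier of 2 gives the -0.5 bias quoted on the slide, and a multiplier of 1 (no temporal AA) gives no bias.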
Further filter tricks
• Improved filtering:
– Morphological Anti-Aliasing (MLAA)
– Temporal Anti-Aliasing with distance to pixel center weighting
and motion compensation
• Weight filtering method on location on screen:
– Better filter used where mouse / target reticule is
– Better filtering on region where main character is
– Use depth of field to assist filter choice
Complex Resolution Schemes
• Can render different passes at different resolutions
– Geometry pass at high resolution, lighting & post process at another (sometimes called Subpixel Reconstruction Anti-Aliasing)
– Render particles to an even lower resolution buffer (already
used in several games)
• Can also use hybrid resolutions within a pass
– Hyunwoo Ki, “Multi-Resolution Deferred Shading”, Game
Programming Gems 8. Boston: Charles River Media. 2010.
Performance Results
• Can achieve a wide range of performance / quality using this technique.
Dynamic Resolution Performance @1280x720
[Chart: FPS (0 to 300) against % of resolution of back buffer (1280x720), from 40 to 200; pixel area is the square of this value. Series: Conventional @ 100%; Point; Point + PS Clear; TAA + PS Clear; Point + PS Clear + SS Enabled; TAA + PS Clear + SS Enabled; Theoretical Maximum]
Dynamic Resolution RT as Texture
• Care needs to be taken using a dynamic RT as texture
– Need to adjust RT texture coordinates based on current viewport
– Need to guard against dependent reads to area outside rendered RT
• Clearing Render Targets with a pixel shader is faster than a full RT clear when using <~0.8x resolution scaling at 1280x720 (but keep normal Depth Stencil clear)
• May not need clear in many cases
Clamp all dependent reads:
// Clamp the velocity so the offset read stays inside the rendered region of the RT
float2 clampedPos = min( input.Tex0.xy - velocity, g_PSSubSampleRTCurrRatio.xy );
velocity = -( clampedPos - input.Tex0.xy );
[Screenshots: motion blur (MB) right / bottom error due to lack of clamp, showing clear colour leak]
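To see what the shader clamp does, here is a CPU-side C++ mirror of the same two lines. It assumes `g_PSSubSampleRTCurrRatio` holds the ratio of the rendered viewport to the full render target in each axis, which the slide does not spell out; `Float2` and `clampVelocity` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for HLSL's float2.
struct Float2 { float x, y; };

// Shorten `velocity` so that the dependent read at (uv - velocity) never
// lands to the right of or below the rendered region of the render target.
Float2 clampVelocity(Float2 uv, Float2 velocity, Float2 rtRatio) {
    assert(rtRatio.x > 0.0f && rtRatio.y > 0.0f);
    const Float2 clampedPos = {std::min(uv.x - velocity.x, rtRatio.x),
                               std::min(uv.y - velocity.y, rtRatio.y)};
    return Float2{uv.x - clampedPos.x, uv.y - clampedPos.y};
}
```

Like the shader lines, this only clamps against the right and bottom edges, which is where the unrendered part of the render target lies when the viewport starts at the origin.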
Potential issues / improvements
• Shadow map rendering resolution may also
need to be scaled
• Object LOD, culling and shading LOD can be
linked to resolution, taking care to limit popping
• If performance is too low, may still require
resolution switch
Resolution Control Approaches
• Frame time based
– Adjust resolution to hit performance target
– Use combination of overall frame time and GPU timings
• Ignore overall frame time if it’s high, could be full-screen to windowed switch etc.
• Frame complexity based
– Use metric for anticipated FPS
• Frame time + Camera motion
– 30 FPS when camera moving slowly, cut scenes etc.
• Rapidly moving objects may need geometry based motion blur techniques;
many games already happy with this FPS
– 60 FPS during rapid camera movement
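A frame time based controller could look something like the sketch below. It is one possible approach and not the method from the talk: it assumes GPU time is roughly proportional to pixel area (the square of the resolution scale), moves only halfway towards the estimated ideal scale each frame to limit oscillation, and uses a hypothetical function name and parameters.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Returns the resolution scale to use next frame, given the measured GPU
// time for the current frame and the target frame time (both in ms).
float nextResolutionScale(float currentScale, float gpuFrameMs, float targetMs,
                          float minScale, float maxScale) {
    assert(gpuFrameMs > 0.0f && targetMs > 0.0f && minScale <= maxScale);
    // Assumption: cost ~ scale^2, so the scale that hits the target is
    // currentScale * sqrt(target / measured).
    const float ideal = currentScale * std::sqrt(targetMs / gpuFrameMs);
    // Damping: step halfway towards the ideal to avoid visible oscillation.
    const float next = currentScale + 0.5f * (ideal - currentScale);
    return std::min(std::max(next, minScale), maxScale);
}
```

For example, a frame that took 40ms against a 10ms target at scale 1.0 has an estimated ideal scale of 0.5, and the damped step returns 0.75. Using GPU timings rather than overall frame time follows the slide's advice to ignore overall frame time spikes.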
The Future / Speculation
• Render some content at higher resolution
within a given pass
– Not easy for general 3D scene, problems:
• Transparencies
• Post Processing
Conclusion
• Dynamic resolution gives you the tools to
improve overall quality with minimal user
intervention
Call to Action
• Come to the Intel booth and check it out!
• Add dynamic resolution to your game
• Investigate using Temporal Anti-Aliasing
• Get in touch, ask me questions!
– doug.binks@intel.com
Questions?
www.intel.com/software/gdc
Session | Track | Room | Time
Efficient scaling in a Task-Based Game Engine | Programming | Room 302 South | Wed 11:30-11:30
“MAXIS-mize” graphics performance with Intel® GPA 4.0! A DarkSpore™ case study. | Programming | Room 309 South | Wed 1:30-2:30
Monetizing Games on Devices: Intel’s AppUp | Business | Room 302 South | Wed 4:30-5:30
This is your brain on game development | Business | Room 132 North | Thu 9:00-10:00
Adaptive Order Independent Transparency | Programming | Room 3020 West | Thu 1:30-2:30
Dynamic Resolution Rendering | Programming | Room 110 North | Fri 9:30-10:30
Increase Your FPS with CPU Onload | Programming | Room 110 North | Fri 11:00-12:00
Hotspots, Flops and uOps | Programming | Room 123 North | Fri 2:00-3:00
PC Gaming’s Global Value Propositions | Business | Room 3002 West | Fri 2:00-2:25
Delivering Demand-Based Worlds with Intel® SSDs | Programming | Room 110 North | Fri 3:30-4:30
Legal Disclaimers
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH
PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY RELATING TO SALE AND/OR USE OF INTEL
PRODUCTS, INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT,
COPYRIGHT, OR OTHER INTELLECTUAL PROPERTY RIGHT.
Intel products are not intended for use in medical, life saving, life sustaining, critical control or safety systems, or in nuclear facility applications.
Intel Corporation may have patents or pending patent applications, trademarks, copyrights, or other intellectual property rights that relate to the presented subject matter.
The furnishing of documents and other materials and information does not provide any license, express or implied, by estoppel or otherwise, to any such patents,
trademarks, copyrights, or other intellectual property rights.
Intel may make changes to specifications, product descriptions, and plans at any time, without notice.
The Intel processor and/or chipset products referenced in this document may contain design defects or errors known as errata which may cause the product to deviate from
published specifications. Current characterized errata are available on request.
All dates provided are subject to change without notice. All dates specified are target dates, are provided for planning purposes only and are subject to change.
Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
* Other names and brands may be claimed as the property of others.
Copyright © 2010, Intel Corporation. All rights reserved.
Optimization Notice
Intel® compilers, associated libraries and associated development tools may include or utilize options that optimize for
instruction sets that are available in both Intel® and non-Intel microprocessors (for example SIMD instruction sets), but do not
optimize equally for non-Intel microprocessors. In addition, certain compiler options for Intel compilers, including some that
are not specific to Intel micro-architecture, are reserved for Intel microprocessors. For a detailed description of Intel compiler
options, including the instruction sets and specific microprocessors they implicate, please refer to the “Intel® Compiler User
and Reference Guides” under “Compiler Options." Many library routines that are part of Intel® compiler products are more
highly optimized for Intel microprocessors than for other microprocessors. While the compilers and libraries in Intel ® compiler
products offer optimizations for both Intel and Intel-compatible microprocessors, depending on the options you select, your
code and other factors, you likely will get extra performance on Intel microprocessors.
Intel® compilers, associated libraries and associated development tools may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include Intel®
Streaming SIMD Extensions 2 (Intel® SSE2), Intel® Streaming SIMD Extensions 3 (Intel® SSE3), and Supplemental Streaming
SIMD Extensions 3 (Intel® SSSE3) instruction sets and other optimizations. Intel does not guarantee the availability,
functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent
optimizations in this product are intended for use with Intel microprocessors.
While Intel believes our compilers and libraries are excellent choices to assist in obtaining the best performance on Intel ® and
non-Intel microprocessors, Intel recommends that you evaluate other compilers and libraries to determine which best meet
your requirements. We hope to win your business by striving to offer the best performance of any compiler or library; please
let us know if you find we do not.
Notice revision #20101101
Appendix I
Visual Computing Home Page
http://software.intel.com/en-us/visual-computing/
Threading Building Blocks:
www.threadingbuildingblocks.org/
Graphics Performance Analyzers:
www.intel.com/software/GPA/
Graphics Samples Home Page
Keep up to date with samples releasing throughout the year
Graphics Samples Page:
http://software.intel.com/en-us/articles/code/
Sandy Bridge Samples Page:
http://software.intel.com/en-us/articles/sandy-bridge/