Getting Your Groove
Getting Your Groove: Step-by-Step Performance Profiling
Juan Guardado
GDC Europe Tutorial Day

Agenda
  What to profile
  How to profile your game
    System setup
    System tweaks
    VTune
    NVPerfHUD
  Increasing performance
    Guidelines
    FX Studio

What is your target platform?
  GeForce FX 5900 Ultra: Ferrari F1
  GeForce FX 5200: Ferrari Modena 360

End-user survey
  If a new game that you really wanted to play came out and it required that you upgrade all or part of your PC, how likely would you be to upgrade your:
  [Bar chart, 0% to 100%, Likely vs. Unlikely: RAM (25757), GPU (25901), CPU (25769)]

Identify your target platform
  5900, 5600, and 5200 products
  Matched CPU and memory
  Identify settings you care about
    Common resolution?
    Smooth multisampling?
    Nice filtering?
    Fancy shadows?
    etc.

Preliminary checks
  Direct3D errors and performance warnings are a bad sign
    Must use debug runtime
    Slide up validation a few notches
  “Cavere profilare”
    Switch Direct3D to retail runtime
    Vertical sync sucks: turn it off via the graphics driver, PowerStrip, or D3DPRESENT_INTERVAL_IMMEDIATE
    Choose scene with nominal frame rate
    Choose a control-group application

System tweaks: FSB / AGP clocks
  Warning: CPU clock depends on FSB clock
    Readjust clock multiplier
  Lets you know when bus traffic is a bottleneck
  BasicHLSL (AGP VBs)
    183 fps @ AGP8X
    133 fps @ AGP~3X (800MB/s)
  BasicHLSL (SysMem VBs)
    40 fps @ 166MHz memory
    53 fps @ 333MHz memory

System tweaks: CPU clocks
  Works as multiplier of FSB clock
    166MHz * 11 = 1.826GHz
    166MHz * 5.5 = 918MHz
  Freedom Fighters
    114 / 96 / 53 fps @ 1.826GHz
    80 / 76 / 47 fps @ 918MHz
  Control app: BasicHLSL (SW VP)
    45 fps @ 1.826GHz
    23 fps @ 918MHz

System tweaks: GPU clocks
  PowerStrip (www.entechtaiwan.com/ps.htm)
  Warning: safer to underclock than overclock
  Freedom Fighters
    114 / 96 / 53 fps @ 450MHz
    70 / 56 / 30 fps @ 225MHz
  Control app: BasicHLSL
    180 fps @ 450MHz
    93 fps @ 225MHz

VTune 6.0
  VTune likes symbol files next to binaries
    Copy ../DX90SDK/Extras/Symbols map files to ../Windows/System32
  Application
    Good times if working in parallel with GPU
  D3D9.DLL
    Bad times, spending too much time wondering what to do
  NV4_DISP.DLL / NV4_MINI.SYS
    Depends on performance characteristics

VTune results (application limited)

VTune results (driver limited)

NVPerfHUD
  Overlay that shows various vital statistics as the application runs
  Quick shader bottleneck check
  Quick texture bottleneck check
  Especially useful to corroborate your bottleneck theory

NVPerfHUD graph description
  Top graph shows:
    Number of draw calls – Draw*Primitive*()
    Memory allocated – AGP and video
  Bottom graph shows:
    GPU idle – Graphics HW not processing anything
    Driver time – Driver doing work (states and resource management, shader compilation)
    Driver idle – Driver waiting for GPU to finish
    Frame time – Milliseconds for frame time

NVPerfHUD demo
  MultiAnimation: draw call graph
  Freedom Fighters: 1x1 textures
    Not texture bound
  BasicHLSL: verify GPU bound

Texturing performance
  1x1 toggle easily identifies overall bottleneck
  Manually toggle individual stages for better analysis
  Pair equivalently filtered texture lookups
    Bilinear + Bilinear
    Trilinear + Trilinear
    Aniso + Aniso

FX Studio
  Architecture and scheduling

GeForce FX architecture
  [Pipeline diagram, x4 pipelines: Core (Tex + Tex, or ALU), then two ALUs, each with RGB and A paths]
  ALU operations: ADD, MUL, MAD, DP3, DP4, DPH, (MOV)
  8 tex/clk and 8 math/clk, or 12 math/clk

Conclusion
  Don’t be fooled by debug settings
  Use the tools available
    VTune
    NVPerfHUD
    FX Studio
    (Graphics Performance Analyzer, D3DSpy)
  Don’t be shy, talk to us

Questions, comments, feedback?
  Juan Guardado, jguardado@nvidia.com
  http://developer.nvidia.com
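
Worked example: reading the clock-scaling numbers

The System tweaks slides halve one clock at a time and compare frame rates. The Python sketch below turns those pairs into a single "sensitivity" figure. The metric itself is an assumption made here for illustration and is not part of the talk: it is the fractional frame-rate drop divided by the fractional clock drop, so a value near 1.0 suggests the frame rate tracks that clock and a value near 0.0 suggests it barely notices. The function and variable names are hypothetical; the frame rates and clocks are the ones reported on the slides.

```python
def clock_sensitivity(fps_full, fps_reduced, clock_full, clock_reduced):
    """Fractional fps drop divided by fractional clock drop (illustrative metric)."""
    fps_drop = 1.0 - fps_reduced / fps_full
    clock_drop = 1.0 - clock_reduced / clock_full
    return fps_drop / clock_drop


def sensitivities(fps_pairs, clocks):
    """Apply clock_sensitivity to each (fps at full clock, fps at reduced clock) pair."""
    clock_full, clock_reduced = clocks
    return [clock_sensitivity(a, b, clock_full, clock_reduced) for a, b in fps_pairs]


# Clocks in MHz as stated on the slides: (full, reduced).
CPU_CLOCKS = (1826, 918)
GPU_CLOCKS = (450, 225)

# Freedom Fighters: the three reported figures, paired full clock vs. reduced clock.
FF_CPU = sensitivities([(114, 80), (96, 76), (53, 47)], CPU_CLOCKS)
FF_GPU = sensitivities([(114, 70), (96, 56), (53, 30)], GPU_CLOCKS)

# Control apps: BasicHLSL (SW VP) for the CPU test, BasicHLSL for the GPU test.
CONTROL_CPU = sensitivities([(45, 23)], CPU_CLOCKS)
CONTROL_GPU = sensitivities([(180, 93)], GPU_CLOCKS)

if __name__ == "__main__":
    for label, values in [
        ("Freedom Fighters vs CPU clock", FF_CPU),
        ("Freedom Fighters vs GPU clock", FF_GPU),
        ("Control app vs CPU clock", CONTROL_CPU),
        ("Control app vs GPU clock", CONTROL_GPU),
    ]:
        print(label, [round(v, 2) for v in values])
```

With the slide data, both control apps come out close to 1.0, which is what a control-group application is for: it confirms the tweak really changed the clock under test. Each Freedom Fighters figure reacts more strongly to the GPU clock than to the CPU clock, though neither sensitivity is zero, so this reading is a starting point for VTune and NVPerfHUD rather than a verdict.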