Sparse Fluid Simulation in DirectX

Alex Dunn
Dev. Tech. – NVIDIA
adunn@nvidia.com
Eulerian Simulation
● Grid based.
● Great for simulating gaseous fluids: smoke, flame, clouds.
● It just works.
Basic Algorithm
Inject -> Advect -> Pressure -> Vorticity -> Evolve
Textures: 2x Velocity, 2x Pressure, 1x Vorticity
Why Are Small Volumes Bad?
● Fluid isn’t box shaped.
  ● Often fluid will clip bounds.
● Simulated separately.
  ● Lots of state changes.
  ● No interaction between volumes.
● Tricky to render.
  ● No sorting between volumes.
  ● Each volume rendered separately.
Problem!
● N-order problem
  ● 64^3 = ~0.25M cells
  ● 128^3 = ~2M cells
  ● 256^3 = ~16M cells
  ● …
● Applies to:
  ● Computational complexity
  ● Memory requirements
[Chart: memory (MB) of one Texture3D (4x16F) against grid dimensions (X = Y = Z) from 0 to 1024; the memory axis runs from 0 to 8192 MB.]
And that’s just 1 texture…
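The chart's numbers can be reproduced with a little arithmetic. This is a minimal sketch, assuming a dense cubic Texture3D with four 16-bit float channels (8 bytes per texel) and no mip or alignment overhead. The function names are illustrative only.

```cpp
#include <cstdint>

// Bytes needed for one dense N x N x N volume texture.
// bytesPerTexel = 8 corresponds to 4 channels x 16-bit float (4x16F).
constexpr std::uint64_t denseVolumeBytes(std::uint64_t n, std::uint64_t bytesPerTexel = 8)
{
    return n * n * n * bytesPerTexel;
}

// The same figure in MB (1 MB = 1024 * 1024 bytes here).
constexpr std::uint64_t denseVolumeMB(std::uint64_t n, std::uint64_t bytesPerTexel = 8)
{
    return denseVolumeBytes(n, bytesPerTexel) / (1024ull * 1024ull);
}
```

Under these assumptions a 256^3 texture needs 128 MB, 512^3 needs 1024 MB and 1024^3 needs 8192 MB, which matches the top of the chart's memory axis.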
What Do We Do?
● Typically, fluid doesn’t occupy every cell!
● Why waste cycles/memory on empty cells?
● Options?
  ● Trees – Tight fitting but poor lookup speed.
  ● Bricks – Good fit and fast lookup.
Bricks
● Split simulation space into groups of cells (each known as a brick).
● Simulate each brick independently.
Storage
● Storage of bricks can take one of two forms:
  ● Compressed; allocate bricks as needed.
  ● Uncompressed; allocate all bricks.
Compressed Storage
Kind of like vector<Brick>.
Pros:
• Good memory consumption. Only store bricks we care about.
Cons:
• Allocation strategies.
• Expensive neighbourhood lookup.
  • “Software translation”
1 Brick = 4^3 = 64
1 Brick = (1+4+1)^3 = 216
• New problem:
  • “6n^2 + 12n + 8” problem.
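The "6n^2 + 12n + 8" figure is the number of extra cells stored when each brick of n^3 cells carries a one-cell border of neighbour data on every side, that is (n+2)^3 - n^3. The sketch below checks that arithmetic, assuming the one-cell border implied by (1+4+1)^3. The function names are illustrative only.

```cpp
#include <cstdint>

constexpr std::uint64_t cube(std::uint64_t x) { return x * x * x; }

// Cells stored for a brick of n^3 cells plus a one-cell border on every side.
constexpr std::uint64_t paddedBrickCells(std::uint64_t n) { return cube(n + 2); }

// Border cells only: (n+2)^3 - n^3, which expands to 6n^2 + 12n + 8.
constexpr std::uint64_t borderCells(std::uint64_t n) { return paddedBrickCells(n) - cube(n); }

static_assert(paddedBrickCells(4) == 216, "(1+4+1)^3");
static_assert(borderCells(4) == 6 * 4 * 4 + 12 * 4 + 8, "6n^2 + 12n + 8 with n = 4");
```

For n = 4 this gives 152 border cells out of 216 stored, so roughly 70% of each padded brick is duplicated neighbour data.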
Uncompressed Storage
Allocate everything; forget about unoccupied cells.
Pros:
• Simulation is coherent in memory.
Cons:
• No reduction in memory used.
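The trade-off between the two storage forms shows up in how a cell is addressed. This is a CPU-side sketch, assuming 4^3 bricks stored without borders. `Brick`, `UncompressedGrid`, `CompressedGrid` and the indirection table are hypothetical names for illustration, not the talk's implementation.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kBrickSize = 4;              // cells per brick edge (assumed)
constexpr std::int32_t kUnallocated = -1;  // table entry for a missing brick

struct Brick {
    std::array<float, kBrickSize * kBrickSize * kBrickSize> cells{};
};

// Uncompressed: every cell exists, so a lookup is one flat index computation.
struct UncompressedGrid {
    int dim;                   // cells per axis
    std::vector<float> cells;  // dim^3 entries
    explicit UncompressedGrid(int d)
        : dim(d), cells(static_cast<std::size_t>(d) * d * d, 0.0f) {}
    std::size_t index(int x, int y, int z) const {
        return (static_cast<std::size_t>(z) * dim + y) * dim + x;
    }
    float read(int x, int y, int z) const { return cells[index(x, y, z)]; }
    void write(int x, int y, int z, float v) { cells[index(x, y, z)] = v; }
};

// Compressed: bricks are allocated on demand, so every lookup first goes
// through an indirection table ("software translation").
struct CompressedGrid {
    int bricksPerAxis;
    std::vector<std::int32_t> table;  // brick coordinate -> index into bricks
    std::vector<Brick> bricks;        // only the bricks we care about
    explicit CompressedGrid(int b)
        : bricksPerAxis(b), table(static_cast<std::size_t>(b) * b * b, kUnallocated) {}

    std::size_t tableIndex(int x, int y, int z) const {
        const int bx = x / kBrickSize, by = y / kBrickSize, bz = z / kBrickSize;
        return (static_cast<std::size_t>(bz) * bricksPerAxis + by) * bricksPerAxis + bx;
    }
    static std::size_t cellIndex(int x, int y, int z) {
        const int lx = x % kBrickSize, ly = y % kBrickSize, lz = z % kBrickSize;
        return (static_cast<std::size_t>(lz) * kBrickSize + ly) * kBrickSize + lx;
    }
    float read(int x, int y, int z) const {
        const std::int32_t slot = table[tableIndex(x, y, z)];
        return slot == kUnallocated ? 0.0f : bricks[slot].cells[cellIndex(x, y, z)];
    }
    void write(int x, int y, int z, float v) {
        std::int32_t& slot = table[tableIndex(x, y, z)];
        if (slot == kUnallocated) {  // allocate the brick on first touch
            slot = static_cast<std::int32_t>(bricks.size());
            bricks.emplace_back();
        }
        bricks[slot].cells[cellIndex(x, y, z)] = v;
    }
};
```

The compressed grid only pays for bricks that have been written, but every access costs an extra table read. The uncompressed grid has the cheapest lookup and the full memory bill.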
Brick Map
● We need to track which bricks are occupied!
● New Texture3D<uint>
  ● 1 voxel per brick
  ● 0 -> Unoccupied
  ● 1 -> Occupied
● Could also use packed binary grids [Holger15], but this requires atomics.
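This is a CPU-side sketch of the same idea, assuming a flat array stands in for the Texture3D<uint> and bricks are cubes of `brickSize` cells. `BrickMap` and its members are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct BrickMap {
    int bricksPerAxis;
    int brickSize;                        // cells per brick edge
    std::vector<std::uint32_t> occupied;  // one entry per brick: 0 -> unoccupied, 1 -> occupied

    BrickMap(int bricks, int size)
        : bricksPerAxis(bricks), brickSize(size),
          occupied(static_cast<std::size_t>(bricks) * bricks * bricks, 0u) {}

    std::size_t index(int bx, int by, int bz) const {
        return (static_cast<std::size_t>(bz) * bricksPerAxis + by) * bricksPerAxis + bx;
    }

    // Mark the brick containing cell (x, y, z) as occupied.
    void markCell(int x, int y, int z) {
        occupied[index(x / brickSize, y / brickSize, z / brickSize)] = 1u;
    }

    bool isOccupied(int bx, int by, int bz) const { return occupied[index(bx, by, bz)] != 0u; }

    std::size_t countOccupied() const {
        std::size_t n = 0;
        for (std::uint32_t v : occupied) n += (v != 0u) ? 1 : 0;
        return n;
    }
};
```

With 16^3 bricks, marking cell (17, 0, 40) flags brick (1, 0, 2) and leaves every other entry at 0.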
Tracking Bricks
● Initialise with emitter
  ● Perform on the CPU
● Expansion (unoccupied -> occupied)
  ● Read velocity
  ● If axial velocity is large enough to traverse the brick boundary
    ● Expand in that axis
● Reduction (occupied -> unoccupied)
  ● Handled automatically
  ● Calculate occupied bricks every frame
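One way to read the expansion rule is that a brick expands along an axis when fluid in a cell moves fast enough to cross the brick face on that axis within one time step. The sketch below assumes that reading. The function name, the use of the cell centre and the single time step `dt` are illustrative guesses, not details from the talk.

```cpp
// Decide whether fluid in a cell can cross its brick's boundary along one axis
// during one time step.
// Returns +1 to expand towards the positive neighbour, -1 towards the negative
// neighbour, and 0 for no expansion.
inline int expansionDirection(float axialVelocity,  // cells per second along this axis
                              int cellInBrick,      // 0 .. brickSize-1 along this axis
                              int brickSize,
                              float dt)             // seconds
{
    const float travel = axialVelocity * dt;                          // signed distance in cells
    const float centre = static_cast<float>(cellInBrick) + 0.5f;      // cell centre inside the brick
    const float toPositiveFace = static_cast<float>(brickSize) - centre;
    const float toNegativeFace = centre;

    if (travel > toPositiveFace) return +1;
    if (-travel > toNegativeFace) return -1;
    return 0;
}
```

Running this test once per axis gives up to three neighbouring bricks to mark as occupied for a given cell.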
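Sparse Algorithm
Clear Tiles -> Inject -> Advect -> Pressure -> Vorticity -> Evolve* -> Fill List
*Also handles expansion
Clear Tiles: Reset all tiles to 0 (unmapped) in tile map.
Fill List: Read value from tile map. Append tile coordinate to list if occupied.
    Texture3D<uint> g_OccupiedMapRO;         // tile map: 0 = unoccupied, non-zero = occupied
    AppendStructuredBuffer<uint3> g_ListRW;  // list of occupied tile coordinates

    // idx is the uint3 tile coordinate handled by this thread.
    if(g_OccupiedMapRO[idx] != 0)
    {
        g_ListRW.Append(idx);
    }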
Numbers
● Show performance of uncompressed grid
● Explain how memory consumption stays the same when using uncompressed storage.
● Can we do better?
Enter; DirectX 11.3
● Volume Tiled Resources (VTR)!
● Extends 2D functionality in DX11.2.
● 1 Tile = 64KB
Bits/Pixel    Tile Dimensions
8             64x32x32
16            32x32x32
32            32x32x16
64            32x16x16
128           16x16x16
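Each of the bits/pixel and tile dimension pairs listed above works out to the same 64KB. Here is a quick check, assuming tightly packed texels. The type and function names are illustrative only.

```cpp
#include <cstdint>

struct TileShape { std::uint32_t bitsPerPixel, width, height, depth; };

// Size of one tile in bytes.
constexpr std::uint64_t tileBytes(TileShape t)
{
    return static_cast<std::uint64_t>(t.width) * t.height * t.depth * t.bitsPerPixel / 8;
}

// The bits/pixel and tile dimension pairs from the slide.
constexpr TileShape kVolumeTileShapes[] = {
    {8, 64, 32, 32}, {16, 32, 32, 32}, {32, 32, 32, 16}, {64, 32, 16, 16}, {128, 16, 16, 16},
};

// Number of tiles needed to cover `texels` along one axis (rounded up).
constexpr std::uint32_t tilesAlongAxis(std::uint32_t texels, std::uint32_t tileExtent)
{
    return (texels + tileExtent - 1) / tileExtent;
}
```

For example, a 256^3 texture at 64 bits/pixel is covered by 8 x 16 x 16 tiles of 32x16x16 texels.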
Tiled Resources
• Only mapped memory is allocated in VRAM
• Fast HW translation (same as regular paged memory)
• All samplers supported
Gotcha: Tile mappings must be updated from CPU
Idea
1. Tile == Brick
2. Create fluid textures as tiled resources.
3. Create tile pool per texture.
4. Map tiles to memory using list of occupied tiles.
5. Handle expansion/reduction of tiles in simulation.
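Step 4 amounts to giving every occupied tile a slot in the tile pool. In Direct3D this information is handed to `ID3D11DeviceContext2::UpdateTileMappings` as arrays of tile coordinates and tile pool offsets. The sketch below only builds such arrays on the CPU, using stand-in types so that it stays platform independent. `TileCoord`, `TileMapping` and `buildTileMapping` are hypothetical, and the one-slot-per-tile allocation is an assumption rather than the talk's strategy.

```cpp
#include <cstdint>
#include <vector>

struct TileCoord { std::uint32_t x, y, z; };  // stand-in for a tiled resource coordinate

struct TileMapping {
    std::vector<TileCoord> coords;           // which tiles of the texture to map
    std::vector<std::uint32_t> poolOffsets;  // tile pool slot backing each tile
};

// Assign consecutive pool slots to occupied tiles until the pool is full.
inline TileMapping buildTileMapping(const std::vector<TileCoord>& occupiedTiles,
                                    std::uint32_t poolCapacityInTiles)
{
    TileMapping m;
    for (const TileCoord& t : occupiedTiles) {
        if (m.coords.size() >= poolCapacityInTiles) break;  // pool exhausted: the rest stay unmapped
        m.poolOffsets.push_back(static_cast<std::uint32_t>(m.coords.size()));
        m.coords.push_back(t);
    }
    return m;
}
```

A real implementation would also have to decide what happens when the pool runs out; this sketch simply leaves the remaining tiles unmapped.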
Drawbacks
● Tile mappings updated by CPU.
  ● Possible using CPU read backs.
  ● Must use triple buffer for speedy Map/Unmap.
    ● Means 3 frame latency.
● Predict tile mappings ahead of time.
Frame    GPU/CPU Status
N        • CPU issues render calls for current frame.
N+1      • GPU executing calls sent from CPU during frame N.
         • CPU issues render calls for current frame.
N+2      • GPU finished executing calls sent from CPU during frame N. Results ready.
         • GPU executing calls sent from CPU during frame N+1.
         • CPU issues render calls for current frame.
N+3      • GPU finished executing calls sent from CPU during frame N+1. Results ready.
         • GPU executing calls sent from CPU during frame N+2.
         • CPU issues render calls for current frame.
N+4      ...
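The latency in the timeline can be modelled as a small ring of readback slots: the CPU only reads a slot once the GPU has had time to finish the frame that wrote it. This is a CPU-only sketch, assuming results become readable two frames after submission, as in the timeline. `ReadbackRing` and its members are hypothetical, and a real implementation would use staging buffers with Map/Unmap instead of plain vectors.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

constexpr std::size_t kReadbackSlots = 3;      // triple buffering
constexpr std::uint64_t kReadbackLatency = 2;  // frames until results are ready (assumed)

class ReadbackRing {
    struct Slot {
        std::uint64_t frame = 0;
        std::vector<std::uint32_t> tiles;
        bool valid = false;
    };
    std::array<Slot, kReadbackSlots> slots_{};

public:
    // Record the tile list produced by the GPU work submitted in `frame`.
    void submit(std::uint64_t frame, std::vector<std::uint32_t> tiles) {
        Slot& s = slots_[frame % kReadbackSlots];
        s.frame = frame;
        s.tiles = std::move(tiles);
        s.valid = true;
    }

    // Returns the result that is safe to read during `frame`, or nullptr if
    // no data is available yet (for example in the first frames after start-up).
    const std::vector<std::uint32_t>* tryRead(std::uint64_t frame) const {
        if (frame < kReadbackLatency) return nullptr;
        const std::uint64_t wanted = frame - kReadbackLatency;
        const Slot& s = slots_[wanted % kReadbackSlots];
        return (s.valid && s.frame == wanted) ? &s.tiles : nullptr;
    }
};
```

When `tryRead` returns nullptr the caller has to fall back to some other source of tiles, such as the emitter.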
GPU Simulation; CPU Prediction
● We always know the maximum velocity of the fluid.
● Add a skirt of “probable” tiles around “definite” tiles.
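The skirt can be sketched as a dilation of the tile map. Here the skirt width is derived from the maximum velocity and the readback latency. That formula, and every name below, is an assumption made for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of tiles the fluid could cross before the next readback arrives.
inline int skirtWidthInTiles(float maxVelocity,  // cells per frame
                             int latencyFrames,
                             int tileSize)       // cells per tile edge
{
    return static_cast<int>(std::ceil(maxVelocity * latencyFrames / tileSize));
}

// Mark every tile within `skirt` tiles of a definite tile as probable.
// `definite` holds dim^3 flags (0 or 1); the result uses the same layout and
// contains both the definite and the probable tiles.
inline std::vector<std::uint8_t> addSkirt(const std::vector<std::uint8_t>& definite,
                                          int dim, int skirt)
{
    std::vector<std::uint8_t> out(definite.size(), 0);
    auto at = [dim](int x, int y, int z) {
        return (static_cast<std::size_t>(z) * dim + y) * dim + x;
    };
    for (int z = 0; z < dim; ++z)
    for (int y = 0; y < dim; ++y)
    for (int x = 0; x < dim; ++x) {
        if (!definite[at(x, y, z)]) continue;
        // Clamp the neighbourhood to the bounds of the tile map.
        for (int nz = std::max(0, z - skirt); nz <= std::min(dim - 1, z + skirt); ++nz)
        for (int ny = std::max(0, y - skirt); ny <= std::min(dim - 1, y + skirt); ++ny)
        for (int nx = std::max(0, x - skirt); nx <= std::min(dim - 1, x + skirt); ++nx)
            out[at(nx, ny, nz)] = 1;
    }
    return out;
}
```

A wider skirt costs more mapped memory but makes it less likely that fluid reaches an unmapped tile before the CPU catches up.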
New Algorithm
Is Data Available?
  Yes -> Readback Tiles
  No -> Emitter Tiles
Predict Tiles -> Build List -> Map Tiles
Clear Tiles -> Inject -> Advect -> Pressure -> Vorticity -> Evolve -> Fill List
Demo
Numbers
Questions?
Alex Dunn - adunn@nvidia.com
Twitter: @AlexWDunn
Thanks for attending.