Visualisation of Large Datasets with Houdini
Ben Simons
Data Arena Lead Developer
University of Technology, Sydney
New UTS Broadway Building
UTS Data Arena
~ April 2014
Today's Outline - Big Data
1. Some strategies used in Film Visual FX
2. Visualisation Techniques in Houdini
3. VFX Data Formats & Disk Systems
Happy Feet 2
2 Petabytes (2,000,000 GB)
3D Stereo HD images
Render: 18,000 cpu cores
Parallel access to data
HDF5 data on Bluearc & Isilon
NAS Disk Systems
Linux software: Maya, Houdini,
Naiad, Nuke, 3Delight
Entirely made at Carriageworks
in Sydney at Dr D Studios
Resident Evil 3: Extinction
The Desert Undead: 18-layer images (RenderMan AOVs)
Each single image frame was split into 96 tiles
Rendered on 96 machines, then each frame tile-joined
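A minimal sketch of the split-then-join idea, in plain Python. The 12 x 8 grid, the small frame size and the stand-in render_tile function are assumptions for illustration; the slides do not say how the 96 tiles were laid out or joined.

```python
def split_into_tiles(width, height, cols, rows):
    """Return (x0, y0, x1, y1) pixel boxes that cover the frame exactly once."""
    tiles = []
    for r in range(rows):
        y0 = r * height // rows
        y1 = (r + 1) * height // rows
        for c in range(cols):
            x0 = c * width // cols
            x1 = (c + 1) * width // cols
            tiles.append((x0, y0, x1, y1))
    return tiles


def render_tile(box):
    """Stand-in for a renderer: each pixel value is derived from its coordinates."""
    x0, y0, x1, y1 = box
    return [[x + y * 10000 for x in range(x0, x1)] for y in range(y0, y1)]


def join_tiles(width, height, tiles, rendered):
    """Paste each rendered tile back into its place in the full frame."""
    frame = [[None] * width for _ in range(height)]
    for (x0, y0, x1, y1), pixels in zip(tiles, rendered):
        for dy, row in enumerate(pixels):
            frame[y0 + dy][x0:x1] = row
    return frame


WIDTH, HEIGHT = 192, 108                      # small frame so the sketch runs quickly
tiles = split_into_tiles(WIDTH, HEIGHT, cols=12, rows=8)   # 12 x 8 = 96 tiles
rendered = [render_tile(t) for t in tiles]    # on a farm: one tile per machine
frame = join_tiles(WIDTH, HEIGHT, tiles, rendered)
```

Because the tiles do not overlap, each one can be rendered on a different machine with no communication until the final join.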
Houdini across 2 screens
Houdini Object Nodes
Houdini Procedural Network
Houdini Parameters
Houdini Chops
Channel is a column of data
Plain text files OK: separate
columns with tabs
Interactive Channel graph
(zoom in)
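To make "a channel is a column of data" concrete, here is a small Python sketch that reads tab-separated text into named channels. It assumes the first row holds the channel names; that layout, the function name load_channels and the sample values are illustrative assumptions, not a description of Houdini's own file reader.

```python
import io


def load_channels(text_file):
    """Read tab-separated columns; the first row is assumed to name the channels."""
    names = text_file.readline().rstrip("\n").split("\t")
    channels = {name: [] for name in names}
    for line in text_file:
        if not line.strip():
            continue                          # skip blank lines
        values = line.rstrip("\n").split("\t")
        for name, value in zip(names, values):
            channels[name].append(float(value))
    return channels


# Three channels (columns), three samples (rows).
sample = io.StringIO("time\ttx\tty\n0\t1.0\t2.0\n1\t1.5\t2.5\n2\t2.0\t3.0\n")
channels = load_channels(sample)
```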
Visual programming
Filtering, Sampling, shading,
instancing, and rendering
Hands-on tomorrow will be
Chops & Vops
Spitzer Glimpse Dataset
Spitzer Space Telescope
South: ~300 files, 78 different Channels, 145K rows
gzipped .tbl data loaded into Houdini
Houdini Chops used to filter & calc 'colours'
Show difference of infra-red magnitude bands
Point colours and scales calculated by VOPs (SIMD)
Houdini Movie Rendered (Mantra PBR)
36M points, filtered to <12M
Shading & VOPs
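The filter-and-difference step can be sketched in a few lines of Python. A 'colour' here is the difference between two magnitude bands, and rows with a missing measurement are filtered out. The column names mag_a and mag_b, the use of None for missing values and the sample numbers are hypothetical; the real catalogue columns are not given in the slides.

```python
def colour_indices(rows, band_a, band_b):
    """Return band_a - band_b for every row where both magnitudes are present."""
    kept = []
    for row in rows:
        a, b = row[band_a], row[band_b]
        if a is None or b is None:
            continue                      # filter: drop incomplete measurements
        kept.append(a - b)
    return kept


rows = [
    {"mag_a": 12.5, "mag_b": 12.0},
    {"mag_a": 10.0, "mag_b": 10.75},
    {"mag_a": None, "mag_b": 11.0},       # missing band: filtered out
]
colours = colour_indices(rows, "mag_a", "mag_b")
```

The resulting values could then drive a colour ramp or a point scale, which is the part the slides attribute to VOPs.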
A shader is a mini-program which makes data
It can be better to generate data than load it.
Shaders allow an additional level of management
Geom shaders on HF2 generated 1 billion snow
particles per image frame (impossible to load).
Houdini VOPs are SIMD
VOP Network
Saves Memory & I/O by re-using geometry
Copies generated at render time
Each Instance can be varied based on point
Referencing one “instance object” provides a
massive data reduction
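A rough Python sketch of why instancing reduces data: one shared template is stored once, each point stores only a position and a scale, and the copies are generated on demand. The Instance class, the triangle template and the per-point scale rule are assumptions for illustration, not Houdini's internal representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    position: tuple   # (x, y, z) of the point the copy sits on
    scale: float      # per-point variation


# One shared "instance object": a unit triangle.
template = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]


def expand(instance, template):
    """Generate one copy's vertices on demand, as a renderer would at render time."""
    px, py, pz = instance.position
    s = instance.scale
    return [(px + s * x, py + s * y, pz + s * z) for x, y, z in template]


instances = [Instance((float(i), 0.0, 0.0), 1.0 + 0.5 * (i % 2)) for i in range(1000)]

# Floats held in memory: template plus (position, scale) per point...
stored_floats = 3 * len(template) + 4 * len(instances)
# ...versus every copy stored explicitly.
expanded_floats = 3 * len(template) * len(instances)
```

With a three-vertex template the saving is modest; it grows with the size of the template, since the per-point cost stays fixed.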
Adaptive Meshes, LOD, Caching &
Data reduction techniques
Level of Detail (distance from camera)
Adaptive Meshes
Cache common files locally
Filter texture (images) - Mipmapping
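Mipmapping and level of detail can be sketched together: pre-filter the texture into successively halved copies, then pick a copy based on how many texels fall under one screen pixel. This sketch assumes a square, power-of-two, single-channel image and a simple 2x2 box filter; production renderers use better filters and blend between levels.

```python
def downsample(image):
    """Halve width and height by averaging each 2x2 block (box filter)."""
    h, w = len(image), len(image[0])
    return [
        [
            (image[y][x] + image[y][x + 1] + image[y + 1][x] + image[y + 1][x + 1]) / 4.0
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]


def build_mipmaps(image):
    """Return [full-res, half-res, ...] down to 1x1."""
    levels = [image]
    while len(levels[-1]) > 1:
        levels.append(downsample(levels[-1]))
    return levels


def pick_level(levels, texels_per_pixel):
    """Choose a coarser level each time the screen footprint doubles."""
    level = 0
    while texels_per_pixel >= 2.0 and level < len(levels) - 1:
        texels_per_pixel /= 2.0
        level += 1
    return level


texture = [[float((x + y) % 2) for x in range(8)] for y in range(8)]  # checkerboard
mips = build_mipmaps(texture)
```

A distant object covering few pixels reads from a small, already-filtered level, which saves both memory traffic and filtering work.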
Other tricks: Baked Lighting & Shadows
Pre-calculate lighting
& shadows
“bake” new textures
& reapply onto geom
Sydney Harbour
Multi-Beam Sonar
Survey, 30cm data.
Interactive 3D Flythrough
Know Your Limits: Memory & I/O
I/O will Bottleneck - Partition the problem & then scale it up
Split job across many independent machines (eg. render)
Segment data access for each machine (eg. HDF5)
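Partitioning can be as simple as giving each machine its own contiguous range of frames (or rows of a dataset), so that no two machines touch the same data. A minimal Python sketch; the counts of 1000 items and 96 workers are illustrative assumptions.

```python
def partition(n_items, n_workers):
    """Split range(n_items) into n_workers contiguous, non-overlapping ranges."""
    base, extra = divmod(n_items, n_workers)
    ranges, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)   # spread the remainder evenly
        ranges.append(range(start, start + size))
        start += size
    return ranges


chunks = partition(n_items=1000, n_workers=96)
```

Each worker then reads and writes only its own range, which keeps the jobs independent and makes it straightforward to add machines.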
Alternate memory hardware
Vector (array) processor - SIMD
e.g. Cray; now Intel SSE/MMX and Nvidia GPUs
IBM Cell Processor has Vector Processor
Content-Addressable Memory
“associative arrays” are used by Network Routers
Types of System Memory
Virtual Memory
Swapping is good, thrashing is bad
SMP Symmetric Multiprocessing: Multiple CPUs with
common/shared memory. Multi-threaded apps.
eg. Intel Xeon, Core 2 Duo are SMP.
– Cache coherency, snooping bus (on distributed SM)
MPI (Message Passing) PVM Clusters, Beowulf, etc
(Memory not shared)
Data Formats
HDF5 “Hierarchical Data Format”
Browsable container of data (HDFView)
Has “groups & datasets” like “dirs & files”
Data stored in B-Trees
Can also store Binary Data
HDF5 for Python
Operate on HDF5 data via Python dictionaries
& NumPy arrays
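A short sketch of the dictionary-and-array style using the h5py package, assuming h5py and NumPy are installed. The group name "survey" and dataset name "depth" are made up for illustration. The file is created in memory (core driver, no backing store) so nothing is written to disk.

```python
import numpy as np
import h5py

# In-memory HDF5 file: the name is only a label because backing_store is False.
with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as f:
    grp = f.create_group("survey")                                   # group, like a directory
    grp.create_dataset("depth", data=np.arange(12.0).reshape(3, 4))  # dataset, like a file

    names = list(f["survey"].keys())     # dictionary-style browsing of a group
    block = f["survey/depth"][1:, :2]    # NumPy-style slice returns just that region
```

Slicing a dataset returns an ordinary NumPy array holding only the selected region, which is what lets each machine read just its own segment of a large file.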
Disk Systems
Network Attached Storage (NAS)
Bluearc (now Hitachi) implemented via FPGA
Isilon (now EMC) clustered filesystem, 100GB/s
Lustre Filesystem
Multiple SSD nodes & maintains global file coherency
Experimental Parallel distributed filesystem – can
have multiple copies of a file, one master.
Venti (Bell Labs Plan 9 & Inferno)
WORM Archive. Shares Blocks by secure SHA-1 Hash.
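The idea of sharing blocks by hash can be sketched with Python's standard hashlib. A block's address is the SHA-1 of its contents, so identical blocks from different files are stored only once. The BlockStore class, the store_file helper and the 4-byte block size are illustrative assumptions; this is not Venti's protocol or on-disk format.

```python
import hashlib


class BlockStore:
    """Write-once store: a block's address is the SHA-1 of its contents."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha1(data).hexdigest()
        self._blocks.setdefault(key, data)   # identical blocks are stored once, never overwritten
        return key

    def get(self, key: str) -> bytes:
        return self._blocks[key]

    def __len__(self):
        return len(self._blocks)


def store_file(store, data: bytes, block_size: int = 4):
    """Split data into fixed-size blocks and return the list of block addresses."""
    return [store.put(data[i:i + block_size]) for i in range(0, len(data), block_size)]


store = BlockStore()
file_a = store_file(store, b"AAAABBBBCCCC")
file_b = store_file(store, b"AAAABBBBDDDD")   # shares its first two blocks with file_a
restored = b"".join(store.get(k) for k in file_a)
```

Two files of three blocks each occupy four stored blocks in total, because the two common blocks hash to the same address.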
Data Formats 2
OpenVDB
Hierarchical structure for volumetric data (“clouds”)
Good for sparse volumetric time-varying data
Fast access (constant-time) to voxels
Large set of operators (Level Set tools, filters,
transforms & morphological operators)
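A simplified Python sketch of sparse voxel storage with constant-time access. It uses a flat hash map of 8x8x8 tiles rather than OpenVDB's actual tree, so it only illustrates the principle: empty space costs nothing, and a lookup is a fixed number of steps regardless of volume size. The SparseVolume class and its method names are hypothetical.

```python
class SparseVolume:
    """Voxels grouped into BLOCK^3 tiles; only tiles that hold data are allocated."""

    BLOCK = 8

    def __init__(self, background=0.0):
        self.background = background
        self.tiles = {}   # (bx, by, bz) -> flat list of BLOCK**3 values

    def _locate(self, x, y, z):
        b = self.BLOCK
        key = (x // b, y // b, z // b)                    # which tile
        offset = (x % b) + b * ((y % b) + b * (z % b))    # where inside the tile
        return key, offset

    def set(self, x, y, z, value):
        key, offset = self._locate(x, y, z)
        tile = self.tiles.get(key)
        if tile is None:
            tile = self.tiles[key] = [self.background] * self.BLOCK ** 3
        tile[offset] = value

    def get(self, x, y, z):
        key, offset = self._locate(x, y, z)
        tile = self.tiles.get(key)
        return self.background if tile is None else tile[offset]


cloud = SparseVolume()
cloud.set(3, 4, 5, 0.7)
cloud.set(1000, 2000, 3000, 0.2)   # far away, yet only one more tile is allocated
```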
Data Formats 3
Disney Ptex eliminates uv texture assignment
no (u,v)'s required! no seams visible
works on sub-d/poly faces
Stores face adjacency data & filters
Efficiently stores 10^6 mipmapped texture files
Multi-channels, compressed separately
Used in Disney's “Bolt”
“D3” Data-Driven Documents
D3 – An amazing Data visualisation web framework (JavaScript)
Offers Parallel Coordinates
Demo: Nutrient Contents - An interactive visualization of
the USDA Nutrient Database.
Parallel Co-ordinates
protein, calcium, sodium, fibre, vitamin c, potassium, carbohydrate, sugar, fat, water, calories, saturated, ...
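The geometry behind a parallel-coordinates plot is simple: each attribute gets its own vertical axis, each value is normalised to that axis's range, and each record becomes one polyline across the axes. The sketch below is in Python rather than D3's JavaScript, and the sample values are placeholders, not figures from the USDA database.

```python
def parallel_coordinates(records, axes):
    """Map each record to a polyline: one (axis_index, height in [0, 1]) vertex per axis."""
    lo = {a: min(r[a] for r in records) for a in axes}
    hi = {a: max(r[a] for r in records) for a in axes}
    lines = []
    for r in records:
        line = []
        for i, a in enumerate(axes):
            span = hi[a] - lo[a]
            t = (r[a] - lo[a]) / span if span else 0.5   # a constant column sits mid-axis
            line.append((i, t))
        lines.append(line)
    return lines


# Placeholder values for illustration only.
foods = [
    {"protein": 0.0, "fat": 10.0, "water": 80.0},
    {"protein": 20.0, "fat": 0.0, "water": 60.0},
    {"protein": 10.0, "fat": 5.0, "water": 40.0},
]
lines = parallel_coordinates(foods, ["protein", "fat", "water"])
```

Filtering a range on one axis then amounts to keeping only the polylines whose vertex on that axis falls inside the range.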