PDF Version - Doktorandi
GPU Implementations of Online Track Finding Algorithms at PANDA
HK 57.2, DPG-Frühjahrstagung 2014, Frankfurt, 21 March 2014
Andreas Herten (Institut für Kernphysik, Forschungszentrum Jülich) for the PANDA Collaboration
Member of the Helmholtz Association (Mitglied der Helmholtz-Gemeinschaft)

PANDA: The Experiment
[Figure: PANDA detector with a 13 m scale; labelled components: Magnet, STT, MVD.]

PANDA: Event Reconstruction
• Triggerless read out
  – Many benchmark channels
  – Background & signal similar
• Event rate: 2 · 10⁷/s
• Raw data rate: 200 GB/s
• Reduce by a factor of ~1000 on GPUs (reject background events, save interesting physics events)
• Disk storage space for offline analysis: 3 PB/y

PANDA: Tracking, Online Tracking
• PANDA: No hardware-based trigger
• But: computationally intensive software trigger → Online Tracking
[Figure: detector layers and trigger, comparing a usual HEP experiment with PANDA.]

GPUs @ PANDA: Online Tracking
• Port tracking algorithms to GPU
  – Serial → parallel
  – C++ → CUDA
• Investigate suitability for online performance
• But also: find and invent tracking algorithms
• Under investigation:
  – Hough Transformation
  – Riemann Track Finder
  – Triplet Finder

Algorithm: Hough Transform
• Idea: transform (x,y)_i → (α,r)_ij, find lines via (α,r) space
• Solve the line equation $r_{ij} = \cos\alpha_j \cdot x_i + \sin\alpha_j \cdot y_i + \rho_i$ for
  – Lots of hits (x,y,ρ)_i; i: ~100 hits/event (STT)
  – Many α_j ∈ [0°,360°) each; j: every 0.2°
  – r_ij: 180 000 values (~100 hits × 1800 angles)
• Fill histogram
• Extract track parameters
[Figure: Hough transform principle, with axes (x,y) and (α,r).]
[Figure: Hough-transformed histogram for 68 (x,y) points; horizontal axis: angle α / ° (0 to 180), vertical axis: r; PANDA STT+MVD, 1800 × 1800 grid.]

Algorithm: Hough Transform, Two Implementations
Thrust
• Performance: 3 ms/event
  – Independent of α granularity
• Reduced to set of standard routines
  – Fast (uses Thrust's optimized algorithms)
  – Inflexible (has its limits, hard to customize)
• No peak finding included
  – Even possible?
  – Adds to time!
Plain CUDA
• Performance: 0.5 ms/event
• Built completely for this task
  – Fitting to every problem
  – Customizable
  – A bit more complicated in parts
• Simple peak finder implemented (threshold)
• Using: Dynamic Parallelism, Shared Memory

Algorithm: Riemann Track Finder
• Idea: don't fit lines (in 2D), fit planes (in 3D)!
  – Use mapping to Riemann paraboloid
• Create seeds
  – All possible three-hit combinations
• Grow seeds to tracks
  – Continuously test whether the next hit fits
• Summer student project (J. Timcheck)
[Figure: hits (x markers) in the (x,y) plane and their mapping along z'.]
• GPU optimization: unfolding loops
  – Nested loops for () {for () {for () {}}} become one linear thread index:
    int ijk = threadIdx.x + blockIdx.x * blockDim.x;
  – Closed-form index formulas shown on the slide:
    $\mathrm{nLayer}_x = \frac{1}{2}\left(\sqrt{8x + 1} - 1\right)$
    $\mathrm{pos}(\mathrm{nLayer}_x) = \frac{\sqrt[3]{\sqrt{3}\sqrt{243x^2 - 1} + 27x}}{3^{2/3}} + \frac{1}{\sqrt[3]{3}\,\sqrt[3]{\sqrt{3}\sqrt{243x^2 - 1} + 27x}} - 1$
  → 100 × faster than CPU version
• Time for one event (Tesla K20X): ~0.6 ms

Algorithm: Triplet Finder
• Idea: use only a sub-set of the detector as seed
  – Combine 3 hits to a Triplet
  – Calculate circle from 3 Triplets (no fit)
• Features
  – Tailored for PANDA
  – Fast & robust algorithm, no t0
• Ported to GPU together with NVIDIA Application Lab

Triplet Finder: Time
[Figure: Triplet Finder timing plot.]

Triplet Finder: Optimizations
• Bunching Wrapper
  – Hits from one event have similar timestamps
  – Combine hits to sets (bunches) which fill up the GPU best
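The bunching idea can be illustrated with a small host-side sketch in C++. It is not the implementation from the talk: the `Hit` record, the function name `makeBunches`, and the two tuning parameters `maxGapNs` and `targetHits` are hypothetical, and the sketch assumes that events are separated by visible time gaps, which need not hold in real triggerless data. Hits are sorted by timestamp, and a bunch is closed only at a time gap and only once it holds enough hits, so that hits with similar timestamps stay together.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical hit record: only the timestamp matters for bunching.
struct Hit {
    double t_ns;  // hit timestamp in nanoseconds (assumed unit)
    int id;       // index of the hit in the original hit array
};

using Bunch = std::vector<Hit>;

// Group hits into bunches of roughly targetHits hits each.
// maxGapNs and targetHits are illustrative knobs, not values from the talk.
std::vector<Bunch> makeBunches(std::vector<Hit> hits,
                               double maxGapNs,
                               std::size_t targetHits) {
    // Sort by time so that hits of one event end up next to each other.
    std::sort(hits.begin(), hits.end(),
              [](const Hit& a, const Hit& b) { return a.t_ns < b.t_ns; });

    std::vector<Bunch> bunches;
    for (const Hit& h : hits) {
        bool startNew = bunches.empty();
        if (!startNew) {
            const Bunch& cur = bunches.back();
            const bool gap = (h.t_ns - cur.back().t_ns) > maxGapNs;
            // Cut only at a time gap, and only if the bunch is full enough.
            startNew = gap && cur.size() >= targetHits;
        }
        if (startNew) {
            bunches.emplace_back();
        }
        bunches.back().push_back(h);
    }
    return bunches;
}
```

Each resulting bunch could then be handed to one GPU work unit, and hit combinations would only be formed within a bunch rather than across the whole hit stream.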
[Figure: hits are grouped into events, and events into bunches; bunching reduces the complexity from 𝒪(N²) to 𝒪(N).]

Triplet Finder: Bunching Performance
[Figure: bunching performance plot.]

Triplet Finder: Optimizations
• Compare kernel launch strategies (each runs Triplet Finder stages #1 to #4):
  – Dynamic Parallelism: 1 thread/bunch, calling kernel
  – Joined Kernel: 1 block/bunch, joined kernel
  – Host Streams: 1 stream/bunch, combining stream
[Figure: launch hierarchy of the three strategies, with GPU and CPU sides marked.]

Triplet Finder: Kernel Launches
Preliminary (in publication)
[Figure: comparison of the kernel launch strategies.]

Triplet Finder: Clock Speed / Chipset
Preliminary (in publication)
GPU    Memory Clock   Core Clock   GPU Boost
K40    3004 MHz       745 MHz      875 MHz
K20X   2600 MHz       732 MHz      784 MHz

Summary
• Investigated different tracking algorithms
  – Best performance: 20 µs/event
  → Online Tracking is a feasible technique for PANDA
• Multi-GPU system needed
  – 𝒪(100) GPUs
• Still much optimization necessary (efficiency)
• Collaboration with NVIDIA Application Lab

Thank you!
Andreas Herten, a.herten@fz-juelich.de
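As a rough cross-check of the GPU count in the summary, the event rate from the event reconstruction slide can be multiplied by the best per-event time. This is only a sketch: it assumes that the 20 µs/event figure is per GPU, that it holds at the full event rate, and that throughput scales linearly with the number of GPUs, none of which is stated in the talk. The symbols $R$, $t_{\mathrm{event}}$ and $N_{\mathrm{GPU}}$ are introduced here for illustration.

```latex
N_{\mathrm{GPU}} \approx R \cdot t_{\mathrm{event}}
  = 2 \cdot 10^{7}\,\mathrm{s^{-1}} \times 20 \cdot 10^{-6}\,\mathrm{s}
  = 400
```

Under these assumptions a few hundred GPUs would be needed, which is compatible with the 𝒪(100) GPUs quoted in the summary.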