Examples - Physikalisches Institut Heidelberg
Transcription
Examples

ASIC and FPGA digital designs in physics experiments
© V. Angelov, VHDL Vorlesung SS2009

Examples
• Prototypes in the old times
• Sweet-16 – a student RISC processor
• TRAP chip for ALICE TRD – mixed-mode ASIC
• Optical Readout Interface (ORI) – CPLD
• Detector Control System Board for ALICE (TRD+TPC) – FPGA + ARM CPU core
• Power Distribution Box (PDB) – antifuse FPGA
• Global Tracking Unit for ALICE TRD – large FPGA farm

Prototypes yesterday...
• interface board for PHA (pulse height analysis) built from 74'xxx logic
• 3 x SRAM 2k x 8, with a battery for the SRAM
• full-size ISA card for IBM XT/AT

Sweet-16
[Block diagram: register file with read ports Qb and Qc, ALU, multiplexer, program counter, status register, control unit, data RAM (WAddr=Qb, WData=Qc, RAddr=Qb) and program ROM]
• 16-bit RISC processor
• 1 clock/instruction
• easy to implement by students without experience
• compact and portable to different technologies, used in FPGAs and ASICs

A Large Ion Collider Experiment
• Pb-Pb collision at 1.1 PeV/nucleon
• creation of the Quark-Gluon Plasma
• the TRD is used as a trigger detector due to its fast readout time (2 µs):
  – transverse momentum
  – electron/pion separation
• detector systems: Inner Tracking System (ITS), Time Projection Chamber (TPC), Transition Radiation Detector (TRD)

Transition Radiation Detector
TRD – Transition Radiation Detector
• used as trigger and tracking detector
• > 24,000 particles / interaction in the acceptance of the detector
• up to 8,000 charged particles within the TRD
• the trigger task is to find specific particle pairs within 6 µs
ITS – Inner Tracking System
• event trigger
• vertex detection
TPC – Time Projection Chamber
• high-resolution tracking detector
• but too slow for 8,000 collisions / second
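The Sweet-16 slide describes a 16-bit RISC executing one instruction per clock. A rough behavioral sketch of such a machine in Python; the opcodes, instruction fields and the small demo program are invented for illustration and are not the actual Sweet-16 instruction set:

```python
# Behavioral sketch (not the real Sweet-16 ISA): a 16-bit RISC executing
# one instruction per "clock", in the spirit of the slide's block diagram
# (register file with two read ports Qb/Qc, ALU, program counter).

def step(state, program):
    """Execute one instruction; returns False when halted."""
    pc, regs = state["pc"], state["regs"]
    op, a, b, c = program[pc]          # decoded fields: opcode, dest, src1, src2/imm
    if op == "LDI":                    # load immediate into regs[a]
        regs[a] = c & 0xFFFF
    elif op == "ADD":                  # regs[a] = regs[b] + regs[c], 16-bit wrap
        regs[a] = (regs[b] + regs[c]) & 0xFFFF
    elif op == "SUB":
        regs[a] = (regs[b] - regs[c]) & 0xFFFF
    elif op == "BNZ":                  # branch to address c if regs[a] != 0
        if regs[a] != 0:
            state["pc"] = c
            return True
    elif op == "HALT":
        return False
    state["pc"] = pc + 1
    return True

def run(program):
    state = {"pc": 0, "regs": [0] * 16}
    while step(state, program):
        pass
    return state["regs"]

# Sum 1..5 into r1 using a down-counter in r0
prog = [
    ("LDI", 0, 0, 5), ("LDI", 1, 0, 0), ("LDI", 2, 0, 1),
    ("ADD", 1, 1, 0), ("SUB", 0, 0, 2), ("BNZ", 0, 0, 3),
    ("HALT", 0, 0, 0),
]
regs = run(prog)   # regs[1] == 15
```

The loop mirrors the block diagram: the register file is read, the ALU produces the result, and the program counter either increments or is loaded by a taken branch.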
TRD Structure
• 1.2 million channels, 1.4 million ADCs
• peak data rate: 16 TB/s; computing time: 6 µs
• ~65,000 MCMs, each MCM (PASA + TRAP) serving 18 channels
• detector geometry: 6 planes per stack, max. 16 pad rows, module rings along z, B = 0.4 T
• 18 supermodules in azimuth; 8 MCM ≡ 144 channels; 1080 optical links @ 2.5 Gbps
• the MCM performs amplification, digitization (10 bit, 10 MHz), the straight-line fit and the readout network

Partitioning, Data Flow & Reduction
MCM – Multi Chip Module: PASA → ADC → Tracklet Preprocessor (TPP) → Tracklet Processor (TP) → Network Interface (NI) → GTU → HLT & DAQ; L1 trigger to the CTP; the event buffer stores the raw data until L1A
• charge-sensitive preamplifier/shaper; 10-bit ADC at 10 MSPS; digital filter; preprocessing of 21 channels; event buffering during the first 2 µs (drift time)
• fit of tracklets, merging of tracklets into tracks, trigger functionality, processing of raw data for the HLT
• detector: 6 layers, 1.2 million analog channels; data/event: 33 MB; peak rate: 16 TB/s; mean rate: 257 GB/s
• tracklets after 3.5 µs, tracks after 4.1 µs, trigger decision after 6 µs; raw data max. 80 kB/event plus some bytes of trigger data, 600 GB/s, reduction ~400

FEE development
How to design the FEE?
- fast and low latency
- low power, precise power control (1 mW/channel → 1 kW)
- low cost: avoid using connectors, use a simple chip package (MCM + ball grid array); standard components? which process? IP cores? layout (TPC vs. TRD)?
- flexible, as much as possible, as the exact processing is not known: make everything configurable, use a CPU core for the final processing
- reliable, as there is no possibility to repair anything later: redundancy, error and failure protection, self-diagnostic features

FEE development (2)
Chip design flow:
1. Detector simulations to understand the signals and what kind of processing we need
2.
Select the PASA shaping time, the ADC sampling rate and resolution
3. Behavioral model of the digital processing, including the bit precision of every arithmetic operation
4. Estimate the processing time and select the clock speed of the design (a multiple of the LHC and ADC sampling clocks)
5. Code the digital design, simulate it, synthesize it, estimate the timing and area, optimize again…
6. Submit the chip – this is the point of no return!
7. Continue with the simulations, find some bugs and think about fixes
8. Prepare the test setup
And so on: TRAP1, TRAP2, TRAPADC, TRAP3, TRAP3a (final)

Readout Boards
• half-chamber readout, 2.5 Gbps over 850 nm optics
• MCM roles in the readout network: data source only, data source and data merge, data merge only

Detector Readout
• timeline [µs]: drift 2, processing 4, transmission 4.6, GTU finished 6
• 1,200,000 channels, 1,400,000 ADCs, 10 bit at 10 MSPS, 20 samples/event, preprocessing A + D
• 65,000 MCMs, 18+3 channels/MCM, 4 CPUs at 120 MHz
• 4,100 readout boards with 16+1(+1) MCMs, 8-bit 120 MHz DDR readout tree
• 1,080 Optical Readout Interface links, 2 links/chamber at 2.5 Gb/s
• 90 Track Matching Units (GTU), 1 TMU/module stack, Xilinx FPGA based
• merging tree: MCM 3+own:1 → BM 4:1 → HCM → ORI, 12 optical links 5:1 → TMU; GTU with SMU and TGU, DDL, pretrigger, ×18 supermodules → Central Trigger Processor (CTP)

Multi Chip Module (4 cm)
• PASA and internal ADCs (Kaiserslautern): 10 bit at 10 MHz, 12.5 mW, low latency
• digital frontend and tracklet preprocessor
• master state machine; CPU cores, memories & periphery on a global I/O bus
• serial interface (slave), network interface, external pretrigger, readout network

TRAP block diagram
• 21 ADCs → digital filters: nonlinearity correction, pedestal correction, gain correction, tail cancellation, crosstalk suppression → event buffer (64 samples)
• memory: 4 x 4k for instructions, 1k x 32 quad-ported for data, Hamming protected
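The instruction and data memories are Hamming protected. The slides do not give the code details, so the sketch below only illustrates the principle of single-error correction, using the classic Hamming(7,4) code on 4-bit words rather than the TRAP's actual word width:

```python
# Generic single-error-correcting Hamming(7,4) sketch: encode 4 data bits
# into 7 bits with parity at positions 1, 2 and 4, then correct any
# single flipped bit via the syndrome.

def hamming74_encode(d):
    """d: 4-bit int -> 7-bit codeword."""
    d = [(d >> i) & 1 for i in range(4)]         # d0..d3
    p1 = d[0] ^ d[1] ^ d[3]
    p2 = d[0] ^ d[2] ^ d[3]
    p4 = d[1] ^ d[2] ^ d[3]
    # codeword bit positions 1..7: p1 p2 d0 p4 d1 d2 d3
    bits = [p1, p2, d[0], p4, d[1], d[2], d[3]]
    return sum(b << i for i, b in enumerate(bits))

def hamming74_correct(cw):
    """Returns (data, syndrome); corrects any single-bit error."""
    bits = [(cw >> i) & 1 for i in range(7)]      # positions 1..7 -> index 0..6
    s1 = bits[0] ^ bits[2] ^ bits[4] ^ bits[6]
    s2 = bits[1] ^ bits[2] ^ bits[5] ^ bits[6]
    s4 = bits[3] ^ bits[4] ^ bits[5] ^ bits[6]
    syndrome = s1 + 2 * s2 + 4 * s4               # 0 = no error, else bad position
    if syndrome:
        bits[syndrome - 1] ^= 1                   # flip the faulty bit back
    data = bits[2] | (bits[4] << 1) | (bits[5] << 2) | (bits[6] << 3)
    return data, syndrome

cw = hamming74_encode(0b1011)
corrupted = cw ^ (1 << 4)                         # flip one bit (position 5)
data, pos = hamming74_correct(corrupted)          # data == 0b1011, pos == 5
```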
• hit detection and hit selection, four fitting units feeding a fit register file; 4 CPUs, each with its own IMEM, sharing DMEM, GRF, configuration registers and flags
• NI with 4 x 8-bit 120 MHz DDR inputs, FIFOs and an 8-bit 120 MHz DDR output bus; global state machine (GSM): standby, armed, acquire, process, send; 24 Mb/s SCSN serial network
• CPU: PC, decoder, FRF/PRF/GRF/CONST register files, ALU, two pipeline stages; 4x RISC CPU @ 120 MHz
• TRAP fabricated in a 0.18 µm UMC mixed-mode process

Filter and Tracklet Preprocessor
• per channel (18+1 channels): ADC → digital filter (nonlinearity, offset, gain, tail cancellation, crosstalk) → event buffer, 64 time bins deep, with a condition check for hits (Q hit)
• hit select unit picks max. 4 hits; per selected channel: COG position calculation via LUT and parameter calculation
• centre of gravity over three adjacent pads: COG = (A_L − A_R) / A_C
• FIT register file and tracklet selection; for the CPUs (CPU0–CPU3) the FIT register file is a read-only register file

Filter & Preprocessor (data path)
• 21 digital channels from the ADCs (10 MHz) → filter & event buffer: history FIFO, nonlinearity correction, pedestal correction, gain adjustment, tail cancellation, crosstalk cancellation; 21x event buffer
• [figure: filter pipeline bit widths 10 → 12 → 12 → 8 → 8; COG position from the pad amplitudes at y−1, y, y+1]
• preprocessor: channel selection, position calculation and correction, selection of max. 4 candidates, calculation of the sums for the regression, transfer to the processor; a tracklet is described by deflection and origin

Tracking Arithmetics
• hit selection, timer, pedestal subtraction
• position look-up table driven by (Q_{i+1}(t) − Q_{i−1}(t)) / Q_i(t)
• accumulated into the Fit Register File per accepted hit: Q(t), Y(t), Y²(t), X(t)·Y(t), X²(t), X(t) and the hit count (+1)

The MIMD Architecture
• four RISC CPUs, coupled by registers (GRF) and a quad-ported data memory (D-MEM)
• register coupling to the preprocessor (FIT bus, 4x10)
• global bus for the periphery (configuration); local buses for communication, event-buffer read and direct ADC read
• I-MEM: 4 single-ported SRAMs
• serial interface for configuration
• IRQ controller for each CPU
• counter/timer/PsRG for each CPU and one on the global bus
• low-power design, the CPU clocks are gated individually

MIMD Processor
• the preprocessor delivers 4 sets of fit data
• 4 CPUs with shared memory / register file, a global I/O bus with arbiter, separate instruction memories, coupled data & control paths
• Harvard-style architecture, two-stage pipeline, 32-bit data path, register-to-register operations
• fast ALU: 32x32 multiplication, 64/32 radix-4 divider
• maskable interrupts, synchronization mechanisms
• per-CPU clocks, reset and power control; external interrupts; I/O bus arbiter on the global I/O bus

Local and Global IO
• load/store instructions; CPUs 0–3 and the configuration unit present req/we/busf, a 16-bit r/w address and 32-bit write data, and receive 32-bit read data
• no tri-state: the output data are ORed, the non-selected devices respond with 0
• synchronous read/write on the global bus (arbiter); the access time can be programmed
• read has priority over write; the configuration unit has priority over CPUs 0–3
• the local bus uses the same r/w address and w_data signals; the read-data register of the global bus is a read-only device on the local bus

SCSN – Slow Control Serial Network
• serial master–slave rings (ring 0 and ring 1), master in the DCS, slaves bridged between the rings
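The preprocessor only accumulates the sums listed above during the drift time; the CPUs then evaluate the closed-form straight-line fit. A behavioral sketch of that split in Python, using plain arithmetic instead of the TRAP's fixed-point scaling:

```python
# Behavioral sketch of the straight-line fit from the accumulated sums
# (slope/offset via standard linear regression over the time bins).

def accumulate(hits):
    """hits: list of (x, y) pairs, x = time bin, y = position from the COG LUT."""
    s = {"N": 0, "X": 0, "Y": 0, "XY": 0, "XX": 0}
    for x, y in hits:
        s["N"] += 1
        s["X"] += x
        s["Y"] += y
        s["XY"] += x * y
        s["XX"] += x * x
    return s

def fit(s):
    """Slope a and offset b of y = a*x + b from the sums alone."""
    den = s["N"] * s["XX"] - s["X"] ** 2
    a = (s["N"] * s["XY"] - s["X"] * s["Y"]) / den
    b = (s["Y"] * s["XX"] - s["X"] * s["XY"]) / den
    return a, b

hits = [(t, 3 * t + 7) for t in range(20)]   # 20 time bins on an exact line
a, b = fit(accumulate(hits))                  # a == 3.0, b == 7.0
```

Splitting the work this way is what makes the 1-clock-per-time-bin preprocessing possible: only additions and multiplications run at the sampling rate, while the single division per tracklet is left to the CPUs.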
• up to 126 slaves per ring
• CRC protected
• 24 Mbit/s transfer rate
• 16 address bits and 32 data bits per frame

NI Datapath
• connects the processor (4 CPUs, GRF, DMEM, IMEM, global bus arbiter) to the readout tree
• local & global I/O interfaces (local I/O 0–3 and global I/O, each 16 bit)
• input ports (port0–port4) with data resynchronization and DDR decoding
• input FIFOs (64x16, zero latency)
• port mux to define the readout order
• output port with DDR encoding and a programmable delay unit

TRAP development – a long long way
• beginning of 2001: MCM for 8 channels with the first prototypes of the digital chip (FaRo-1) and the preamplifier, commercial ADCs
• summer of 2002: first tested TRAP chip, in „spider“ mode
• in total ≈ 60,000 lines (synthesis) and 18,000 lines (simulation) of VHDL code

TRAP1 bonded on a MCM
1. ADC, 2. Filter/Preprocessor, 3. DMEM, 4. CPUs, 5. Network Interface, 6. IMEM; on-chip FuseID

TRAP Layout (5 x 7 mm)
• 21 ADC channels, 21 independent digital filter channels, event buffers (8 + 13 channels), data buffer
• CPU 0–3 with IMEM 0–3, quad-port memory, GRF, network IF, FIFO

TRAP internal tests
• 4 x 4k x 24 IMEM, 1k x 32 DMEM, ADCs, 256 x 32 DB, filter stages (N/P/G/T/C)
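The SCSN frames (16 address + 32 data bits) are CRC protected. The slides do not give the polynomial, so the sketch below uses CRC-16-CCITT purely as an illustration of how such a frame could be protected and checked:

```python
# Illustrative frame protection: payload = 16-bit address + 32-bit data,
# with a CRC-16-CCITT (poly 0x1021) appended. The actual SCSN CRC
# parameters are an assumption here, not taken from the slides.

def crc16(data: bytes, poly=0x1021, init=0xFFFF) -> int:
    crc = init
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ poly) if (crc & 0x8000) else (crc << 1)
            crc &= 0xFFFF
    return crc

def make_frame(addr: int, data: int) -> bytes:
    payload = addr.to_bytes(2, "big") + data.to_bytes(4, "big")
    return payload + crc16(payload).to_bytes(2, "big")

def check_frame(frame: bytes) -> bool:
    return crc16(frame[:-2]) == int.from_bytes(frame[-2:], "big")

frame = make_frame(0x1234, 0xDEADBEEF)
ok = check_frame(frame)                                   # True
bad = check_frame(frame[:-1] + bytes([frame[-1] ^ 1]))    # False - corrupted
```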
• CPU0 registers r0/r1, 21x event buffer, port 3, global bus
• in total 434 configuration registers: on the global bus ~130 (state machine ~25, NI ~12, IRQ 64, counters 8, constants 20, arbiter/SCSN slave), in the filter/preprocessor ~280 (nonlinearity LUT 64, gain correction 42, position LUT 128, filter/preprocessor ~44)

Test flow of the MCM testing
• apply the voltages (+1.8Vd, +3.3Vd, +1.8Va, +3.3Va), control the supply voltages and currents with a programmable supply
• JTAG connectivity test
• basic test using SCSN (the serial configuration bus)
• test of all internal components using the CPUs
• test of the fast readout (8-bit 120 MHz DDR LVDS; the DUT sits between MCM0–MCM3 and two FPGAs on the tester)
• test of the ADCs by applying a 200 kHz sine wave from a programmable generator
• test of the PASA by applying voltage steps through serial capacitors (programmable step generator)
• store all data for each MCM in a separate directory, store the essential results in an XML file
• export the results for MCM marking and sorting (SCSN via PCI in a PC)

MCM Tester and results (T. Blank, FZK IPE, Karlsruhe)
• tests 3x3 or 4x4 MCMs; digital camera with pattern-recognition software for precise positioning using an X-Y table; vertical lift for contacting
• about 1 min/MCM for positioning and test
• stores the result into a DB; the tested MCMs are marked later with a serial no. and test result code
• results: GOOD 86%, NM 3%, BM 6%, CM 1%, BAD 4%

TRAP wafer test and results
• programmable power supply, programmable sine-wave generator, parallel readout, serial configuration interface
• 576 TRAP chips/wafer, fully automatic partial test of the TRAP
• up to now 201 wafers produced and tested, with ~129,000 TRAPs, of them ~98,000 usable (yield per batch 06/02…08/09 between 76% and 100%)

Optical Readout Interface (ORI)
• input: 120 MHz 8-bit DDR from the HCM (TRAP); magnetic field & radiation tolerant!
• latency contributions: +24 ns, +24 ns, +300 ns; configuration memory
• signal chain: LVDS-TTL → CPLD (DDR → 16-bit SDR, resynchronization, status, counters) → SERDES (125 MHz reference clock, 2.5 Gbit/s) → laser driver → VCSEL laser diode at 850 nm; configuration via I2C
• all 1200 produced and tested, 1199 of them fully functional

DCS Board
• ARM-based technology
• 100k FPGA flexibility
• 32 MB SDRAM
• Linux system with EasyNet

Power Distribution Box
• Actel antifuse FPGA
• switches the power supply to 30 DCS boards on/off
• control of 9 PDB/2, Altera ARM+FPGA; 540 DCS boards in ALICE TRD

Design of VLSI Circuits using VHDL
The ALICE TRD Global Tracking Unit – An Example of a large FPGA-based System in High-energy Physics
© V. Angelov, F. Rettig, VHDL Vorlesung SS2009

A Typical Example
• high-energy & heavy-ion experiments:
  – huge amounts of data, selection of interesting events → triggers
  – performance limited by the data-processing power of the front-end electronics
• requirements for the electronics:
  – complex trigger algorithms, very short decision times → high-performance & low-latency processing
  – advanced trigger-interlacing strategies to minimize detector dead times (multi-event buffering) → high-bandwidth data paths
  – demands change quickly as research advances → flexibility

The Large Hadron Collider LHC
• p-p @ 14 TeV
• Pb-Pb @ 1150 TeV
• experiments: ATLAS, CMS, LHCb, ALICE

The Experiment ALICE
• research on the Quark-Gluon Plasma
• many detectors covering a wide momentum range & PID
• designed for high-multiplicity events in Pb-Pb collisions

ALICE & TRD
• two lead nuclei collide at ~ the speed of light
• Transition Radiation Detector (TRD): 540 drift chambers in 6 layers, 90 stacks, 18 super-modules, |η| ≤ 0.9

Task of the TRD
• high multiplicities: up to 8,000 charged tracks in the acceptance (ALICE event display with a high-pt track)
• fast trigger detector: L1 trigger after 6.2 µs
• barrel tracking detector: raw data

The TRD Data Chain
• pre-trigger derived from fast detectors (T0, TOF, V0, ACC) → L0/L1 triggers to the on-detector front-end electronics inside the magnet: 65,564 ASICs
• tracklets and raw data over 1080 optical fibres (2.7 Tbit/s) to the Global Tracking Unit (109 FPGAs): online track reconstruction, triggers to the ALICE CTP (L1, L2), event buffering, raw data to the ALICE DAQ

On-Detector Data Processing
• a particle track through radiator material and drift volume produces charge clusters (hits); a stiff track segment is parameterized as a 32-bit tracklet word: deflection, position y and z, PID
• 540 drift chambers, 6 stacked radially, 18 sectors in azimuth; 1.4 million analog channels; 10 MHz sampling rate
• 65,564 multi-chip modules, 262,256 custom CPUs
• up to 20,000 tracklet words, 32 bits wide
• massively parallel calculations: hit detection, straight-line fit, PID information
• transmission out of the magnet via 1080 optical fibres operating at 2.5 Gbit/s
• tracklets available 4.5 µs after the collision; 2.1 Tbit/s total bandwidth

Tight Timing Requirements!
• within < 6.2 µs after the collision: drift time (< 1.8 µs) with fit calculation and tracklet building, tracklet shipping, tracking & trigger, shipping of the TRD level-1 trigger contribution; raw data shipping follows
• trigger sequence: collision → level-0 trigger (1.2 µs) → level-1 trigger → level-2 trigger window (5.0 µs / 73.8 µs / 493.8 µs); tracklets and raw data, accept/reject, data forwarded to DAQ
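The 32-bit tracklet word carries deflection, position (y, z) and PID. The actual field widths are not given on the slide, so the split below (13-bit y, 4-bit z pad row, 7-bit deflection, 8-bit PID) is only an illustrative assumption to show the packing and unpacking:

```python
# Hypothetical field layout of a 32-bit tracklet word (widths are an
# assumption for illustration, not the real TRD format).
Y_BITS, Z_BITS, D_BITS, PID_BITS = 13, 4, 7, 8

def pack_tracklet(y, z, defl, pid):
    assert 0 <= y < (1 << Y_BITS) and 0 <= z < (1 << Z_BITS)
    assert 0 <= defl < (1 << D_BITS) and 0 <= pid < (1 << PID_BITS)
    return (pid << (Y_BITS + Z_BITS + D_BITS)) | (defl << (Y_BITS + Z_BITS)) \
           | (z << Y_BITS) | y

def unpack_tracklet(w):
    y = w & ((1 << Y_BITS) - 1)
    z = (w >> Y_BITS) & ((1 << Z_BITS) - 1)
    defl = (w >> (Y_BITS + Z_BITS)) & ((1 << D_BITS) - 1)
    pid = (w >> (Y_BITS + Z_BITS + D_BITS)) & ((1 << PID_BITS) - 1)
    return y, z, defl, pid

w = pack_tracklet(1234, 7, 99, 200)     # fits in one 32-bit word
assert unpack_tracklet(w) == (1234, 7, 99, 200)
```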
Global Tracking Unit
• fast L1 trigger after 6.2 µs
• detection & reconstruction of high-momentum tracks, calculation of momenta
• various trigger schemes: di-lepton decays (J/ψ, ϒ), jets, ultra-peripheral collisions, cosmics
• raw-data buffering: multi-event buffering & forwarding to the data acquisition system
• supports interlaced triggers, multi-event buffering, dynamic sizes
• 109 boards with large FPGAs in three 19" racks outside of the magnet
• one GTU segment and one patch panel with 60 fibres per TRD supermodule

3-Tier Architecture
• tracklets from the front-end (1,080 optical fibres, 2.1 Tbit/s)
• 90 processing nodes, one per detector stack: 2.9 GByte/s into each node, online track reconstruction → high-pt tracks, all track segments of one detector stack
• 18 concentrator nodes, one per supermodule: trigger stage 1, high-pt tracks + trigger information; raw data → DAQ (< 3.5 GByte/s)
• 1 top trigger node: trigger stage 2, a few bits at ~200 Hz → TRD L1/L2 trigger contribution

Processing Node
• inputs: tracklets & raw data via 12 optical data streams at 2.5 Gbit/s each → 2.9 GByte/s per node, 261 GByte/s total
• data-push architecture → capture at the full bandwidth of 2.1 Tbit/s
• tasks: online track reconstruction, multi-event buffering
• hardware: Virtex-4 FX100 FPGA; 12 SFP 850 nm transceivers (data from one detector stack); custom LVDS I/O (72 pairs); 3 parallel links (240 MHz DDR, 8-bit LVDS) to the SMU and the left/right TMUs; JTAG; CompactPCI bus; DDR2 SRAM as high-bandwidth (28.8 Gbit/s) data buffer
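The per-node input bandwidth quoted above can be checked from the link parameters (12 links at 2.5 Gbit/s line rate, 8b/10b encoded; the net 1.94 Gbit/s per link after further protocol overhead is taken from the multi-event-buffering slide). A quick arithmetic sketch:

```python
# Bandwidth check for one processing node of the GTU.
line_rate = 2.5e9                  # bit/s per optical link (line rate)
payload_rate = line_rate * 8 / 10  # 8b/10b coding: 2.0 Gbit/s of payload
links = 12                         # optical inputs per node

node_raw = links * payload_rate / 8 / 1e9    # GByte/s of decoded payload
total_raw = 90 * node_raw                    # 90 processing nodes in total

# The slide's 2.9 GByte/s per node corresponds to the net rate of
# 1.94 Gbit/s per link that remains after framing:
node_net = links * 1.94e9 / 8 / 1e9          # ~2.9 GByte/s
```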
Virtex-4 FPGA Resources (excerpt from the Xilinx Virtex-4 documentation)
• Configurable Logic Blocks (CLBs) are the main logic resource for implementing sequential as well as combinatorial circuits; each CLB element is connected to a switch matrix to access the general routing matrix
• one CLB = four slices (maximum of 64 bits), organized as a SLICEM column pair (left) and a SLICEL column pair (right); each pair has an independent carry chain, but only the SLICEM slices have a shift chain; slices are designated X<column>Y<row>, counted from the bottom-left corner of the die
• Virtex-4 FX family (with PowerPC processor blocks, Ethernet MACs and RocketIO transceivers), e.g. XC4VFX100: 94,896 logic cells, 42,176 slices, 376 XtremeDSP slices, 6,768 Kb block RAM, 12 DCMs, 8 PMCDs, 2 PowerPC blocks, 20 RocketIO transceivers, 768 user I/O
• XtremeDSP slices (500 MHz): dedicated 18 x 18-bit multiplier, adder and optional 48-bit accumulator for multiply-accumulate (MACC) or multiply-add, cascadeable, with optional pipeline stages
• Xesium clock technology: up to twenty Digital Clock Manager (DCM) modules (500 MHz) for precision clock deskew, phase shift and flexible frequency synthesis, plus companion Phase-Matched Clock Dividers (PMCD)
• up to 10 Mb of integrated block memory
[Figure 5-1: arrangement of slices within the CLB; Figure 5-20: simplified Virtex-4 slice with the general slice timing parameters]

Multi-Event Buffering
• allows a significant reduction of the detector dead time due to:
  – interleaved 3-level trigger sequences
  – decoupling of the front-end electronics operation from the data transmission to data acquisition, 2-stage readout: detector FEE (single-event buffers in each chip) → L1 accept → GTU (multi-event buffers for each stack) → L2 accept or discard → data acquisition & HLT
  – dynamic buffer allocation for strongly varying event sizes
• buffers: 4-Mbit SRAMs, 64-bit 200 MHz DDR interface; 12 independent 128-bit wide data streams at 200 MHz

Multi-Event Buffering II
• 12 independent data streams via 2.5 Gbit/s links, in the fabric as 16-bit streams at 125 MHz (net 1.94 Gbit/s)
• de-randomizing/gap elimination, merging into a single dense 128-bit 200 MHz data stream to the SRAM (> 94% of all clock cycles, 23.3 Gbit/s)
• allocation of separate memory regions for each link/event (12 independent ring buffers, 2 write + 1 read pointers)
• stream contents: commas, tracklet data, raw data, END markers at L0 trigger and L1 accept
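A behavioral sketch of the stream-merging step: twelve 16-bit input streams are de-randomized and packed into dense 128-bit lines for the SRAM. The simple round-robin queue discipline below is an assumption for illustration; the real design achieves > 94% line utilization at 200 MHz:

```python
# De-randomize 12 x 16-bit streams and pack them into 128-bit SRAM lines.
from collections import deque

def merge_streams(streams):
    """streams: list of 12 lists of 16-bit words -> (128-bit lines, leftover)."""
    fifos = [deque(s) for s in streams]
    words, lines = [], []
    while any(fifos):
        for f in fifos:                 # round-robin, skipping empty FIFOs
            if f:
                words.append(f.popleft())
                if len(words) == 8:     # 8 x 16 bit = one 128-bit line
                    line = 0
                    for i, w in enumerate(words):
                        line |= (w & 0xFFFF) << (16 * i)
                    lines.append(line)
                    words = []
    return lines, words                 # full lines + partial line, if any

streams = [[i] * 4 for i in range(12)]  # 48 words total from 12 links
lines, rest = merge_streams(streams)    # 6 full 128-bit lines, no leftover
```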
[Stream merger figure: 12 x 16-bit streams (commas, tracklet data, raw data, END markers at L0/L1A) at 125 MHz → stream merger → 128-bit lines at 200 MHz → SRAM]

Event Buffering Pipeline I (thesis excerpt, 7.3 "The Event Shaper", translated from German)
[Figure 7.5: the first three pipeline stages of the event-shaper data path – shaping buffer, line-index/line-valid/end-line/event-complete pipes; the highlighted structures exist separately for each channel]

Event Buffering Pipeline II (buffering of the event data)
[Figure 7.7: the second part of the event-shaper data path – offset adder, base addresses, write port and SRAM interface]
The free space in a ring buffer with the write pointer wp wrapped behind the read pointer rp is
N_free, wp<rp = rp − wp − 1    (7.2)

Event Buffering Pipeline III
The lower part of Figure 7.13 shows the implementation of this calculation together with the components of the interface to the read-out unit.
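The multi-event buffers are ring buffers in SRAM with read and write pointers. A sketch of the free-space arithmetic around Eq. (7.2), with the usual convention of keeping one cell empty to distinguish a full buffer from an empty one:

```python
# Ring-buffer free-space arithmetic: Eq. (7.2) covers the wrapped case
# (wp behind rp); otherwise the remaining space to the wrap point counts.

def free_space(wp, rp, size):
    if wp < rp:
        return rp - wp - 1            # Eq. (7.2): write pointer wrapped behind rp
    return size - (wp - rp) - 1       # wp ahead of rp, or buffer empty

size = 16
assert free_space(0, 0, size) == 15   # empty: all cells but one usable
assert free_space(5, 3, size) == 13
assert free_space(3, 5, size) == 1    # nearly full
```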
[Figure 7.13: interface to the read-out unit and the arithmetic for buffer-overflow protection]

Global Track Matching
• 3D track matching: find the tracklets belonging to one track
• processing time less than approx. 1.5 µs
• integer arithmetic, logic & look-up tables
• [sketch of one 20° stack: particle track, charge clusters, tracklets extrapolated to a projection plane – track bendings and tracklet misorientations exaggerated]

Global Track Matching II
• projection of the tracklets to virtual transverse planes
• intelligent sliding-window algorithm: Δy, Δα_Vertex, Δz
• massively parallel hardware implementation

Momentum Reconstruction
• assumption: the particle origin is at the collision point
• estimation of pt from the line parameter a: pt ∝ 1/a (pt = const / a)
• fast cut condition for the trigger: comparing const with pt,min · a, which avoids a division in hardware

An Example...
[event display example]

Online Track Matching I
• up to 240 track segments/event
• 18 track finders running in parallel (reference layers 1–3 × z-channels 0–2), fed by input units for layers 0–5
• fully pipelined, data-push architecture
• fast integer arithmetic and pre-computed look-up tables used
• high-precision pt reconstruction: Δpt/pt < 2%
• 60 MHz clock
(Figure 7.5: architecture of the TMU trigger design – 12 optical links from the detector stack → input units → track finders → merging unit → pt reconstruction unit → track parameter memory → SMU trigger logic)

Input & Track Finder Unit (thesis excerpts, translated from German)
• input unit (Figure 5.3): receives the track segment data of one detector module over 2 optical links (input controllers 0/1) and performs the per-layer calculations: y calculation, y projection and angle calculation units, z-channel units 0–2
• track finder (Figure 5.18): takes over the track segment data of all six layers in parallel and stores it in two memories per layer (MEM A/B with address counter registers), followed by the combination logic, a buffer/merging unit and an output register stage
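The sliding-window idea above can be sketched behaviorally: tracklets from the six layers are projected to a common transverse plane, and candidates whose projected y and vertex angle agree within windows are combined into one track. The window sizes and the minimum-layer requirement below are example assumptions, not the actual GTU parameters:

```python
# Illustrative sliding-window track matching over projected tracklets.

def match_tracks(projected, dy=3.0, dalpha=2.0, min_layers=4):
    """projected: list of (layer, y, alpha) tracklets after projection."""
    used, tracks = set(), []
    for i, (l0, y0, a0) in enumerate(projected):
        if i in used:
            continue
        members = [i]
        for j, (l1, y1, a1) in enumerate(projected):
            if j != i and j not in used and l1 != l0 \
               and abs(y1 - y0) < dy and abs(a1 - a0) < dalpha:
                members.append(j)
        # accept only if enough distinct layers contribute
        if len({projected[k][0] for k in members}) >= min_layers:
            used.update(members)
            tracks.append(members)
    return tracks

# one stiff track seen in 5 layers, plus one unrelated far-away tracklet
tracklets = [(l, 10.0 + 0.1 * l, 1.0) for l in range(5)] + [(2, 80.0, 9.0)]
tracks = match_tracks(tracklets)      # one matched track with 5 tracklets
```

In the hardware this comparison is not done sequentially as here, but by the 18 parallel track finders reading two consecutive sorted entries per layer each clock.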
For each memory block there is a counter register holding the read address. The combination logic compares the values read from the memories: it determines whether the track segments together form a track, checks the matching criteria, and decides in every clock cycle how the counter registers are incremented. Since two consecutive data words of each data row are inspected at the same time, memories with two independent read ports are needed; equivalently, as shown here, two independent memories written in parallel can be used. The memories receive their data directly from the sorters of the corresponding Z-channel units. The write address for each layer comes from a common counter that is incremented by one for every track segment. The …
[Further diagram labels: Angle Calc. Unit with constant C[12..4], 8×18 multiplier, register stage]
Figure 5.3: The structure of an input unit as a block diagram. The input unit receives the track-segment data from one detector module and carries out the computations that can be performed independently per channel.
Figure 5.7: The structure of the calculation unit for the deflection angle. By adding the y coordinate, suitably scaled according to the detector layer, the deflection angle with respect to the vertex direction is obtained, to good approximation, from the deflection of the track segment.
© V. Angelov, F. Rettig VHDL Vorlesung SS2009 26 /48
[Track-segment data word fields: PID signature, pad row, y position, deflection length]

Reconstruction Unit I
• Fully pipelined, data-push architecture
• Optimized for low latency
• High-precision pt reconstruction: ∆pt/pt < 2.5%
• Uses addition, multiplication and pre-computed look-up tables
• 60 MHz clock
© V. Angelov, F. Rettig VHDL Vorlesung SS2009 27 /48

Linux Operating System
[Diagram labels: 4 GByte mass storage, SD Card Controller, SDRAM Controller, SRAM Ctrl.]
Embedded PowerPC System
• Bus components:
− UART
− EMAC
− BRAM bootloader
[Diagram labels: PLB · 100 MHz, 64 MByte memory, PowerPC 400 MHz, DCS board, network switch, MGT]
• Two PowerPC cores:
− Linux operating system: monitoring & control (PetaLinux/MontaVista)
− HW/SW co-design (planned): Level-2 trigger calculations, real-time monitoring & control
[Diagram labels: system backplane, L2 trigger data path, PowerPC 400 MHz, config/status, DDR2 SDRAM controller, UART, Gigabit Ethernet, SD Card controller, SRAM controller, configuration & status interface]
© V. Angelov, F. Rettig VHDL Vorlesung SS2009 28 /48

TMU Design Resource Usage
• 38,601 slices occupied (91%)
– 45,716 logic LUTs (54%)
– 53,500 LUTs total (63%)
– 29,936 FFs (35%)
• 4 DCMs (33%), 1 PMCD (12%), 17 BUFGs (63%)
• 165 BRAMs (43%)
• 12 MGTs (60%), 9 DSPs (5%), 2 PowerPCs
• 345 IOBs (56%)
• Gate equivalent: 11,625,408
[Floorplan labels: PowerPC, event buffering, MGT, trigger]
© V. Angelov, F. Rettig VHDL Vorlesung SS2009 29 /48

TMU Design Resource Usage
[Floorplan labels: PowerPC, event buffering, MGT, trigger]
Resource | Event Buffering | Tracking
FF       | 10,921          | 8,858
LUT      | 5,940           | 24,086
BRAM     | 14              | 78
DMEM     | 19 / 0          | 93 / 1,128
Embedded PowerPC system: 4,003 FFs, 4,068 LUTs, 69 BRAMs
© V. Angelov, F. Rettig VHDL Vorlesung SS2009 30 /48

VHDL Code
Design part / number of non-blank lines:
Total 204,445
TMU 86,693 (Synthesis 35,458, Event Buffering 15,127, Tracking 41,144; Simulation 10,023)
SMU 40,371 (Synthesis 35,458, Simulation 49,130)
TGU 16,878
Common/Shared 60,055
© V. Angelov, F.
Rettig VHDL Vorlesung SS2009 31 /48

GTU Tracking Timing
• Computation latency depends on the number of tracklets and their content: 550 ns offset, rising only slightly
• Total latency depends heavily on the number of tracklets
• Full hardware simulation with ModelSim and a testbench
[Plots (PhD thesis by J. de Cuveland, section 9.3 "GTU Timing Performance"): normalized count vs. data transmission and TMU trigger computation time in ns, for 6 to 180 segments per stack; average processing time vs. segments per stack, split into track reconstruction / TMU trigger calculation, segment buffering and synchronization, and track-segment transmission]
The total trigger processing time in the TMU is not constant but depends heavily on the number of received track segments, and thus on average on the multiplicity.
Figure 9.12: The total trigger processing time in the TMU can be divided into three contributions with different dependencies on the number of segments per stack.
© V. Angelov, F. Rettig VHDL Vorlesung SS2009 32 /48

TRD Beam Test at CERN
[Setup: 1 TRD supermodule, 1 GTU segment]
• Accelerator: CERN Proton Synchrotron (PS)
• Particles: electrons, pions (transverse momenta 0.5 – 6 GeV/c)
• Good statistics for detector calibration (more than 1 million events per momentum value)
• 8 days of continuous operation
• First run with tracklets, consistent with raw data

Performance Analysis
Single Tracklet Deflection Precision
[Histogram: tracklets from TRD, count vs. deflection error in cm; mean −0.0916 cm, RMS 0.119 cm; recorded Fri Feb 29 10:56:41 2008]
© V. Angelov, F.
Rettig VHDL Vorlesung SS2009 33 /48
[Event display: ADC count vs. pad position; November 2007 beam test setup at the CERN Proton Synchrotron]
Figure 9.2: Display of a single event in a TRD stack. The color represents the measured ADC count for a given position. The red lines depict the parametrized track segments as received by the GTU for the same event. The width of the drift chambers is exaggerated (not to scale).

Simplified Event
© V. Angelov, F. Rettig VHDL Vorlesung SS2009 34 /48

Simplified Event II
© V. Angelov, F. Rettig VHDL Vorlesung SS2009 35 /48

Simplified Event III
© V. Angelov, F. Rettig VHDL Vorlesung SS2009 36 /48

Realistic Pb-Pb Event
© V. Angelov, F. Rettig VHDL Vorlesung SS2009 37 /48

Concentrator Node
• Inputs: reconstructed tracks from the first tier & raw data
• Tasks:
– apply trigger schemes
– interface to the data acquisition system, process trigger sequences and read-out
© V. Angelov, F. Rettig VHDL Vorlesung SS2009 38 /48

Concentrator Node
[Board labels: SFP modules (1000Base-SX) to switches; DDR2 SDRAM, 64 MByte; link to the ALICE DAQ system; inputs from TMU 0–4 via 5 parallel links (240 MHz DDR, 8-bit LVDS); custom LVDS I/O, 72 pairs; SD card slot for 4 GByte SDHC cards; output to the TGU; JTAG; CompactPCI bus; interface to the ALICE TTC system]
© V. Angelov, F. Rettig VHDL Vorlesung SS2009 39 /48

FSM Example: Trigger Handling
[State diagram: after reset the FSM is in Idle; an L0 starts the L1 window timer (states L1_wi/L1_wv), an L1 starts the L2 window timer (states L2_wi/L2_wv); an erroneous L0 restarts the L1 window timer, an erroneous L1 restarts the L2 window timer, L0 itself is assumed to be correct; L2a leads to L2a_rcv and L2r to L2r_rcv; SoD + EoD are decoded in state SoD_EoD; on end-of-event (eoe), no_error + eoe, L2 timeout or trigger flush (Tr_flush) the FSM returns to Idle]
© V. Angelov, F. Rettig VHDL Vorlesung SS2009 40 /48
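The trigger sequence handled by the FSM above can be modeled behaviorally. A reduced Python sketch (the actual design is a VHDL state machine; the window timers, error branches and start/end-of-data states from the slide are omitted here, and the reduced state set is an assumption for illustration):

```python
# Reduced model of the trigger-handling FSM: the normal sequence is
# L0 -> L1 -> L2a (accept, read out) or L2r (reject), then back to Idle.
# State and input names follow the slide where they exist.
TRANSITIONS = {
    ("Idle",    "L0"):  "L1_wi",    # L0 received, L1 window opens
    ("L1_wi",   "L1"):  "L2_wi",    # L1 received, L2 window opens
    ("L2_wi",   "L2a"): "L2a_rcv",  # L2 accept: ship the buffered event
    ("L2_wi",   "L2r"): "Idle",     # L2 reject: drop the buffered data
    ("L2a_rcv", "eoe"): "Idle",     # end of event: ready for the next L0
}

def step(state, event):
    """Advance the FSM by one input; unknown events leave the state
    unchanged (the hardware would flag an error instead)."""
    return TRANSITIONS.get((state, event), state)
```

A dictionary keyed by (state, input) pairs mirrors the case/when transition table such an FSM would have in VHDL.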
Trigger Schemes
• Cosmic trigger
• Jet trigger
– simple jet definition: more than a certain number of high-pt tracks through a given detector volume
– additional conditions: jet location, coincidences, ...
– Ntracks = 1: single high-pt particle trigger
• Di-lepton decay trigger
– coincidence of high-pt e± tracks
– calculation of the invariant mass for higher selectivity
• Various other schemes
– ultra-peripheral collisions
© V. Angelov, F. Rettig VHDL Vorlesung SS2009 41 /48

Cosmics Trigger
• Chamber: min ≤ sum of charge/hits ≤ max
• Stack: min ≤ chambers hit ≤ max
• Supermodule: min ≤ stacks hit ≤ max
• Detector: coincidence between supermodules
[Diagram: chamber units ("C") feeding per-supermodule trigger signals ("trg")]
© V. Angelov, F. Rettig VHDL Vorlesung SS2009 42 /48

Cosmic Event Triggered
© V. Angelov, F. Rettig VHDL Vorlesung SS2009 43 /48

Cosmic Event Triggered II
4 super-modules were commissioned successfully last year; ALICE global cosmic runs were performed (Dec. 2007, Feb. 2008, and …; the last one continued as LHC run). K. Oyama, Uni Heidelberg (TIPP09, Mar. 13, 2009)
© V. Angelov, F. Rettig VHDL Vorlesung SS2009 44 /48

Jet Trigger
• Identify tracks with pt ≥ pt,threshold within a certain region
• Threshold conditions:
– number of tracks
– sum of the momenta of the tracks
• Granularity: sub-stack-sized areas overlapping in z- and Φ-direction
• Realizable at the first trigger stage
• Multi-jet coincidence at the top level
[Simulation plots from "The ALICE Transition Radiation Detector as Jet Trigger", B. Bathen, Uni Münster, TRD trigger meeting 12/11/2008: efficiency, p-p @ 10 TeV, pt-hard 86–180 GeV]
© V. Angelov, F. Rettig VHDL Vorlesung SS2009 45 /48

Di-Lepton Trigger
• Find e+e− pairs with an invariant mass within a certain range (J/ψ, ϒ, ...)
• Huge combinatorics for Pb-Pb collisions
• Current work:
– pre-selection of track candidates, application of sliding-window algorithms
– massively parallelized invariant-mass calculation in FPGA hardware
– fast trigger contribution for Level-1 (after 6 µs), more elaborate decision for Level-2 (80 µs)
© V. Angelov, F. Rettig VHDL Vorlesung SS2009 46 /48

Waiting For LHC Start-Up
LHC restart scheduled for the end of the year...
© V. Angelov, F. Rettig VHDL Vorlesung SS2009 47 /48

The GTU People
Venelin Angelov, Jan de Cuveland, Stefan Kirsch, and Felix Rettig
Former members: Thomas Gerlach, Marcel Schuh
Prof. Volker Lindenstruth
Chair of Computer Science
Kirchhoff Institute of Physics
University of Heidelberg, Germany
http://www.ti.uni-hd.de
© V. Angelov, F. Rettig VHDL Vorlesung SS2009 48 /48
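As a closing illustration of the di-lepton trigger discussed above: the invariant mass of a candidate e+e− pair can be computed from reconstructed track parameters. A Python sketch of the arithmetic (the GTU would do this with fixed-point arithmetic and pre-computed look-up tables in FPGA hardware; the electron mass is neglected, and the function and parameter names are hypothetical):

```python
import math

def invariant_mass(pt1, eta1, phi1, pt2, eta2, phi2):
    """Invariant mass of two (approximately massless) tracks:
    m^2 = 2 * pt1 * pt2 * (cosh(eta1 - eta2) - cos(phi1 - phi2))."""
    m2 = 2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2))
    return math.sqrt(max(m2, 0.0))  # clamp tiny negative rounding errors

def dilepton_trigger(pairs, m_lo, m_hi):
    """Fire if any candidate pair falls inside the mass window,
    e.g. a window around the J/psi at about 3.1 GeV/c^2.
    Each pair is ((pt, eta, phi), (pt, eta, phi))."""
    return any(m_lo <= invariant_mass(*p1, *p2) <= m_hi
               for p1, p2 in pairs)
```

The pre-selection of track candidates mentioned above exists precisely to keep the number of `pairs` evaluated by this combinatorial loop manageable in a Pb-Pb event.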