GPU - Agenda
GPUs for Statistical Data Analysis in HEP: a performance study of GooFit on GPUs vs RooFit on CPUs
Adriano Di Florio
Università degli Studi di Bari "Aldo Moro" & I.N.F.N. Sezione di Bari
CCR Meeting & Giulia Finzi Symposium - 5th July - Rome

Outline
- Introduction to GPU computing & GooFit
- Pseudo-experiments for p-value estimation: GooFit vs RooFit performance study
- Exploring the applicability limits of Wilks' theorem
- Summary & Outlook

Introduction: GPU computing & GooFit

Why GPU computing? Moore's Law
Physical limit: heat dissipation, $P = C \times V^2 \times f$ (V: supply voltage, C: capacitance, f: clock frequency).
Future developments can no longer rely on an exponential growth of the clock frequency.
A new approach is needed: a possible solution is GPU computing.

GPUs' architecture
"If you were plowing a field, which would you rather use: two strong oxen or 1024 chickens?" (Seymour Cray)
We definitely choose the chickens.

What is a GPU? Graphics Processing Unit.
- 1970s: the first graphical user interfaces required dedicated microchips.
- Video games and 3D graphics: a strong economic stimulus for GPU development.
Consequences for the GPU architecture:
- thousands of cores;
- large data loads;
- low clock frequency (~1 GHz);
- arithmetical operations in a single clock cycle (sin, cos, sqrt, 1/x, ...).
[Figure: schematic comparison of a CPU and a GPU]

Introduction to GPU-accelerated computing
Heterogeneous GPU-accelerated computing is the use of a Graphics Processing Unit to accelerate scientific applications (among other applications).
The application performance is enhanced by offloading the compute-intensive portions to the GPU (the device), while the remainder of the code still runs on the CPUs (the host).
[Figure: application code split into a sequential portion (CPU) and a compute-intensive portion (GPU)]
From the user's perspective? Applications simply run significantly faster!
How much faster? It depends, of course, on the application.
We want to explore it in the context of 'end-user HEP analyses' by using GooFit.

GooFit framework
GooFit is a data analysis tool for HEP that interfaces ROOT/RooFit to the CUDA parallel computing platform on nVidia GPUs. It also supports OpenMP.
A GooFit program has 4 main components:
- a GooPdf object representing the PDF that models the physical process;
- the fit parameters (Variable objects contained in the GooPdf);
- the data (DataSet object);
- a FitManager object forming the interface between MINUIT and the GooPdf.
[Figure: control and data flow of a GooFit program (device side)]
It is an open source project, under development and funded by the US NSF.
Reference: R. Andreassen et al., "GooFit: a library for massively parallelising maximum-likelihood fits", J. Phys.: Conf. Ser. 513 (2014) 052003.
[Slide footer: ACAT-2016 / January 18th, Alexis Pompili (Bari University & INFN)]

GooFit framework
The FitManager object forms the interface between MINUIT (running on the CPU) and a GPU, which allows the PDF representing the physical model (GooPdf object) to be evaluated in parallel.
At each step of the negative log-likelihood (NLL) minimization the fit parameters are estimated on the host side (CPU), while the PDF/NLL is evaluated on the device side (GPU); the cycle is repeated until convergence:
  CPU: fit-parameter tuning <-> [memory transfers] <-> GPU: PDF/NLL evaluation
This can be seen by analysing a cycle with the monitoring tool nVIDIA Visual Profiler (nvvp).
The FitControl object allows switching between χ² fits and ML fits (either unbinned or binned).
[Figure: control and data flow of a GooFit program; R. Andreassen et al., J. Phys.: Conf. Ser. 513 (2014) 052003]

GooFit profiling
Example of a snapshot of the profile of a GooFit process provided by the nVIDIA Visual Profiler:
- memory transfers: fit parameters exchanged between CPU and GPU;
- GPU calculation processes: p.d.f. evaluation;
- CPU: parameter tuning to minimise the negative log-likelihood.

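The host/device cycle described above can be sketched in a few lines. This is a minimal illustration, not GooFit or MINUIT code: `device_nll` stands in for the GPU side (a per-event map followed by a sum), `host_minimise` stands in for the CPU side (a golden-section search replaces MINUIT), and the Gaussian model, its fixed width and the sample size are all assumptions chosen for the example.

```python
import math
import random

SIGMA = 1.0  # fixed Gaussian width (illustrative assumption)


def device_nll(data, mean):
    """Stand-in for the device side: per-event -log(PDF), then a reduction.

    On a GPU the per-event terms would be evaluated in parallel.
    """
    norm = math.log(SIGMA * math.sqrt(2.0 * math.pi))
    return sum(0.5 * ((x - mean) / SIGMA) ** 2 + norm for x in data)


def host_minimise(data, lo, hi, tol=1e-6):
    """Stand-in for the host side: golden-section search on one parameter.

    Each iteration proposes new parameter values and asks the "device"
    for the NLL, mimicking the CPU <-> GPU exchange until convergence.
    """
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if device_nll(data, c) < device_nll(data, d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)


random.seed(1)
data = [random.gauss(0.3, SIGMA) for _ in range(2000)]
fitted_mean = host_minimise(data, -5.0, 5.0)
sample_mean = sum(data) / len(data)
print(f"fitted mean = {fitted_mean:.5f}, sample mean = {sample_mean:.5f}")
```

For a Gaussian with fixed width the maximum-likelihood estimate of the mean is the sample mean, which gives a simple cross-check of the loop.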
A preliminary example of GooFit/GPU capabilities
Parameter estimation is a crucial part of many physics analyses. PDF evaluation on large datasets is usually the bottleneck in the MINUIT algorithm.
GooFit acts as an interface between the MINUIT minimization algorithm and a parallel processor, which allows a Probability Density Function to be evaluated in parallel.
A preliminary test was done with an unbinned ML fit, using either a single CPU or an additional GPU (an nVIDIA Tesla C2070 hosted at the Bari T2).
Events distributed according to a Voigtian model (the convolution is CPU-intensive) are generated and fitted. The time needed for the fit (the negligible generation time is not included) is studied as a function of the number of events.
[Figure: fit time [s] vs number of events]
For 10M events RooFit needs 61 h 23 min and GooFit takes 4 min 39 s: speed-up ~750.
For 1M fitted events with RooFit ... you need to wait overnight.
For 10M fitted events with GooFit ... you need to take an espresso!
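To see why a Voigtian is expensive to evaluate, recall that it is the convolution of a Breit-Wigner with a Gaussian resolution function, so a brute-force evaluation needs an integral for every event. The sketch below uses a plain trapezoidal rule; it is not the algorithm used by RooFit or GooFit, and the mass, width and resolution values (in MeV) are purely illustrative.

```python
import math


def breit_wigner(x, mass, width):
    """Normalised non-relativistic Breit-Wigner (Lorentzian), FWHM = width."""
    half = 0.5 * width
    return (half / math.pi) / ((x - mass) ** 2 + half ** 2)


def gaussian(u, sigma):
    """Normalised Gaussian centred at zero."""
    return math.exp(-0.5 * (u / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def voigtian(x, mass, width, sigma, n_steps=400, n_sigma=6.0):
    """Voigtian at x via a trapezoidal convolution over +-n_sigma * sigma."""
    lo = -n_sigma * sigma
    h = 2.0 * n_sigma * sigma / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        u = lo + i * h
        weight = 0.5 if i in (0, n_steps) else 1.0
        total += weight * gaussian(u, sigma) * breit_wigner(x - u, mass, width)
    return total * h


# Illustrative parameters in MeV (assumptions for this example only).
MASS, WIDTH, SIGMA = 4148.0, 28.0, 2.0
peak = voigtian(MASS, MASS, WIDTH, SIGMA)
print(f"Voigtian peak     = {peak:.6f}")
print(f"Breit-Wigner peak = {breit_wigner(MASS, MASS, WIDTH):.6f}")
```

Each call costs a few hundred function evaluations, which is the kind of per-event work that benefits from being spread over many GPU threads.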
MC toys for p-value estimation: GooFit vs RooFit

Test application: the physics case
To test the computing capabilities of GPUs with respect to CPU cores, a high-statistics toy Monte Carlo technique has been implemented in both the ROOT/RooFit and GooFit frameworks, with the aim of estimating the (local) statistical significance of the structure observed by CMS close to the kinematical boundary of the J/ψ φ invariant mass in the 3-body decay $B^+ \to J/\psi\,\phi\,K^+$ [PLB 734 (2014) 261].
[Figure: $\Delta m = m(\mu^+\mu^-K^+K^-) - m(\mu^+\mu^-)$ [GeV] spectrum; annotation: 2480 ± 160 $B^\pm$]
Structure parameters [compatible with the Y(4140) by CDF]:
  $m = 4148.0 \pm 2.4\,(\mathrm{stat.}) \pm 6.3\,(\mathrm{syst.})$ MeV
  $\Gamma = 28^{+15}_{-11}\,(\mathrm{stat.}) \pm 19\,(\mathrm{syst.})$ MeV

Test application: the toy MC method
MC pseudo-experiments are used to estimate the probability (p-value) that background fluctuations alone would give rise to a signal as significant as that seen in the data.
Toy MC fit cycle (for each generated fluctuation):
- Generation of a fluctuated background binned distribution (3-body phase-space model) [total number of entries fixed by the data; fits with non-extended ML].
- Null-hypothesis binned ML fit performed with the phase-space model only.
- Alternative-hypothesis binned ML fit performed with the phase-space model + Voigtian PDF [the latter is truncated to correctly account for the kinematical threshold; the Gaussian resolution function has its width fixed at 2 MeV]. The signal yield is constrained to be > 0.
  Note: for each bin, the PDF value is estimated by ROOT integration over the bin [time-consuming but needed: the signal is steep w.r.t. the bin size].
  The fit is performed 8 times within the region of interest (from CDF: no LEE), trying different starting values (2 masses and 4 widths).
- For each fit a Δχ² w.r.t. the null-hypothesis fit is calculated; the best Δχ² among the 8 alternative fits is chosen.
- A distribution of Δχ² (our test statistic) is obtained over the sample of MC toys.
[Figure: example of a toy fit in Δm [GeV], with Δχ² ≅ 31.44]

Hardware set-up
Used: 1 server hosting 2 nVIDIA Tesla K20 (at BC2S) and 1 server hosting 1 nVIDIA Tesla K40 (at ReCaS (*)).

                              Tesla K20 @ BC2S                               Tesla K40 @ ReCaS (*)
Number of GPUs                2 x GK110                                      1 x GK110B
Number of CUDA cores          2 x 2,496                                      2,880
Memory per GPU (GDDR5)        2 x 5 GB                                       12 GB
Memory bandwidth per board    208 GB/s                                       288 GB/s
CPU                           16 cores: E5-2640 v2 @ 2.00 GHz (32 with HT)   20 cores: E5-2640 v2 @ 1.70 GHz (40 with HT)
RAM                           64 GB                                          256 GB

(*) http://www.recas-bari.it

Performance of GooFit vs ROOT/RooFit: a preliminary result
A first result is a simple comparison between the MC toys procedure running on a single GPU via GooFit and on a single CPU. The speed-ups, for 15k MC toys produced (highly time-consuming for ROOT: ~6 days), are:
- S = 62 (Tesla K40)
- S = 48 (Tesla K20)
This kind of application (binned fit and few parameters) does not exploit the whole GPU computational capability.
[Figure: example snapshot of nvidia-smi (the nVidia monitoring tool, similar to top) for a single process, showing 66%]
How to exploit the full computational power of a GPU?

nVidia Multi-Process Service
The nVidia Multi-Process Service (MPS) is a tool developed by nVidia that allows multiple processes (up to 16) to be executed on the same GPU chip. It acts as a scheduler: it manages the access to memory and CUDA cores.
[Figure: example of how MPS affects the occupancy of a Tesla K40 GPU]

Performance of GooFit on nVIDIA Multi-Process Service
Each process uses 1 (shared) GPU and 1 (exclusively assigned) CPU.
The speed-up for N concurrent processes is defined as
  $S^{\mathrm{MPS}}_{0<N\le 16} = \dfrac{T_1}{\frac{1}{N}\sum_{n=1}^{N} T_n}$
There is a saturation effect (Amdahl's law).
The 1st (2nd) group of processes, each with 0 < N ≤ 16, is assigned to the 1st (2nd) GPU (the 2 Tesla K20 GPUs are on the same server, which hosts 32 CPUs via Hyper-Threading).
[Figure: speed-up vs number of independent concurrent processes per single GPU, for 5000 and 15000 toys on Tesla K20 and Tesla K40]
[Figure: 2 GPUs vs 1 GPU: number of MC toys in 1 h (thousands), and the 2 GPUs / 1 GPU ratio, vs number of independent concurrent processes per single GPU]

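The saturation attributed to Amdahl's law can be made concrete with a short sketch. Amdahl's law bounds the speed-up with N workers by S(N) = 1 / ((1 - p) + p / N), where p is the fraction of the work that can run in parallel. The value p = 0.95 used here is an illustrative assumption, not a number measured in this study.

```python
def amdahl_speedup(parallel_fraction: float, n_workers: int) -> float:
    """Speed-up predicted by Amdahl's law for a given parallel fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if n_workers < 1:
        raise ValueError("n_workers must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_workers)


P_ILLUSTRATIVE = 0.95  # assumed parallel fraction, for illustration only

for n in (1, 2, 4, 8, 16):
    print(f"N = {n:2d}  ->  S = {amdahl_speedup(P_ILLUSTRATIVE, n):.2f}")

# The curve flattens towards the asymptote 1 / (1 - p) as N grows.
print(f"asymptotic limit = {1.0 / (1.0 - P_ILLUSTRATIVE):.1f}")
```

The speed-up is nearly linear for small N and then bends over, which is the qualitative shape seen when the number of concurrent processes per GPU increases.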
Performance of RooFit on CPUs with PROOF-Lite
To efficiently run RooFit MC toys in parallel on the 72 CPUs available on the 2 servers hosting the GPUs, we use PROOF-Lite, a dedicated version of PROOF optimized for single multi-core machines [*].
This ROOT/RooFit extension implements a 2-tier architecture with the master merged into the client, which directly controls the workers (workers are processes, not threads).
PROOF has a pull architecture: all workers end at the same time, avoiding the long tails that are unavoidable when running RooFit on a cluster with a push approach (where the last job determines the total execution time).
Check of the speed-up performance on the 2 servers:
  $S^{\mathrm{PROOF\text{-}Lite}}_{0<n\le 32(40)} = \dfrac{T_1}{T_n}$
- the server hosting the Tesla K20s has 32 CPUs;
- the server hosting the Tesla K40 has 40 CPUs.
Results:
- good scaling with the number of MC toys;
- no difference between the 2 servers (as expected);
- the speed-up scales perfectly up to ~8 workers; then there is a saturation effect (Amdahl's law).
[*] G. Ganis et al., PoS ACAT08 (2008) 007; A. Pompili et al., J. Phys.: Conf. Ser. 396 (2012) 032043 (CHEP12).

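The difference between a pull and a push approach can be illustrated with a small scheduling simulation. This is only a sketch under stated assumptions: the task durations are invented, "push" is modelled as a static split into equal-size chunks, and "pull" as each idle worker taking the next task. It does not reproduce PROOF's actual work distribution.

```python
import heapq


def push_makespan(task_times, n_workers):
    """Static ("push") split: tasks are pre-assigned in contiguous chunks.

    The total time is set by the slowest worker (the last job to finish).
    """
    if not task_times:
        return 0.0
    chunk = -(-len(task_times) // n_workers)  # ceiling division
    loads = [sum(task_times[i:i + chunk]) for i in range(0, len(task_times), chunk)]
    return float(max(loads))


def pull_makespan(task_times, n_workers):
    """Dynamic ("pull") dispatch: the first idle worker takes the next task."""
    free_at = [0.0] * n_workers  # time at which each worker becomes idle
    heapq.heapify(free_at)
    for t in task_times:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + t)
    return max(free_at)


# Invented workload: 12 short tasks followed by 4 long ones, 4 workers.
tasks = [1] * 12 + [5] * 4
print("push makespan:", push_makespan(tasks, 4))
print("pull makespan:", pull_makespan(tasks, 4))
print("ideal (perfect balance):", sum(tasks) / 4)
```

With the static split one worker receives all the long tasks and produces a long tail, while with the pull scheme the workers finish together.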
Performance comparison: RooFit/PROOF-Lite vs GooFit/MPS - I
A first performance comparison can be carried out on the server hosting 32 CPUs and 2 Tesla K20 GPUs, as a function of the number of pseudo-experiments produced.
We can compare:
- 1 PROOF-Lite job using 30 workers (on 30 CPU cores)
with:
- 2 GooFit/MPS jobs (each one running 15 simultaneous processes).
  $S^{2\text{-}TK20}_{n=30=N} = \dfrac{T^{\mathrm{RooFit}}_{n=30}}{T^{\mathrm{GooFit}}_{N=30,\,2\text{-}TK20}} \sim 45$
Good scaling with an extended number of MC toys. For reference:
- $S^{\mathrm{PROOF\text{-}Lite}}_{n=30} \sim 20$: 1 PROOF-Lite job using 30 workers vs 1 RooFit job using 1 CPU;
- $S^{\mathrm{MPS\text{-}TK20}}_{N=15} \sim 9$: 1 GooFit/MPS job (running 15 simultaneous processes) vs 1 GooFit job.
[Figure: speed-ups vs number of processed MC toys (per application)]

Performance comparison: RooFit/PROOF-Lite vs GooFit/MPS - II
A second performance comparison can be carried out on both servers, hosting both types of GPU (Tesla K20 and Tesla K40), as a function of the number of pseudo-experiments produced.
Here we limit the comparison to 16 independent processes (due to the MPS limit for the single Tesla K40).
We can compare:
- 1 PROOF-Lite job using 16 workers (on 16 CPU cores)
with:
- 1 GooFit/MPS job running 16 simultaneous processes on a single Tesla K40 or Tesla K20.
  $S^{TK40}_{n=16=N} = \dfrac{T^{\mathrm{RooFit}}_{n=16}}{T^{\mathrm{GooFit}}_{N=16,\,TK40}} \sim 60$
  $S^{TK20}_{n=16=N} = \dfrac{T^{\mathrm{RooFit}}_{n=16}}{T^{\mathrm{GooFit}}_{N=16,\,TK20}} \sim 40 \;<\; S^{2\text{-}TK20}_{n=30=N} \sim 45$ (effect of the PROOF-Lite saturation)
  $S^{\mathrm{GPU}}_{N=16} = \dfrac{T^{TK20}_{N=16}}{T^{TK40}_{N=16}} \sim 1.5$ (gain within the micro-architecture: Tesla K40 vs Tesla K20)
[Figure: speed-ups (log scale) vs number of processed MC toys (per application)]

Performance comparison: RooFit/PROOF-Lite vs GooFit/MPS - III
A third performance comparison can be made from the point of view of the end-user/analyst and the time needed to deliver the pseudo-experiments task.
Let us assume the analyst has at their disposal the full computational power used in these studies: 2 servers equipped with 3 GPUs (2 Tesla K20 and 1 Tesla K40) and 72 CPU cores (36 physical cores + Hyper-Threading).
[Figure: elapsed time [s] (log scale, from 10 min to 1 month) vs total number of processed MC toys (in units of 1M toys); annotated elapsed times: ~11 days and ~6 hours]
Total speed-up: S ≈ 41.0.
To get a signal significance > 5σ, a p-value < 3×10⁻⁷ is needed, namely at least 3.3M toys are needed. To estimate the signal significance many more toys are needed (see next slide).

P-value & statistical significance estimation
The final Δχ² distribution is obtained from the MC toys (the production was stopped once a fluctuation with $\Delta\chi^2 > \Delta\chi^2_{\mathrm{DATA}}$ was found):
- value observed in the data: $\Delta\chi^2_{\mathrm{DATA}} \cong 53.0$;
- toy fluctuation exceeding it: $\Delta\chi^2 \cong 56.9$.
[Figure: Δχ² distribution of the MC toys]
The p-value estimation is straightforward:
  $P = \int_{\Delta\chi^2_{\mathrm{DATA}}}^{+\infty} f(\Delta\chi^2)\, d(\Delta\chi^2) \approx \dfrac{1}{57.7 \cdot 10^{6}} \cong 1.73 \cdot 10^{-8}$
where $f$ is the Δχ² distribution obtained from the toys.
Equivalent (Gaussian) statistical significance:
  $Z\sigma = \Phi^{-1}(1 - P)\,\sigma \cong 5.52\sigma$
where $\Phi^{-1}$ is the inverse of the cumulative distribution function of the standard Gaussian.
This is compatible with the lower limit of 5σ for the statistical significance quoted in the CMS paper PLB 734 (2014) 261 on the basis of 50.5 million MC toys (by RooFit).

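The conversion from a toy-based p-value to a Gaussian significance quoted above can be checked with the Python standard library. This is a sketch of the arithmetic only; the function names are hypothetical, and the inputs are the numbers quoted on the slide (1 exceeding fluctuation in 57.7 million toys, and the 3×10⁻⁷ threshold).

```python
from statistics import NormalDist


def p_value_from_toys(n_exceeding: int, n_toys: int) -> float:
    """Fraction of toys whose test statistic exceeds the observed one."""
    return n_exceeding / n_toys


def significance(p_value: float) -> float:
    """One-sided Gaussian significance Z = Phi^{-1}(1 - p)."""
    return NormalDist().inv_cdf(1.0 - p_value)


p = p_value_from_toys(1, 57_700_000)
z = significance(p)
print(f"p-value = {p:.3e}")
print(f"Z       = {z:.2f} sigma")

# Number of toys needed to resolve the p-value threshold quoted for 5 sigma.
min_toys = 1.0 / 3e-7
print(f"toys needed for p < 3e-7: about {min_toys / 1e6:.1f}M")
```

Because a single exceeding toy sets the scale, the statistical precision of such an estimate is limited, which is why many more toys than the bare minimum are desirable.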
$H_1$ with $\nu_1$ d.o.f., any test statistic $t$ defined as a likelihood ratio
$-2\ln\lambda = -2\ln\left(\frac{L_{H_0}}{L_{H_1}}\right)$
[or similarly (in the asymptotic limit) as a $\Delta\chi^2 = \chi^2_{H_0} - \chi^2_{H_1}$]
approaches a $\chi^2$ distribution with $\nu = \nu_1 - \nu_0$ d.o.f., provided that these regularity conditions hold:
- $H_0$ and $H_1$ are nested ($H_1$ "includes" $H_0$)
- while $H_1 \to H_0$ the parameters of $H_1$ are well behaved (defined and not approaching some limit)
- asymptotic limit (of a large data sample)

When this theorem holds, the p-value associated with the signal is given by:
$P = \int_{t_{obs}}^{\infty} \chi^2_{\nu_1 - \nu_0}(t)\, dt$
and the use of pseudo-experiments to estimate the p-value is not needed (but still suggested).

When the null hypothesis is background-only and the alternative is background+signal, often the above regularity conditions are not all satisfied, and MC toys are mandatory!

Wilks theorem & the need of MC toys - II

Indeed this is the case we are dealing with here!
The signal parameters in the model of hypothesis $H_1$ are mass ($m$), width ($\Gamma$) and yield ($\mu \ge 0$).
When $H_1 \to H_0$ the problem is that: 1) $m$ and $\Gamma$ are not well defined, 2) $\mu$ tends to the null limit.
This explains why we have used pseudo-experiments.
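The counting estimate behind a toy-based p-value, and its conversion to an equivalent Gaussian significance, can be sketched as follows. This is only a minimal illustration and not the RooFit/GooFit code used in the study: the function names are hypothetical, and the only inputs taken from the slides are the quoted numbers (one fluctuation above $\Delta\chi^2_{DATA}$ in about 57.7 million toys).

```python
from statistics import NormalDist


def toy_p_value(n_exceeding: int, n_toys: int) -> float:
    """Fraction of pseudo-experiments whose test statistic exceeds the observed one."""
    if n_toys <= 0:
        raise ValueError("n_toys must be positive")
    return n_exceeding / n_toys


def significance(p_value: float) -> float:
    """One-sided Gaussian significance Z = Phi^-1(1 - p).

    Computed as -Phi^-1(p), which is equivalent by symmetry and
    numerically safer for very small p-values.
    """
    return -NormalDist().inv_cdf(p_value)


# Numbers quoted on the slides: 1 fluctuation in about 57.7 million toys.
p = toy_p_value(1, 57_700_000)
z = significance(p)
print(f"p-value = {p:.3g}, Z = {z:.2f} sigma")
```

With these inputs the sketch should give values in line with the $1.73 \cdot 10^{-8}$ and $5.52\sigma$ quoted above. Note that a p-value obtained from a single exceeding toy carries a large statistical uncertainty.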
The distributions of the test statistic are in general not predictable and can be extracted from MC toys!

The possible distributions in the different cases are shown & two special cases will be discussed.

[Figure: $\Delta\chi^2$ distributions from MC toys in six cases: $m, \Gamma$ fixed, $\mu$ free; $m, \Gamma$ fixed, $\mu > 0$; $m, \Gamma$ free, $\mu$ free; $m, \Gamma$ free, $\mu > 0$; $m$ free, $\Gamma$ fixed, $\mu$ free; $m$ free, $\Gamma$ fixed, $\mu > 0$]

Special case in which Wilks theorem holds

Consider the test statistic $t_\mu = -2\ln\lambda(\mu)$ [$\mu$: strength parameter] as the basis of the statistical test.
This could be a test of $\mu = 0$ for purposes of establishing the existence of a signal process, or of $\mu \ne 0$ for purposes of obtaining a confidence interval.
In the latter case, following Cowan et al. [*], the PDF of the test statistic approaches a chi-square distribution for 1 d.o.f. [in agreement with Wilks theorem!]:
$f(t_\mu \mid \mu) = \frac{1}{\sqrt{2\pi}} \frac{1}{\sqrt{t_\mu}}\, e^{-t_\mu/2}$

[*] Cowan et al., EPJ C71 (2011) 1554
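The chi-square behaviour for 1 d.o.f. can be illustrated with a toy model much simpler than the mass-spectrum fit of this study. The sketch below assumes a Gaussian sample with known width and unknown mean, for which $t_\mu = -2\ln\lambda(\mu) = n(\bar{x} - \mu)^2/\sigma^2$; the model, the function names, the sample sizes and the seed are all illustrative assumptions, not taken from the talk.

```python
import math
import random


def chi2_1dof_pdf(t: float) -> float:
    """f(t) = exp(-t/2) / sqrt(2*pi*t), the chi-square density for 1 d.o.f."""
    return math.exp(-0.5 * t) / math.sqrt(2.0 * math.pi * t)


def chi2_1dof_tail(t: float) -> float:
    """P(T > t) for a chi-square variable with 1 d.o.f."""
    return math.erfc(math.sqrt(0.5 * t))


def toy_t_mu(rng, mu_true=0.0, sigma=1.0, n_events=25):
    """One pseudo-experiment: t_mu evaluated at the true mean.

    For a Gaussian sample with known sigma the likelihood ratio reduces to
    -2 ln lambda(mu) = n * (sample_mean - mu)^2 / sigma^2.
    """
    sample_mean = sum(rng.gauss(mu_true, sigma) for _ in range(n_events)) / n_events
    return n_events * (sample_mean - mu_true) ** 2 / sigma ** 2


rng = random.Random(12345)  # fixed seed so the toy run is reproducible
toys = [toy_t_mu(rng) for _ in range(20_000)]

threshold = 3.84  # close to the 95% quantile of a chi-square with 1 d.o.f.
toy_tail = sum(t > threshold for t in toys) / len(toys)
print(f"toys: {toy_tail:.4f}   chi2 (1 d.o.f.): {chi2_1dof_tail(threshold):.4f}")
```

In this regular case the fraction of toys above the threshold agrees with the analytic tail probability within statistical fluctuations, which is the behaviour Wilks theorem predicts.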
Let us fix the parameters $m$, $\Gamma$ (to the CMS estimates from the fit to data) while leaving $\mu$ free in our ML fits (here $\mu$ is not properly a signal yield).

[Plot: likelihood ratio distribution ($-2\ln\lambda$) with fit pull]

By fitting our likelihood ratio distribution we indeed get: d.o.f. $\approx 1.014 \pm 0.001$ (reduced $\chi^2 = 1.009$, $P(\mathrm{fit}) = 0.118$)

Special case: asymptotic formula by Cowan et al. [*] holds

Consider the special case of the test statistic $t_\mu$ with the purpose of testing $\mu = 0$ in a class of models where we assume $\mu \ge 0$. Rejecting $\mu = 0$ (the null hypothesis) leads to the discovery of a new signal.
In this case, following Cowan et al., the test statistic is:
$q_0 = \begin{cases} -2\ln\lambda(0) & \hat{\mu} \ge 0 \\ 0 & \hat{\mu} < 0 \end{cases}$
Cowan et al. derive analytically that the PDF of $q_0$ is an equal mixture of a delta function at 0 & a chi-square distribution for 1 d.o.f.:
$g(q_0 \mid \mu = 0) = \frac{1}{2}\,\delta(q_0) + \frac{1}{2}\,\frac{1}{\sqrt{2\pi}} \frac{1}{\sqrt{q_0}}\, e^{-q_0/2}$

[*] Cowan et al., EPJ C71 (2011) 1554

Let us fix the parameters $m$, $\Gamma$ (to the CMS estimates from the fit to data) while constraining $\mu \ge 0$ in our ML fits (here $\mu$ represents a signal yield).

[Plot: likelihood ratio distribution ($-2\ln\lambda$) with fit pull]

By fitting our likelihood ratio distribution we indeed get: d.o.f.
$\approx 0.992 \pm 0.001$, weight $C_{\chi^2} \approx 0.507 \pm 0.01$

Fitting our likelihood ratio distribution with a delta + chi-square p.d.f. gives: reduced $\chi^2 = 1.013$, $P(\mathrm{fit}) = 0.035$

Summary & Outlook

Summary

In order to test the computing capabilities of GPUs with respect to traditional CPU cores, a high-statistics toy Monte Carlo technique has been implemented in both the ROOT/RooFit and GooFit frameworks, with the purpose of estimating the local statistical significance of a (possibly exotic, charmonium-like) signal recently confirmed by CMS (it was first observed by CDF).
The optimized GooFit applications running, by means of the MPS, on the GPUs hosted by the servers used in the presented test provide a striking speed-up with respect to the RooFit application parallelized on multiple CPUs by means of PROOF-Lite.

By means of GooFit it has also been easier to explore the (asymptotic) behaviour of a likelihood ratio test statistic in different situations in which the Wilks theorem may apply, or does not apply because its regularity conditions are not satisfied.

Outlook

The presented method can be extended to situations with a new unexpected signal, where a global statistical significance must be estimated.
To properly include the Look-Elsewhere Effect, a sort of scanning technique of the relevant mass spectra needs to be implemented.
This can certainly:
- increase the execution time of the fits to be performed on the single fluctuation, and
- require trying different scan models (and repeating the whole procedure) in order to evaluate the associated systematic uncertainty.

In this situation:
- the RooFit approach would be unbearable (highly time-consuming!),
- turning to GPUs would be mandatory,
- GooFit would be the reliable & crucial tool.

If you are interested in starting to learn & work with GooFit:
1) you can take the tutorial by R. Andreassen: http://indico.cern.ch/conferenceDisplay.py?confId=235992
2) GooFit source code lives in a GitHub repository: https://github.com/GooFit
3) you may want to exchange useful feedback on the GooFit Google Group.

Thank you for your attention

Let me thank in particular:
- my supervisor at CMS-Bari, Alexis Pompili (University of Bari & INFN)
- Leonardo Cristella (University of Bari & INFN) & Giacinto Donvito (INFN-Bari, Tier2 manager) & the support by Italian Project 20108T4XTM - MIUR PRIN 2010-2011 - STOA-LHC
- Mike Sokoloff (University of Cincinnati), coordinator of the GooFit project funded by NSF (NSF-1414736 Enabling HEP at the Information Frontier Using GPUs and Other Many/Multi-Core Architectures)
- Brad Hittle (Ohio Supercomputer Center) & Tommaso Dorigo (INFN-Padova)

BACKUP

Amdahl's Law

In computer architecture, Amdahl's law gives the theoretical speedup when using multiple processors as a function of the fraction (P) of the code that can be parallelised and of the number of processors (n) used.
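In its usual form Amdahl's law reads $S(n) = 1 / ((1 - P) + P/n)$. A minimal sketch of how the speedup saturates as $n$ grows is given below; the function name and the value $P = 0.95$ are illustrative assumptions, not figures from this study.

```python
def amdahl_speedup(parallel_fraction: float, n_processors: int) -> float:
    """Theoretical speedup S(n) = 1 / ((1 - P) + P / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if n_processors < 1:
        raise ValueError("n_processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)


# Illustrative assumption: 95% of the code can be parallelised.
for n in (1, 16, 1024):
    print(f"n = {n:5d}  ->  S = {amdahl_speedup(0.95, n):.2f}")
```

However many processors are used, the speedup stays below $1/(1 - P)$, i.e. below 20 for the value of $P$ assumed here.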