poster pdf

Transcription

poster pdf
The GPIES Data Cruncher:
An Automated Data Processing System for the Gemini Planet
Imager Exoplanet Survey
Jason J. Wang, Pauline Arriaga, Marshall D. Perrin, Dmitry Savransky, James R. Graham,
Christian Marois, Julien Rameau, Jean-Baptise Ruffio, and the GPI Team
jwang@astro.berkeley.edu
Summary:
• The Data Cruncher can automatically process all science and calibration data from the GPI Exoplanet Survey and more
• Sensitivity curves and multiple PSF subtraction products are produced one hour after the data are available
• The Super Data Cruncher can also run on a supercomputing cluster and reprocess the entire campaign in a few hours
Crunchable Data
c
Super Data Cruncher
GPI Exoplanet Survey Science
• 1 hour H-band integral field spectroscopy planet search
• 10 minute H-band snapshot broadband imaging polarimetry
• 1 hour H-band deep broadband imaging polarimetry
GPIES Follow-up
• Multi-epoch deep follow-up observations in multiple bands
GPI Queue Programs
• All coronagraphic data taken for GPIES members’ queue programs
Calibrations
• All calibration data taken by GPI (which are publically available)
•
•
•
•
Runs on NERSC’s Edison supercomputer (5576 nodes, 133,824 cores, 357 TB RAM)
Uses MPI for inter-node communication
< 100 lines of code needed to implement the Super Data Cruncher
Reprocesses the entire campaign in a few hours
Runtime (Hours)
3
Data Flow
MySQL DB
Summit
Dropbox
Taken
Stored & Synced
Weak Scaling
Logged
Quality Checked
2.5
2
1.5
1
0.5
0
0
5
10
15
20
25
30
# of Datsets and Nodes
Reduced Data Products
• All data products produced within ~1 hour of the data
being available
• All data are synced to Dropbox for accessibility
Architecture
• Written in Python with some pipeline components written in IDL
• Highly modularized, multithreaded, and asynchronous
Update
Wiki
Datacubes
Realtime Scanner
Reprocessor
Queues new datasets for
processing and updates
the GPIES Wiki
Queries database to find
and process existing
datasets on demand
New
Files
Spectral Cube
Web Socket or MPI
(Perrin et al. 2014)
Calibrations
pyKLIP
Processing Controller
GPI DRP
pyKLIP: ADI+SDI
pyKLIP: ADI+SDI
w/ methane
Network Interface
(Wang et al. 2015)
Save
Reduced
Data
Products
Contrast Curves
Query for data
Send
Commands
Processing Backend
PSF Subtracted Images
High-level Python logic that controls
dataflow through the various pipelines
cADI
(UdeM pipeline)
Uses queues to communicate between
threads and monitors for
TLOCI
synchronization
(Marois et al. 2014)
References:
Marois, C., Correia, C., Galicher, R., et al. 2014, Proc SPIE, 9148.
Perrin, M. D., Maire, J. , Ingraham, P., et al. 2014, Proc SPIE, 9147.
Wang, J. J., Ruffio, J.-B., De Rosa, R. J., et al. 2015, Astrophysics Source Code Library, record ascl:1506.001.
Check for
bad files
Polarimetry Cube
pyKLIP: ADI
cADI
Acknowledgements: This research was supported in part by NASA NNX15AD95G, NASA NNX11AD21G, NSF AST-0909188, and the University
of California LFRP-118057. The GPI project has been supported by Gemini Observatory, which is operated by AURA, Inc., under a cooperative
agreement with the NSF on behalf of the Gemini partnership: the NSF (USA), the National Research Council (Canada), CONICYT (Chile), the
Australian Research Council (Australia), MCTI (Brazil) and MINCYT (Argentina).