poster pdf
Transcription
poster pdf
The GPIES Data Cruncher: An Automated Data Processing System for the Gemini Planet Imager Exoplanet Survey Jason J. Wang, Pauline Arriaga, Marshall D. Perrin, Dmitry Savransky, James R. Graham, Christian Marois, Julien Rameau, Jean-Baptise Ruffio, and the GPI Team jwang@astro.berkeley.edu Summary: • The Data Cruncher can automatically process all science and calibration data from the GPI Exoplanet Survey and more • Sensitivity curves and multiple PSF subtraction products are produced one hour after the data are available • The Super Data Cruncher can also run on a supercomputing cluster and reprocess the entire campaign in a few hours Crunchable Data c Super Data Cruncher GPI Exoplanet Survey Science • 1 hour H-band integral field spectroscopy planet search • 10 minute H-band snapshot broadband imaging polarimetry • 1 hour H-band deep broadband imaging polarimetry GPIES Follow-up • Multi-epoch deep follow-up observations in multiple bands GPI Queue Programs • All coronagraphic data taken for GPIES members’ queue programs Calibrations • All calibration data taken by GPI (which are publically available) • • • • Runs on NERSC’s Edison supercomputer (5576 nodes, 133,824 cores, 357 TB RAM) Uses MPI for inter-node communication < 100 lines of code needed to implement the Super Data Cruncher Reprocesses the entire campaign in a few hours Runtime (Hours) 3 Data Flow MySQL DB Summit Dropbox Taken Stored & Synced Weak Scaling Logged Quality Checked 2.5 2 1.5 1 0.5 0 0 5 10 15 20 25 30 # of Datsets and Nodes Reduced Data Products • All data products produced within ~1 hour of the data being available • All data are synced to Dropbox for accessibility Architecture • Written in Python with some pipeline components written in IDL • Highly modularized, multithreaded, and asynchronous Update Wiki Datacubes Realtime Scanner Reprocessor Queues new datasets for processing and updates the GPIES Wiki Queries database to find and process existing datasets on demand New Files Spectral Cube Web Socket or MPI (Perrin et al. 2014) Calibrations pyKLIP Processing Controller GPI DRP pyKLIP: ADI+SDI pyKLIP: ADI+SDI w/ methane Network Interface (Wang et al. 2015) Save Reduced Data Products Contrast Curves Query for data Send Commands Processing Backend PSF Subtracted Images High-level Python logic that controls dataflow through the various pipelines cADI (UdeM pipeline) Uses queues to communicate between threads and monitors for TLOCI synchronization (Marois et al. 2014) References: Marois, C., Correia, C., Galicher, R., et al. 2014, Proc SPIE, 9148. Perrin, M. D., Maire, J. , Ingraham, P., et al. 2014, Proc SPIE, 9147. Wang, J. J., Ruffio, J.-B., De Rosa, R. J., et al. 2015, Astrophysics Source Code Library, record ascl:1506.001. Check for bad files Polarimetry Cube pyKLIP: ADI cADI Acknowledgements: This research was supported in part by NASA NNX15AD95G, NASA NNX11AD21G, NSF AST-0909188, and the University of California LFRP-118057. The GPI project has been supported by Gemini Observatory, which is operated by AURA, Inc., under a cooperative agreement with the NSF on behalf of the Gemini partnership: the NSF (USA), the National Research Council (Canada), CONICYT (Chile), the Australian Research Council (Australia), MCTI (Brazil) and MINCYT (Argentina).