Presentation Slides
Transcription
Research Information & Data Management (RIDM)

Introductions:
Ellie Ransom: Research Services Coordinator, @CU_SEL, ellie.ransom@columbia.edu
Amy Nurnberger: Research Data Manager, @DataAtCU, cu-rdm@columbia.edu

The Plan for Research Information & Data (RID):
➔ Identify it
➔ Manage it
➔ Document it
➔ Secure it
➔ Deal with it

Identify it:
Material or information "on which an argument, theory, test or hypothesis, or another research output is based."
Source: Queensland University of Technology. Manual of Procedures and Policies. Section 2.8.3. http://www.mopp.qut.edu.au/D/D_02_08.jsp
Image credits: Anthony Elia at http://hdl.handle.net/10022/AC:P:19828 | Oscilloscope by Voltcraft by Hannes Grobe, https://commons.wikimedia.org/wiki/File:Oscilloscope-voltcraft_hg.jpg, cc-by-3.0

Identify it - What is it?
➢ Non-digital text (lab books, field notebooks, archival texts)
➢ Matlab files & Models
➢ Metadata & Paradata
➢ Data visualizations
➢ Computer code
➢ Standard operating procedures and protocols
➢ Digital texts or digital copies of text
➢ Spreadsheets
➢ Audio, video
➢ Computer Aided Design/CAD
➢ Protein or genetic sequences
➢ Statistics (SPSS, SAS)
➢ Artistic products
➢ Databases
➢ Web files
➢ Geographic Information Systems (GIS) and spatial data
➢ Curriculum materials
➢ Digital copies of images
➢ Non-digital images
➢ Collection of digital objects acquired and generated during research
Adapted from: Georgia Tech, http://libguides.gatech.edu/content.php?pid=123776&sid=3067221

Identify it: Who has it?
Identify it: has it!
So, what are you going to do with it? Manage it!

What is Research Information & Data Management (RIDM)?
exists, found, understand, trust, can use
– Rex Sanders
http://www.slideshare.net/shlake/documentation-metadatadentonlake | http://dx.doi.org/10.1890/1051-0761(1997)007%5B0330:NMFTES%5D2.0.CO;2
YOU (http://openarchaeologydata.metajnl.com/about/, modified)

Manage it when?
Plan to manage:
1. What information/data are you producing?
2. How are you documenting / describing it?
3. Where are you storing it?
4. When are you sharing it?
5. Who’s responsible?

What are you producing?
Manage it: Volume
Manage it: Velocity
Manage it: Variety / Interoperability
Manage it: Sensitive data
● Ownership
● IRB
● PII
● Classified
● HIPAA
● Restricted
● FERPA
● Intellectual property, e.g. patent or copyright

How are you documenting it?
Document it: Take good notes!
00100100 01101010 10000101 00001000 00010011 10001010 00000011 01110011 10100100 00111000 00101001 00110001 00001000 11111010 11101100 01101100 00111111 10001000 10100011 11010011 00011001 00101110 01110000 01000100 00001001 00100010 10011111 11010000 00101110 10011000 01001110 10001001
???

Methods
• What was done
• How it was done
• Instrumentation/Equipment (RASCAL course)
• Limitations
Code
• All of the meanings
Description / Documentation
• Labels (w/ units!)
• Codebook
• Data dictionary
• Laboratory notebook

There are standards for documentation: http://www.dcc.ac.uk/resources/metadata-standards

Document it: Speaking of standards…
Document it: Standards of scholarship & academia: Plagiarism?
Cite stuff!
Get Credit, Give Credit
• Data citation
Table of citation elements:
A: Authors & Contributors
Pd: Publication date
T: Title
Ei: Electronic ID, e.g., DOI
P: Publisher / Distributor
- Track reuse
- Measure impact
- Support reproducibility
https://www.force11.org/group/joint-declaration-data-citation-principles-final
CU-RDM@columbia.edu

Document it: Citation managers
http://library.columbia.edu/research/citation-management.html

Who owns it: Intellectual Property & Ownership:
????? You | Your PI | Columbia University | Publisher | Funding Agency ?????

How do you store it?
Store it: Secure it: Security, Storage

Secure it: How will you protect your own or your participants’:
● Security
● Privacy / confidentiality
● Intellectual property
● Other rights
?

Secure it: Backups
Where
● Here
● Near
● Far
When
● Regularly & frequently
● Schedule it
Test it
● File recovery
● Checksums

Secure it: Backups
What about , you ask?
Read the fine print (what happens to your stuff when the service inevitably dies?)
Consider:
● Security
● Cost
● Accessibility
● Longevity

Secure it: Security
Who needs to/should see the data when?
IRB | PII | FERPA | HIPAA | Copyright | Restricted | Patent potential | Classified | Licenses & IP
Consider:
● Restricting physical access
● Encryption
● De-identification
● Strong passwords (password manager)

Storing it: Some practicalities
● File formats
● File naming and organization
● Version control

Storing it: File formats (for interoperability & storage)
• Non-proprietary
• Open, documented standard
• Standard representation (e.g., ASCII, Unicode)
• Common, or commonly used by the research community (e.g. FITS, CIF)
• Unencrypted
• Uncompressed
✓ Some commonly recognized formats meeting these criteria: ASCII [e.g., .csv, .txt], PDF [.pdf], FLAC, TIFF, JPEG2000 [.jp2], MPEG-4 [.mp4], XML [.xml, .odf, .rdf], R [.r]
Not sure about the extension? Check https://www.nationalarchives.gov.uk/PRONOM/default.htm
http://www.data-archive.ac.uk/media/2894/managingsharing.pdf | http://www.digitalpreservation.gov/formats/index.shtml?PHPSESSID=c26c5e5101396d5f5ebacedb13cae6e3

Storing it: File naming
● Consistency: Pick a system, write it down, stick with it
● Identify necessary elements & consider their order
● Create brief, understandable names
● Date: YYYY-MM-DD or YYYYMMDD
● Version: v01, v02,…FINAL
● Try to stay away from spaces in filenames as well as the following characters: \ / : * ? " . < > | [ ] & $ (reserve . for file extensions)
● Recognize: At the file level, Firefly/browncoat/shiny.txt = Firefly/alliance/shiny.txt
Make a system. Share the system. Follow the system.

Storing it: File organization
● Consider organizing by logical chunks, e.g. project, class, grant
● What makes sense for the work you’re doing? How are you likely to look for related items?
● Identify important elements & how they should be nested
● Don’t make the system too deep
● Choose brief, understandable names
● Document it!
Make a system. Share the system. Follow the system.

Storing it: Versioning: Did you change the file? Change the name!
Indicate versions
● filename_v001
● report_draft_r045
● report_final_r176
● presentation_20140706
Indicate responsibility
● Initials: file_v05_gh
● ID designation: file_v05_iam37
Make a system. Share the system. Follow the system.

Storing it: Columbia resources
● Lionmail drive
● Academic Commons
● The Libraries
● Departmental IT

Sharing it: But my PI told me to do it this way?

Sharing it: File naming & organization
Collaborating on a complex project? Make sure to share and agree on your naming, organizational, and versioning systems!
Make a system. Share the system. Follow the system.

Research Information & Data (RID) sharing:
● What: Unique, reusable, relevant data
● With whom: Your future self! Your collaborators. Your research community. The world. (mind restrictions, etc.)
● When: During the project with collaborators. At pre-determined project stages. At project completion.
● How: Data Publication
● Frequently required by funding organizations

RID sharing – What?
Not all data should be archived or be kept for the same time, or in the same way. Appraise your data on the following principles:
● Relevance to research mission
● Historical or scientific value
● Uniqueness
● Reliability / Integrity / Usability of data
● Replicability, or lack thereof
● Cost of management and preservation
● Adequate available documentation
● Satisfaction of requirements

RID sharing – With whom
YOU (http://openarchaeologydata.metajnl.com/about/, modified)

RID sharing – When (depends on whom)
all of the time!

RID sharing – How, Data Publishing
● Data publication in repositories
  ○ Institutional: http://academiccommons.columbia.edu/
  ○ Disciplinary, Directory: http://www.re3data.org/
  ○ Requirements
    ■ long-term storage and access to data
    ■ validation of data integrity [check-sum]
    ■ a permanent resource locator (e.g., DOI, Purl, hdl) to make its data persistent, unique, and citable
● Data descriptors
● Data papers
● Supplementary material

Using it: Columbia resources
● Open Source Software
● Licensed Software
● Specialized Software
● High Performance Computing

Responsibility

Questions? Contact us:
Ellie | Research Services Coordinator | ellie.ransom@columbia.edu
Amy | Research Data Manager | cu-rdm@columbia.edu
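
Example: checking a backup with checksums
The backup slide lists checksums under "Test it", and the repository requirements mention check-sum validation of data integrity. One way this might look in practice is sketched below. It assumes SHA-256 as the hash (the slides do not name an algorithm), and the function names sha256_of and backup_matches are hypothetical, chosen only for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks.

    SHA-256 is an assumption here; the slides only say "checksums".
    """
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large data files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original: Path, backup: Path) -> bool:
    """True when the backup copy has the same checksum as the original."""
    return sha256_of(original) == sha256_of(backup)
```

Recording the digest alongside each file when it is first stored gives you something to compare against later, even after the original has been moved or deleted.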
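
Example: a file naming system, written down
The file naming and versioning slides ask for consistent element order, YYYYMMDD dates, zero-padded versions, and no spaces or special characters. A rough sketch of one such system follows. The element order (description, date, version, initials) and the names FORBIDDEN, is_clean_name, and build_name are assumptions for illustration, not a scheme the slides prescribe.

```python
import re
from datetime import date

# Characters the file naming slide advises avoiding, plus whitespace.
# The dot is handled separately because it is reserved for the extension.
FORBIDDEN = re.compile(r'[\\/:*?"<>|\[\]&$\s]')


def is_clean_name(filename: str) -> bool:
    """True if the name has no discouraged characters and at most one dot."""
    return FORBIDDEN.search(filename) is None and filename.count(".") <= 1


def build_name(description: str, day: date, version: int,
               initials: str, ext: str) -> str:
    """Assemble description_YYYYMMDD_vNN_initials.ext (hypothetical order)."""
    stem = "_".join(
        [description, day.strftime("%Y%m%d"), f"v{version:02d}", initials]
    )
    name = f"{stem}.{ext}"
    if not is_clean_name(name):
        raise ValueError(f"discouraged characters in {name!r}")
    return name
```

For instance, build_name("report", date(2014, 7, 6), 5, "gh", "txt") gives report_20140706_v05_gh.txt. Keeping the rule in a shared script is one way to "share the system" with collaborators.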