Presentation Slides

Research Information & Data Management (RIDM)
Introductions:
Ellie Ransom: Research Services Coordinator, @CU_SEL, ellie.ransom@columbia.edu
Amy Nurnberger: Research Data Manager, @DataAtCU, cu-rdm@columbia.edu
The Plan for Research Information & Data (RID):
➔ Identify it
➔ Manage it
➔ Document it
➔ Secure it
➔ Deal with it
Identify it:
Material or information "on which an argument, theory, test or hypothesis, or another research output is based."
Anthony Elia at http://hdl.handle.net/10022/AC:P:19828 | Oscilloscope by Voltcraft by Hannes Grobe, https://commons.wikimedia.org/wiki/File:Oscilloscope-voltcraft_hg.jpg, cc-by-3.0 | Queensland University of Technology. Manual of Procedures and Policies. Section 2.8.3. http://www.mopp.qut.edu.au/D/D_02_08.jsp
Identify it - What is it?
➢ Non-digital text (lab books, field notebooks, archival texts)
➢ Matlab files & Models
➢ Metadata & Paradata
➢ Data visualizations
➢ Computer code
➢ Standard operating procedures and protocols
➢ Digital texts or digital copies of text
➢ Spreadsheets
➢ Audio, video
➢ Computer Aided Design/CAD
➢ Protein or genetic sequences
➢ Statistics (SPSS, SAS)
➢ Artistic products
➢ Databases
➢ Web files
➢ Geographic Information Systems (GIS) and spatial data
➢ Curriculum materials
➢ Digital copies of images
➢ Non-digital images
➢ Collection of digital objects acquired and generated during research
Adapted from: Georgia Tech–http://libguides.gatech.edu/content.php?pid=123776&sid=3067221
Identify it:
Who has it?
Identify it:
has it!
So, what are you going to do with it?
Manage it!
What is Research Information & Data Management (RIDM)?
exists, found, understand, trust, can use
– Rex Sanders
http://www.slideshare.net/shlake/documentation-metadatadentonlake | http://dx.doi.org/10.1890/1051-0761(1997)007%5B0330:NMFTES%5D2.0.CO;2
YOU
http://openarchaeologydata.metajnl.com/about/ , modified
Manage it
when?
Plan to manage:
1. What information/data are you producing?
2. How are you documenting / describing it?
3. Where are you storing it?
4. When are you sharing it?
5. Who’s responsible?
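One way to keep the answers to these five questions next to the data is a small machine-readable record. The sketch below is a hypothetical layout, not a funder's or Columbia's required format; the class name, field names, and sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class ManagementPlan:
    """Hypothetical record answering the five planning questions."""
    producing: str      # 1. What information/data are you producing?
    documenting: str    # 2. How are you documenting / describing it?
    storing: str        # 3. Where are you storing it?
    sharing: str        # 4. When are you sharing it?
    responsible: str    # 5. Who's responsible?

    def unanswered(self):
        """Return the names of questions that are still blank."""
        return [name for name, value in asdict(self).items() if not value.strip()]


# Sample values are invented for illustration only.
plan = ManagementPlan(
    producing="Oscilloscope traces exported as CSV",
    documenting="README plus a data dictionary with units",
    storing="Lab server, with a second copy off site",
    sharing="",
    responsible="Graduate student, reviewed by the PI",
)

print(json.dumps(asdict(plan), indent=2))
print("Still to decide:", plan.unanswered())
```

Saving the record as a text file in the project folder keeps the plan visible, and the `unanswered` check shows which questions still need a decision.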
What are you producing?
Manage it:
Volume
Manage it:
Velocity
Manage it:
Variety / Interoperability
Manage it:
Sensitive data
Ownership
IRB
PII
Classified
HIPAA
Restricted
FERPA
Intellectual property, e.g. patent or copyright
How are you documenting it?
Document it:
Take good notes!
[Graphic: block of unlabeled binary values]
???
Methods
• What was done
• How it was done
• Instrumentation/Equipment (RASCAL course)
• Limitations
Code
• All of the meanings
Description / Documentation
Labels (w/ units!)
• Codebook
• Data dictionary
• Laboratory notebook
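A data dictionary can be as simple as one row per variable giving its label, units, and the meaning of any codes. The sketch below shows one possible layout and a check that every data column is documented; the variable names, units, and descriptions are invented for illustration and are not from the presentation.

```python
import csv
import io

# Hypothetical variables for an invented temperature log; every name,
# unit, and description here is an assumption made for illustration.
DATA_DICTIONARY = [
    {"variable": "sample_id", "label": "Sample identifier", "units": "none",
     "meaning": "Lab-assigned ID written in the laboratory notebook"},
    {"variable": "temp_c", "label": "Water temperature", "units": "degrees Celsius",
     "meaning": "Probe reading at time of collection"},
    {"variable": "site", "label": "Collection site code", "units": "none",
     "meaning": "1 = upstream, 2 = downstream"},
]


def missing_entries(data_columns, dictionary):
    """List data columns that have no data dictionary entry."""
    documented = {row["variable"] for row in dictionary}
    return [col for col in data_columns if col not in documented]


def dictionary_as_csv(dictionary):
    """Render the data dictionary as plain CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["variable", "label", "units", "meaning"])
    writer.writeheader()
    writer.writerows(dictionary)
    return buffer.getvalue()


print(dictionary_as_csv(DATA_DICTIONARY))
print(missing_entries(["sample_id", "temp_c", "site", "depth_m"], DATA_DICTIONARY))
```

Running the check whenever a column is added is one way to keep labels, units, and code meanings from drifting out of date.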
There are standards for documentation: http://www.dcc.ac.uk/resources/metadata-standards
Document it:
Speaking of standards…
Document it:
Standards of scholarship & academia:
Plagiarism
Cite stuff!
Give Credit / Get Credit
• Data citation
Table of citation elements
A   Authors & Contributors
Pd  Publication date
T   Title
Ei  Electronic ID, e.g., DOI
P   Publisher / Distributor
- Track reuse
- Measure impact
- Support reproducibility
https://www.force11.org/group/jointdeclaration-data-citation-principles-final
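The five citation elements can be assembled into a citation string mechanically. The sketch below is a minimal illustration; the ordering and punctuation are an assumption rather than a named citation style, and the sample record (including the identifier) is a placeholder, not a real dataset.

```python
def format_data_citation(authors, publication_date, title, publisher, electronic_id):
    """Assemble a citation string from the five citation elements.

    The ordering and punctuation are an assumption, not a named style.
    """
    elements = {
        "authors": authors,
        "publication_date": publication_date,
        "title": title,
        "publisher": publisher,
        "electronic_id": electronic_id,
    }
    # Refuse to build a citation that silently omits an element.
    missing = [name for name, value in elements.items() if not value]
    if missing:
        raise ValueError("missing citation elements: " + ", ".join(missing))
    author_text = "; ".join(authors)
    return f"{author_text} ({publication_date}). {title}. {publisher}. {electronic_id}"


# Placeholder record, invented for illustration only.
print(format_data_citation(
    authors=["Doe, J.", "Roe, R."],
    publication_date="2015",
    title="Example dataset",
    publisher="Example Repository",
    electronic_id="doi:10.0000/example",
))
```

Raising an error on a missing element is a deliberate choice here: an incomplete citation makes it harder to track reuse or support reproducibility.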
CU-RDM@columbia.edu
Document it:
Citation managers
http://library.columbia.edu/research/citation-management.html
Who owns it:
Intellectual Property & Ownership:
?????
You
Your PI
Columbia University
Publisher
Funding Agency
?????
How do you store it?
Store it:
Secure it:
Security
Storage
Secure it:
How will you protect your own or your participants':
● Security
● Privacy / confidentiality
● Intellectual property
● Other rights
Secure it:
Backups
Where
● Here
● Near
● Far
When
● Regularly & frequently
● Schedule it
Test it
● File recovery
● Checksums
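Testing a backup with checksums means computing a digest of the original file and of the recovered copy and comparing the two. The sketch below uses SHA-256 from Python's standard library; the function names and the throwaway demo files are assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original, restored):
    """True when a recovered file is byte-for-byte identical to the original."""
    return sha256_of(original) == sha256_of(restored)


# Demonstration with throwaway files standing in for a real backup and restore.
with tempfile.TemporaryDirectory() as folder:
    original = Path(folder) / "data.csv"
    good_copy = Path(folder) / "restored_ok.csv"
    bad_copy = Path(folder) / "restored_damaged.csv"
    original.write_bytes(b"sample_id,temp_c\n1,20.5\n")
    good_copy.write_bytes(original.read_bytes())
    bad_copy.write_bytes(b"sample_id,temp_c\n1,20.6\n")
    print(backup_matches(original, good_copy))  # identical bytes
    print(backup_matches(original, bad_copy))   # one character differs
```

Recording the digest at backup time also lets a later recovery test run without the original file at hand.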
Secure it:
Backups
What about [online services], you ask?
Read the fine print (what happens to your stuff when the service inevitably dies?)
Consider:
● Security
● Cost
● Accessibility
● Longevity
Secure it:
Security
Who needs to/should see the data when?
IRB
PII
FERPA
HIPAA
Copyright
Restricted
Patent potential
Classified
Licenses & IP
Consider:
● Restricting physical access
● Encryption
● De-identification
● Strong passwords (password manager)
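As one illustration of de-identification, the sketch below drops direct identifiers and replaces each participant ID with a keyed hash (HMAC-SHA256). This is pseudonymization under stated assumptions, not a complete method: the field names, key, and records are invented, indirect identifiers may still allow re-identification, and nothing here establishes compliance with IRB, HIPAA, or FERPA requirements.

```python
import hashlib
import hmac


def pseudonymize(participant_id, secret_key):
    """Replace an identifier with a keyed hash (HMAC-SHA256), shortened for readability."""
    digest = hmac.new(secret_key, participant_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def deidentify(records, secret_key, direct_identifiers=("name", "email")):
    """Drop direct identifiers and swap the participant ID for a pseudonym."""
    cleaned = []
    for record in records:
        kept = {k: v for k, v in record.items() if k not in direct_identifiers}
        kept["participant_id"] = pseudonymize(record["participant_id"], secret_key)
        cleaned.append(kept)
    return cleaned


# Invented records; in practice the key is stored separately from the data.
KEY = b"replace-with-a-long-random-secret"
records = [
    {"participant_id": "P001", "name": "Ada Example", "email": "ada@example.org", "score": 42},
    {"participant_id": "P002", "name": "Ben Example", "email": "ben@example.org", "score": 37},
]
for row in deidentify(records, KEY):
    print(row)
```

Because the hash is keyed, the same participant always maps to the same pseudonym, so records can still be linked across files, while anyone without the key cannot recompute the mapping from a list of known IDs.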
Storing it: Some practicalities
● File formats
● File naming and organization
● Version control
Storing it:
File formats (for interoperability & storage)
• Non-proprietary
• Open, documented standard
• Standard representation (e.g., ASCII, Unicode)
• Common, or commonly used by the research community (e.g. FITS, CIF)
• Unencrypted
• Uncompressed
✓ Some commonly recognized formats meeting these criteria: ASCII [e.g., .csv, .txt], PDF [.pdf], FLAC, TIFF, JPEG2000 [.jp2], MPEG-4 [.mp4], XML [.xml, .odf, .rdf], R [.r]
Not sure about the extension? Check https://www.nationalarchives.gov.uk/PRONOM/default.htm
http://www.data-archive.ac.uk/media/2894/managingsharing.pdf | http://www.digitalpreservation.gov/formats/index.shtml?PHPSESSID=c26c5e5101396d5f5ebacedb13cae6e3
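To make the criteria concrete, the sketch below saves a small table as uncompressed, unencrypted CSV text in a standard Unicode encoding (UTF-8), then reads it back with nothing but the standard library. The column names and values are invented for illustration.

```python
import csv
import tempfile
from pathlib import Path


def save_as_csv(rows, header, path):
    """Write rows to an uncompressed, unencrypted UTF-8 text file in CSV form."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


def load_csv(path):
    """Read the file back to confirm it opens without any special software."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle))


# Invented measurements; units live in the column names.
header = ["sample_id", "temp_c", "site"]
rows = [["S1", "20.5", "Zürich"], ["S2", "19.8", "São Paulo"]]

with tempfile.TemporaryDirectory() as folder:
    target = Path(folder) / "temperatures.csv"
    save_as_csv(rows, header, target)
    print(load_csv(target))
```

Stating the encoding explicitly on both write and read avoids depending on the platform default, which is one common source of garbled characters when files move between machines.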
Storing it: File naming
● Consistency: Pick a system, write it down, stick with it
● Identify necessary elements & consider their order
● Create brief, understandable names
● Date: YYYY-MM-DD or YYYYMMDD
● Version: v01, v02,…FINAL
● Try to stay away from spaces in filenames as well as the following characters: \ / : * ? " . < > | [ ] & $ (reserve . for file extensions)
● Recognize: At the file level, Firefly/browncoat/shiny.txt = Firefly/alliance/shiny.txt
Make a system. Share the system. Follow the system
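A naming system is easier to follow when a small helper builds the names. The sketch below is one hypothetical convention: the element order (project, description, date, version), the underscore separator, and the hyphen substitution are all assumptions, not rules from the presentation.

```python
import re
from datetime import date

# Whitespace plus the characters the slide says to avoid.
AVOID = re.compile(r'[\s\\/:*?"<>|\[\]&$.]+')


def clean_part(text):
    """Replace runs of spaces and avoided characters with a single hyphen."""
    return AVOID.sub("-", text.strip()).strip("-")


def build_filename(project, description, when, version, extension):
    """Compose project_description_YYYY-MM-DD_vNN.ext (element order is an assumption)."""
    parts = [clean_part(project), clean_part(description),
             when.isoformat(), f"v{version:02d}"]
    return "_".join(parts) + "." + extension.lstrip(".")


print(build_filename("Firefly", "engine test: run 3", date(2014, 7, 6), 2, ".csv"))
```

Putting the project in the name itself addresses the last bullet: a file copied out of its folder still says which project it belongs to.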
Storing it:
File organization
● Consider organizing by logical chunks, e.g. project, class, grant
● What makes sense for the work you’re doing? How are you likely to look for related items?
● Identify important elements & how they should be nested
● Don’t make the system too deep
● Choose brief, understandable names
● Document it!
Make a system. Share the system. Follow the system
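Creating the folder structure from a script keeps it consistent across projects and documents it at the same time. The sketch below is a hypothetical shallow layout; the folder names, their purposes, and the project name are assumptions chosen for illustration.

```python
import tempfile
from pathlib import Path

# Hypothetical layout: one project folder with a single level beneath it.
LAYOUT = {
    "data_raw": "Files exactly as collected; never edited",
    "data_processed": "Cleaned or derived files",
    "code": "Scripts used to process and analyze the data",
    "docs": "Protocols, data dictionary, notes",
}


def create_project(root, name, layout=LAYOUT):
    """Create the folders and a README that documents what each one holds."""
    project = Path(root) / name
    for folder in layout:
        (project / folder).mkdir(parents=True, exist_ok=True)
    lines = [f"Project: {name}", "", "Folders:"]
    lines += [f"  {folder}/  {purpose}" for folder, purpose in layout.items()]
    (project / "README.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")
    return project


# Demonstration in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as scratch:
    project = create_project(scratch, "grant-2015_river-study")
    print(sorted(p.name for p in project.iterdir()))
    print((project / "README.txt").read_text(encoding="utf-8"))
```

The README written alongside the folders is the "Document it!" step: a collaborator can see what belongs where without asking.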
Storing it: Versioning:
Did you change the file?
Change the name!
Indicate versions
● filename_v001
● report_draft_r045
● report_final_r176
● presentation_20140706
Indicate responsibility
● Initials: file_v05_gh
● ID designation: file_v05_iam37
Make a system. Share the system. Follow the system
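"Change the name" can be automated so the version counter never gets skipped or mistyped. The sketch below assumes the `_vNN` pattern shown in the examples above; the function name and the choice to start unversioned files at `_v01` are assumptions.

```python
import re

VERSION = re.compile(r"_v(\d+)")


def next_version(filename):
    """Return the filename with its _vNN counter increased by one.

    Zero padding is preserved; a name without a counter gets _v01.
    """
    stem, dot, extension = filename.rpartition(".")
    if not dot:                      # no extension present
        stem, extension = filename, ""
    match = VERSION.search(stem)
    if match is None:
        new_stem = stem + "_v01"
    else:
        digits = match.group(1)
        bumped = str(int(digits) + 1).zfill(len(digits))
        new_stem = stem[:match.start(1)] + bumped + stem[match.end(1):]
    return new_stem + (dot + extension if dot else "")


print(next_version("filename_v001.txt"))
print(next_version("file_v05_gh.csv"))
print(next_version("notes.txt"))
```

Saving the changed file under the new name leaves the previous version untouched, so earlier states of the work can still be recovered.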
Storing it:
Columbia resources
● Lionmail drive
● Academic Commons
● The Libraries
● Departmental IT
Sharing it:
But my PI told me to do it this way?
Sharing it:
File naming & organization
Collaborating on a complex project?
Make sure to share and agree on your naming, organizational, and versioning systems!
Make a system. Share the system. Follow the system
Research Information & Data (RID) sharing:
● What: Unique, reusable, relevant data
● With whom: Your future self! Your collaborators. Your research community. The world. (mind restrictions, etc.)
● When: During the project with collaborators. At pre-determined project stages. At project completion.
● How: Data Publication
● Frequently required by funding organizations
RID sharing – What?
Not all data should be archived, kept for the same length of time, or kept in the same way. Appraise your data on the following principles:
● Relevance to research mission
● Historical or scientific value
● Uniqueness
● Reliability / Integrity / Usability of data
● Replicability, or lack thereof
● Cost of management and preservation
● Adequate available documentation
● Satisfaction of requirements
RID sharing – With whom
YOU
http://openarchaeologydata.metajnl.com/about/ , modified
RID sharing – When (depends on whom)
All of the time!
RID sharing – How, Data Publishing
● Data publication in repositories
○ Institutional: http://academiccommons.columbia.edu/
○ Disciplinary, Directory: http://www.re3data.org/
○ Requirements
■ long-term storage and access to data
■ validation of data integrity [checksum]
■ a permanent resource locator (e.g., DOI, PURL, hdl) to make its data persistent, unique, and citable
● Data descriptors
● Data papers
● Supplementary material
Using it:
Columbia resources
● Open Source Software
● Licensed Software
● Specialized Software
● High Performance Computing
Responsibility
Questions?
Contact us:
Ellie | Research Services Coordinator | ellie.ransom@columbia.edu
Amy | Research Data Manager | cu-rdm@columbia.edu