What is data and why should you care? Dr. Kalpana Shankar
School of Information and Library Studies, UCD
5 November 2012

What do Apollo 11, the Domesday Project, and award-winning scientists from the US National Science Foundation have in common?

What is research data?
“The data, records, files or other evidence, irrespective of their content or form (e.g. in print, digital, physical or other forms), that comprise research observations, findings or outcomes, including primary materials and analysed data.” – Australian National Data Service

Examples:
• Statistics and measurements
• Results of experiments or simulations
• Observations, e.g. fieldwork
• Survey results, print or online
• Interview recordings and transcripts
• Images from cameras and scientific equipment

What is ‘data’?
Any information you use in your research.
“The whole thing is incredibly dull.”

Why are we talking about data management?
“PhD students lose material all the time… and they are exactly the people who want to be backing up. These are people who are creating data which are life and death important to them.”

Rising volume and complexity of research data
• According to the European Bioinformatics Institute, the volume of new biological data is doubling every 5 months
• For example, in genomics:
  – we can now analyse the equivalent of a human genome every 14 minutes at a cost of $5,000; this is 400 times quicker than when the draft human genome was first published in 2000
  – 1,000 Genomes Project: 200 terabytes, the equivalent of 16 million file cabinets filled with text, or more than 30,000 standard DVDs

A hard drive after 6 years’ research (image by Lindsay Lloyd-Smith): 113 GB, 42,699 files, 3,466 folders

So, why is data management important for research?
• It is increasingly integral to all areas of research
• It is a rapidly escalating issue
• It is important to research funders, and funders are likely to follow up on it more in the future
• It has major resource implications, which need to be planned for carefully
• In short, it creates major challenges which aren’t going to go away!

Why data management is important to YOU (II)
What would happen to your data if there was a fire or theft in your office, department or home?
“Fire” by andrewmalone via flickr: http://www.flickr.com/photos/andrewmalone/2032844649/

Writing a Data Management Plan
1. Formalises the definition of your research data
2. Documents the contextual and technical details of your data
3. Checks your file structure and naming
4. Plans for data sharing, access, and archiving

Getting started
• Your Data Management Plan won’t be perfect
• It is not a static document
  – Change and update it as your research progresses and you understand more about your data
• Think about key issues that might affect your data…
  – …while you work on them
  – …in the future
• It’s better to have a plan that covers some aspects than no plan at all
• Ask for advice if you’re uncertain

Questions to ask yourself
• Platform: Windows, Macintosh and/or Unix?
• Objective: Store? Manage? Share? Publish?
• Extent of collaboration
  – Your research group/lab only
  – Your group + externals
  – Cast of thousands?
• Nature of data?
  – Level of security?
  – Human records (de-identified)?
  – Intellectual property?
• Amount of data? MB? GB? TB?
  – Rate of accumulation of data?
  – How much needed online to do useful work?
  – Period of preservation?
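The “Amount of data” and “Rate of accumulation” questions above can be turned into a rough storage estimate for your plan. The sketch below is a minimal illustration only, assuming data accumulates at a constant monthly rate and that every back-up is a full copy; the function name `estimate_storage_gb` and the example figures are hypothetical, not taken from any planning tool.

```python
# A rough storage estimate for a Data Management Plan.
# Hypothetical helper: assumes constant monthly growth and full-size copies.

def estimate_storage_gb(initial_gb: float,
                        monthly_growth_gb: float,
                        months: int,
                        copies: int = 3) -> float:
    """Return the total storage (GB) needed after `months`, across `copies` copies."""
    if months < 0 or copies < 1:
        raise ValueError("months must be >= 0 and copies >= 1")
    # Size of one copy of the data at the end of the period.
    working_set = initial_gb + monthly_growth_gb * months
    # Every copy (original plus back-ups) needs the same space.
    return working_set * copies


if __name__ == "__main__":
    # Hypothetical example: 20 GB today, growing 2 GB a month over a
    # 36-month project, kept as an original plus two back-ups.
    total = estimate_storage_gb(20, 2, 36, copies=3)
    print(f"Plan for about {total:.0f} GB in total")
```

With these example figures the working set reaches 92 GB, so three copies need about 276 GB. Real growth is rarely linear, so treat the result as a lower bound and revisit it as the plan is updated.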
By twechy (Flickr ID): “Library Bookshelf” http://www.flickr.com/photos/twechy/6829994084/ CC BY 2.0
By Anne (Flickr ID: I like): “Voltaire & Rousseau” http://www.flickr.com/photos/ilike/2616342739/ CC BY-NC-ND 2.0

Give your data a structure…
…it makes it easier to find things

Something to try: use post-it notes to create a map of your file structure
• Write each existing file and folder name onto a post-it
• Arrange folders on your desk in a sensible hierarchy
• Put your ‘files’ into ‘folders’
• Do you need new folders? Do you have too many?

What’s in a name?
• Names tell us what a file is (contextual information)
• Use a combination of different types of information to make context and content clear, e.g.
  – Author (or initials)
  – Date
  – Data source
  – Theme
  – Experiment
  – Sample
• …but try not to let file names get too long

Why create documentation?
• Creating documentation might seem like a waste of time
• Good documentation will include a lot of information that might seem obvious
www.flickr.com/photos/smutjespickles/2434418686/

Document your data as you go
If you don’t, it may become impossible for you, or someone else, to understand and re-use the data later on.
Question Mark Sign by Colin_K on flickr: http://www.flickr.com/photos/colin kinner/2200500024/

Make research material understandable
What’s obvious now might not be in a few months, years, decades…
Image: http://www.flickr.com/photos/archer10/5692813531/
MAKE SURE YOU CAN UNDERSTAND IT LATER

Make research reproducible
• Detailing your methodology helps people understand your research better
• Explaining your algorithms, search methods, etc. makes your work reproducible
• Conclusions can be verified
Image by woodleywonderworks on flickr: http://www.flickr.com/photos/wwworks/4588700881/

Make material reusable
• Material may be reused by someone in a different discipline
• Provide context to minimise the risk of it being misunderstood or misused

Backing up
• Lots Of Copies Keeps Stuff Safe (LOCKSS): make multiple back-ups
• Keep back-ups in a separate place from the original
• Use different types of storage media, e.g. CDs, pen drives, networked storage, external hard drive
From: “Copy Copy Copy” by David Goehring (CarbonNYC) via flickr

For everything you keep…
Make sure you can:
• find it again later
• understand it later

Where to get help
• Earth Institute will be putting up links on its website
• Your supervisor
• Library
• Funding agencies

Oh yes… what do Apollo, the Domesday Project, and award-winning scientists from the US National Science Foundation have in common?

Questions?
• My contact information:
  – Kalpana Shankar (kalpana.shankar@ucd.ie)