Summary and Schedule
Using computers to create, modify, and manage digital files is an essential skill for all researchers. Much of your time as a researcher will be spent in the initial ‘data wrangling’ stage, where you need to create and organise data in a way that allows you to perform a proper analysis later. Establishing good data organisation practices early on will help you to get from research question to publication faster, more enjoyably, and with fewer headaches. Being able to revisit a project after a few months or years, find the files you need, make sense of your data, and understand why you made certain decisions during your analyses is crucial when answering questions from your funder, institution, or journal reviewers.
After this lesson, you will be able to:
- Apply best practices for organising digital files in projects
- Use tidy data principles when creating or entering data
- Identify and address common formatting mistakes
- Understand how to handle dates in spreadsheets
- Utilise quality control features to keep data error-free
- Effectively export data from spreadsheet programs
This lesson does not cover in-depth data cleaning, data analysis, or plotting in spreadsheets, because better tools are available for these tasks (such as OpenRefine and R).
| Start time | Episode | Question |
|---|---|---|
| | Setup Instructions | Download files required for the lesson |
| 00h 00m | 1. Introduction | What are some basic principles for organising research files? |
| 00h 15m | 2. Formatting data tables in spreadsheets | How do we properly format data in spreadsheets? |
| 00h 50m | 3. Formatting problems | What are some common challenges with formatting data in spreadsheets and how can we avoid them? |
| 01h 10m | 4. Dates as data | What is a safe approach for handling dates in spreadsheets? |
| 01h 23m | 5. Quality control | How can we carry out basic quality control and quality assurance in spreadsheets? |
| 01h 43m | 6. Exporting data | How can we export data from spreadsheets in a way that is useful for downstream applications? |
| 01h 53m | Finish | |
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.
Data
Download this data file to your computer.
About the data
The data used across the Data Carpentry lessons is a simplified version of the Portal Project Database designed for teaching. This dataset contains observations taken from a small mammal community in southern Arizona as part of a project studying the effects of rodents and ants on the plant community. The study has been running for almost 40 years and the full dataset has been used in over 100 publications. The full version of the dataset is also available to download if you’re interested.
Ernest, M., Brown, J., Valone, T., and White, E.P. (2017). Portal Project Teaching Database. Version 6. Figshare. DOI: 10.6084/m9.figshare.1314459.v6
Software
Because most researchers use Microsoft Excel, the examples and screenshots in this lesson reflect Excel. If you don’t have access to Excel, you can use LibreOffice instead. The commands and interface are a bit different, but the general principles are the same.
Download LibreOffice
Windows:
- Download the Installer
- Download LibreOffice by going to the installation page. The version for Windows should be selected automatically. Click Download Version X.X.X (whichever is the most recent version). You will be taken to a page that asks about a donation, but you don’t need to make one. Your download should begin automatically.
- Install LibreOffice
- Once the installer is downloaded, double-click on it and LibreOffice should install.
Mac OS X:
- Download the Installer
- Download LibreOffice by going to the installation page. The version for Mac should be selected automatically. Click Download Version X.X.X (whichever is the most recent version). You will be taken to a page that asks about a donation, but you don’t need to make one. Your download should begin automatically.
- Install LibreOffice
- Once the installer is downloaded, double-click on it and LibreOffice should install.
Linux:
- Download the Installer
- Download LibreOffice by going to the installation page. The version for Linux should be selected automatically. Click Download Version X.X.X (whichever is the most recent version). You will be taken to a page that asks about a donation, but you don’t need to make one. Your download should begin automatically.
- Install LibreOffice
- Once the download is complete, extract the archive and install the packages it contains (DEB or RPM, matching your distribution). Alternatively, LibreOffice is often available directly through your distribution’s package manager.