Summary and Setup

Using computers to create, modify, and manage digital files is an essential skill for all researchers. Much of your time as a researcher will be spent in the initial ‘data wrangling’ stage, where you need to create and organise data in a way that allows you to perform a proper analysis later. Establishing good data organisation practices early on will help you to get from research question to publication faster, more enjoyably, and with fewer headaches. Being able to revisit a project after a couple of months or years and find the files you need, make sense of your data, and understand why you made certain decisions during your analyses is crucial for answering questions from your funder, institution, or journal reviewer.

After this lesson, you will be able to:

  • Apply best practices for organising digital files in projects
  • Use tidy data principles when creating or entering data
  • Identify and address common formatting mistakes
  • Understand how to handle dates in spreadsheets
  • Utilise quality control features to keep data error-free
  • Effectively export data from spreadsheet programs

In this lesson you will not learn about in-depth data cleaning, data analysis, or plotting in spreadsheets, because there are better tools available for these tasks (like OpenRefine and R).

Data

Download this data file to your computer.

About the data

The data used across the Data Carpentry lessons is a simplified version of the Portal Project Database designed for teaching. This dataset contains observations taken from a small mammal community in southern Arizona as part of a project studying the effects of rodents and ants on the plant community. The study has been running for almost 40 years and the full dataset has been used in over 100 publications. The data we’re going to look at has been simplified a little bit for the workshop, but the full version is available to download if you’re interested.

Ernest, M., Brown, J., Valone, T., and White, E.P. (2017). Portal Project Teaching Database. Version 6. Figshare. DOI: 10.6084/m9.figshare.1314459.v6

Software

Because most researchers use Microsoft Excel, the examples and screenshots in this lesson reflect Excel. If you don’t have access to Excel, you can use LibreOffice instead. The commands and interface are a bit different, but the general principles are the same.

Download LibreOffice

Windows:

  • Download the Installer
    • Go to the installation page. The version for Windows should automatically be selected. Click Download Version X.X.X (whichever is the most recent version). You will go to a page that asks about a donation, but you don’t need to make one. Your download should begin automatically.
  • Install LibreOffice
    • Once the installer is downloaded, double-click it and LibreOffice should install.

Mac OS X:

  • Download the Installer
    • Go to the installation page. The version for Mac should automatically be selected. Click Download Version X.X.X (whichever is the most recent version). You will go to a page that asks about a donation, but you don’t need to make one. Your download should begin automatically.
  • Install LibreOffice
    • Once the installer is downloaded, double-click it and LibreOffice should install.

Linux:

  • Download the Installer
    • Go to the installation page. The version for Linux should automatically be selected. Click Download Version X.X.X (whichever is the most recent version). You will go to a page that asks about a donation, but you don’t need to make one. Your download should begin automatically.
  • Install LibreOffice
    • Once the installer is downloaded, double-click it and LibreOffice should install.