Introduction

Last updated on 2024-04-08

Overview

Questions

  • What are some basic principles for organising research files?

Objectives

  • Describe best practices for keeping research projects organised.

Project Organisation

Organising digital files into projects and managing them effectively are essential skills for researchers, but they are seldom taught at university. The following list of practices is a useful starting point that you can adapt to your own needs.

  1. Use sensible folder and file names

Each research project should have its own project directory (folder) containing files organised into a consistent set of subfolders. Each discrete set of data resulting in a publication may have its own project folder, or perhaps each grant has its own project folder containing the datasets and manuscripts generated as part of the overall project.

Every project will have different needs, but using a set of common subfolders means you will always know where certain files live. For example:

  • data_raw: Keep raw, unedited data files here.
  • data_clean: Keep cleaned/transformed datasets here.
  • figs: Keep figures and tables here.
  • docs: Keep manuscripts, lab notebooks, and other documents here.
  • scripts: Keep scripts or code documents here.
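As a minimal sketch, the subfolders above could be created with a few lines of Python. The function name create_project is hypothetical and used only for illustration; the folder names come from the list above.

```python
from pathlib import Path

# Subfolder names taken from the list above.
SUBFOLDERS = ["data_raw", "data_clean", "figs", "docs", "scripts"]


def create_project(root):
    """Create a project directory containing the common subfolders.

    Folders that already exist are left untouched, so the function
    can safely be run more than once on the same project.
    """
    root = Path(root)
    for name in SUBFOLDERS:
        (root / name).mkdir(parents=True, exist_ok=True)
    # Return the subfolder names that now exist, in alphabetical order.
    return sorted(p.name for p in root.iterdir() if p.is_dir())
```

Calling create_project("my_project") (a hypothetical project name) would create the five subfolders inside a my_project folder in the current working directory.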

Like folder names, filenames should:

  • Be short, descriptive, and consistent.
  • Avoid special characters and spaces.
  • Conform to a schema or template.

For example, 2020-01-20_bird-counts_north.csv contains a date, subject, and location in three fields separated by underscores. Files will sort by date because the date is the first field and is written in year-month-day order.
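One way to keep filenames consistent is to build them from a template rather than typing them by hand. The sketch below assumes the date_subject_location schema from the example above; the helper names build_filename and parse_filename are hypothetical.

```python
from datetime import date


def build_filename(day, subject, location, ext="csv"):
    """Build a name such as 2020-01-20_bird-counts_north.csv."""
    fields = [day.isoformat(), subject, location]
    for field in fields:
        # Underscores are reserved as field separators; spaces are avoided.
        if "_" in field or " " in field:
            raise ValueError(f"invalid field: {field!r}")
    return "_".join(fields) + "." + ext


def parse_filename(name):
    """Split a filename back into its date, subject, and location."""
    stem, _, _ext = name.rpartition(".")
    day, subject, location = stem.split("_")
    return date.fromisoformat(day), subject, location


# Hypothetical survey files, deliberately listed out of order.
names = [
    build_filename(date(2020, 3, 5), "bird-counts", "south"),
    build_filename(date(2020, 1, 20), "bird-counts", "north"),
]
# Plain alphabetical sorting puts the earliest date first.
names_sorted = sorted(names)
```

Because the year-month-day date comes first, names_sorted starts with 2020-01-20_bird-counts_north.csv, without any date-aware sorting.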

  2. Back up your files

Avoid becoming a victim of data loss by backing up your research projects or, better yet, by storing them on a platform or service that is automatically backed up for you. The University of Auckland provides Research Drive and Dropbox team folders for researchers and postgraduate research students. Both are backed up automatically. See here for more information.

  3. Keep a copy of the raw data

Keep a copy of all raw data files, whether they are generated by an instrument or software package, or transcribed by hand. It is essential to maintain the provenance of data in order to respond to questions from funders, institutions, and journal reviewers, should they arise. Having a copy of the raw data also means you can easily recreate any figures or results.

If you need to manually tidy or transform the data, it’s best to make your changes to a copy so that the original raw data is preserved. However, limit the number of copies you make, so that you do not create extra work updating data in multiple locations.
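The advice above could be sketched in Python as follows. This is an illustration only: the function name make_working_copy is hypothetical, and the data_raw and data_clean folder names are assumed from the subfolder list earlier in this episode.

```python
import shutil
from pathlib import Path


def make_working_copy(raw_file, clean_dir="data_clean"):
    """Copy a raw data file into clean_dir and return the new path.

    The raw file itself is never modified. If a working copy already
    exists, an error is raised instead of creating a second copy.
    """
    raw_file = Path(raw_file)
    clean_dir = Path(clean_dir)
    clean_dir.mkdir(parents=True, exist_ok=True)
    target = clean_dir / raw_file.name
    if target.exists():
        raise FileExistsError(f"working copy already exists: {target}")
    shutil.copy2(raw_file, target)  # copy2 also preserves timestamps
    return target
```

All edits are then made to the returned working copy, and the file in data_raw stays exactly as it was collected.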

  4. Create ‘tidy’ spreadsheets

Tidy spreadsheets are those where:

  • Each variable has its own column.
  • Each observation has its own row.
  • Each cell contains a single value.

The idea is to structure data the way software expects, so that it can later be analysed with programming languages like R or Python. We’ll cover this more in the next episode.
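To make the three tidy rules concrete, the sketch below reshapes a small untidy table, in which each year has its own column, into a tidy one with one observation per row. The data and the function name to_tidy are hypothetical, and only plain Python is used.

```python
# Hypothetical untidy table: one row per site, one column per year.
wide_rows = [
    {"site": "north", "2019": 12, "2020": 15},
    {"site": "south", "2019": 7, "2020": 9},
]


def to_tidy(rows, id_column, variable_name, value_name):
    """Reshape wide rows so that each observation has its own row."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on every tidy row
            tidy.append(
                {
                    id_column: row[id_column],
                    variable_name: column,
                    value_name: value,
                }
            )
    return tidy


# Each tidy row holds one site, one year, and one count.
tidy_rows = to_tidy(wide_rows, "site", "year", "count")
```

The two wide rows become four tidy rows, each with the variables site, year, and count in their own columns.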

  5. Describe your project with a README

A README is a plain text file (.txt) that lives in the project folder and contains as much information about your project as you think would be needed for another person (or your future self) to understand it. Suggested content includes:

  • Title, abstract, authors, funders
  • What the folders and files contain and how they relate to each other
  • How data was generated, instrument/software settings
  • Column names and units
  • How data has been changed/transformed (if done manually)
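If you start new projects often, a README skeleton could be generated rather than typed from scratch. The sketch below is one possible approach: the function name write_readme_template and the exact heading wording are assumptions, loosely based on the suggested content above.

```python
from pathlib import Path

# Section headings loosely based on the suggested content above.
SECTIONS = [
    "Title",
    "Abstract",
    "Authors",
    "Funders",
    "Folders and files",
    "How the data was generated",
    "Column names and units",
    "Manual changes to the data",
]


def write_readme_template(project_dir):
    """Write a README.txt skeleton unless one already exists."""
    readme = Path(project_dir) / "README.txt"
    if readme.exists():
        return readme  # never overwrite an existing README
    lines = []
    for section in SECTIONS:
        lines.append(section.upper())
        lines.append("(to be completed)")
        lines.append("")
    readme.write_text("\n".join(lines), encoding="utf-8")
    return readme
```

The resulting file is plain text, so it can be opened and completed in any text editor.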

Exercise

Have a go at creating a README for the project/data you’re currently working on.

  1. Create a .txt file.
    • Windows: Right click the Desktop > New > Text Document > rename to README.txt
    • Mac: Finder > Applications > Open TextEdit > Click New Document > Click the Format menu > Click ‘Make Plain Text’
  2. Enter the following details:
    • Title
    • Abstract (just one sentence for now)
    • Authors
    • The name of one of your main spreadsheet files
    • The name of 3 columns/variables in your spreadsheet, a brief description, and their units
  3. Save the file.
    • Windows: Ctrl+S
    • Mac: Cmd+S and rename as README.txt

See here for more information about READMEs.

Lesson structure

Many researchers commonly work with data in spreadsheets, so the rest of this lesson focuses on tidying tabular data:

  1. Formatting data tables in spreadsheets
  2. Formatting problems
  3. Dates as data
  4. Quality control
  5. Exporting data

Key Points

  • Good project organisation encompasses the naming, arrangement, backing up, and documenting of files.
  • Time invested in project organisation is repaid many times over during a project.

For a more expansive view of good project organisation, see Good enough practices in scientific computing by Greg Wilson and colleagues, published in PLoS Computational Biology.