Dates as data
Last updated on 2024-04-15 | Edit this page
Overview
Questions
- What is a safe approach for handling dates in spreadsheets?
Objectives
- Describe how dates are stored and formatted in spreadsheets.
- Demonstrate best practices for entering dates in spreadsheets.
Dates in spreadsheets can be a problem. A spreadsheet application may display dates in a seemingly correct way (to a human observer) while actually storing the date in a way that may cause issues.
Note
Most of the images of spreadsheets in this lesson come from Microsoft Excel running on a Mac or Windows PC.
Excel is famous within the scientific community for turning things that aren’t dates into dates. For example, many protein or gene names such as MAR1, DEC1, and OCT4 will automatically be changed into dates by Excel. And once overwritten, the original value of the cell is lost. In fact, a global body in charge of setting nomenclature for genes released guidelines in 2020 which included “symbols that affect data handling and retrieval” as one of the criteria which should be accounted for, in a reference to ongoing issues with Excel. Microsoft subsequently released an update to Excel in 2023 that allows users to disable automatic data conversion of these kinds of data into dates. This serves as an important cautionary tale!
Note
To turn off automatic data conversion for dates in Excel:
- On Windows: File > Options > Data > Automatic Data Conversion > Untick “Convert continuous numbers and letters to date”
- On Mac: Excel > Preferences > Edit > Automatic Data Conversion > > Untick “Convert continuous numbers and letters to date”
Note: This will only stop Excel from turning text/number combinations such as MAR4, DEC8, etc into dates. It will not prevent Excel from changing the way dates are interpreted and displayed based on regional settings. See below for more information.
Recommended date format
To avoid ambiguity, dates should be written as
yyyy-mm-dd
. But Excel and other spreadsheet software tend
to transform this data into their own date formats, changing the
underlying data in the process.
For example, take the date 2024-01-13
. When you add this
value to a cell inside a new Excel spreadsheet (.xlsx
) and
save the spreadsheet in the same format, the date is overwritten to
13/01/2024
. Similarly, when you add this date to a cell
inside a new Excel spreadsheet (.xlsx
) and save the
spreadsheet in a plain text format (.csv
), the date is
again overwritten to 13/01/2024
. Even when you create a new
plain text spreadsheet (.csv
) and save the file in
.csv
format, the date is overwritten to
13/01/2024
. Even when you pre-format a column as ‘text’
(right click the column label > format cells > Text), and enter
the date, it will be stored correctly and display correctly. However,
when this spreadsheet file is opened back up again in Excel the date is
reinterpreted by Excel, converted to it’s internal format, and
overwritten as 13/01/2024
.
Why is Excel doing this? When data is imported into Excel, it will automatically identify cells containing dates and convert these into it’s internal date format. In the background, Excel stores dates as integers, and then picks a format to display them with based on your computer’s regional date settings.
While it is possible to stop this from happening by changing your
regional date settings, whenever someone else (like your supervisor or
colleague) opens and saves the file, the dates will be converted based
on their regional date settings, which means they are likely to end up
back in 13/01/2024
format.
The only guaranteed way to avoid this is to make different columns to record year, month, and day separately. You can then join the date components up later on in the software used for analysis. If you already have dates in a single column, you can split them up using some handy Excel functions.
Exercise
Pulling day, month, and year out of dates entered into a single column.
- Open the
dates
tab in our messy spreadsheet and you will see the ‘plot 3’ table from the2014
tab (that contains the problematic dates). - Create new columns for day, month, and year.
- Make sure the new columns are formatted as numbers, not dates, by selecting the columns, right-clicking them > format cells > number.
- Extract day, month, and year from the dates in the
Date collected
column into the new columns by using:
YEAR()
MONTH()
DAY()
You can see that even though we expected the year to be 2014, the year is actually 2015. What happened here is that the field assistant who collected the data for year 2014 initially forgot to include their data for ‘plot 3’ in this dataset. They came back in 2015 to add the missing data into the dataset and entered the dates for ‘plot 3’ without the year. Excel automatically interpreted the year as 2015 - the year the data was entered into the spreadsheet and not the year the data was collected. The spreadsheet program introduced an error into the dataset without the field assistant realising.
{alt=‘dates, exersize 1’ .output}
Key Points
- Treating dates as multiple pieces of data rather than one makes them easier to handle.