Data Carpentry develops and teaches workshops on the fundamental data skills needed to conduct research. Its target audience is researchers who have little to no prior computational experience, and its lessons are domain specific, building on learners’ existing knowledge to enable them to quickly apply skills learned to their own research. Participants will be encouraged to help one another and to apply what they have learned to their own research problems.

This workshop uses a tabular ecology dataset from the Portal Project Teaching Database and teaches data cleaning, management, analysis, and visualization. There are no prerequisites, and the materials assume no prior knowledge of the tools. We use a single dataset throughout the workshop to model the data management and analysis workflow that a researcher would use.

The workshop will cover:

Data Organization in Spreadsheets: Learn how to organize tabular data, handle date formatting, carry out quality control and quality assurance, and export data for use with downstream applications.
Data Cleaning with OpenRefine: Explore, summarize, and clean tabular data reproducibly.
Data Analysis and Visualization in R: Import data into R, calculate summary statistics, and create publication-quality graphics.
Data Management with SQL: Structure data for database import and query data within a relational database.

Please note that the first two lessons are more introductory and are covered on the first day of the workshop. The third lesson, on R, is delivered over two days, and the final lesson, on SQL, is taught on the last day of the workshop.



Participants must bring a laptop running Mac, Linux, or Windows (not a tablet, Chromebook, etc.) on which they have administrative privileges.

They are also required to abide by the ARCHER2 Code of Conduct.


09:00 - 17:00


Course materials


Session 1

Session 2

Session 3