Cleaning Data

All Data Cleaning Resources

Workbench

A platform that allows journalists to scrape, clean, analyze and visualize data without coding. Use it to clean messy data and to find and fix typos in seconds.

Introduction to OpenRefine

An introduction to OpenRefine, a tool well suited to large and messy data sets. Learn to edit and transform cells. Video 1 of 3.

Regex on OpenRefine

An introduction to basic regular expressions in OpenRefine. Use these search patterns to edit your data in seconds. Video 2 of 3.

More OpenRefine: Extract, Apply, Cluster

Learn how to edit data in bulk using the clustering feature. See how to use the extract and apply commands to step backward and forward through your work, and to apply the same steps to new or revised data sets. Video 3 of 3.
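
To give a feel for what clustering does, here is a minimal Python sketch of a key-collision approach: values that reduce to the same normalized "fingerprint" are grouped as likely variants of one another. This is a simplified illustration and not OpenRefine's own code; the function names and the sample names are assumptions made up for the example.

```python
import string
from collections import defaultdict

# Translation table that deletes ASCII punctuation characters.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)

def fingerprint(value):
    """Reduce a value to a normalized key: trimmed, lowercased,
    punctuation removed, tokens de-duplicated and sorted."""
    cleaned = value.strip().lower().translate(_STRIP_PUNCT)
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)

def cluster(values):
    """Group distinct values that share a fingerprint.
    Only groups with more than one distinct spelling are returned."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]

# Hypothetical messy names, invented for illustration.
names = ["Smith, John", "John Smith", "john smith.", "Jane Doe"]
clusters = cluster(names)
```

Each returned cluster is a list of spellings a person can review and merge into one canonical value. The review step matters, because two different entities can share a fingerprint.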

Merging two tables in OpenRefine

Read this tutorial to learn how to merge two data sets using OpenRefine. 
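
For readers who want to see the idea behind a merge outside OpenRefine, the following is a rough Python sketch of joining two tables on a shared key column. It does not reproduce the tutorial's steps; the `left_join` helper, the column names and the values are all hypothetical.

```python
def left_join(left_rows, right_rows, key):
    """Attach columns from right_rows to each row of left_rows that has
    the same value in the `key` column. Rows with no match keep only
    their own columns. If the right table repeats a key, the last row
    with that key wins."""
    lookup = {row[key]: row for row in right_rows}
    merged = []
    for row in left_rows:
        match = lookup.get(row[key], {})
        # Left-hand values take precedence if a column exists in both.
        merged.append({**match, **row})
    return merged

# Hypothetical tables, invented for illustration.
counties = [
    {"code": "A1", "county": "Northfield"},
    {"code": "B2", "county": "Southport"},
]
populations = [{"code": "A1", "population": "1000"}]

merged = left_join(counties, populations, "code")
```

After any merge, it is worth counting how many rows found no match, since unmatched keys often point to typos or formatting differences in the key column.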

Introduction to Regular Expressions

Slides with a brief introduction to regular expressions: how they work, some grammar, and examples. Regular expressions (regex) are useful for finding and matching a pattern in your data so you can clean it up from there.
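
As a small illustration of matching a pattern and cleaning up from there, this Python sketch uses the standard `re` module to rewrite phone numbers typed in different styles into one format. The sample values and the `normalize_phone` helper are assumptions invented for this example, and the pattern only covers simple ten-digit numbers.

```python
import re

# Hypothetical messy values, invented for illustration.
raw_values = ["(212) 555-0147", "212.555.0199", "212 555 0123", "n/a"]

# Three digits, three digits, four digits, with optional
# parentheses, spaces, dots or hyphens between the groups.
PHONE = re.compile(r"\(?(\d{3})\)?[\s.-]*(\d{3})[\s.-]*(\d{4})")

def normalize_phone(value):
    """Return the number as 212-555-0147, or None if no match is found."""
    match = PHONE.search(value)
    if match is None:
        return None
    return "-".join(match.groups())

cleaned = [normalize_phone(value) for value in raw_values]
```

Values that do not match come back as `None` rather than being silently altered, which makes them easy to review by hand.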

The Verification Handbook for Investigative Reporting

Chapter 5 of the handbook walks you through simple but effective steps for an initial check of your data's quality. Is the data complete? Are there duplicate records? There is no such thing as a completely reliable source when it comes to using data to produce meticulous journalism.

The Quartz Guide to Bad Data

A guide by Quartz that delves into the kinds of problems that reporters encounter when working with (messy) data. Are values missing? Duplicated? Spelling inconsistent? This guide suggests solutions to these common issues.
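
The three questions above (missing values, duplicates, inconsistent spelling) can each be turned into a quick automated check. The Python sketch below is one possible way to do that using only the standard library. It is not taken from the Quartz guide, and the function names and sample rows are hypothetical.

```python
from collections import Counter

# Hypothetical records, invented for illustration.
rows = [
    {"id": "1", "city": "New York", "amount": "100"},
    {"id": "2", "city": "new york ", "amount": ""},
    {"id": "3", "city": "Boston", "amount": "250"},
    {"id": "3", "city": "Boston", "amount": "250"},
]

def missing_values(rows):
    """Count empty or whitespace-only cells per column."""
    counts = Counter()
    for row in rows:
        for column, value in row.items():
            if not value.strip():
                counts[column] += 1
    return dict(counts)

def duplicate_rows(rows):
    """Return each row that appears more than once, compared on every column."""
    seen = Counter(tuple(sorted(row.items())) for row in rows)
    return [dict(key) for key, count in seen.items() if count > 1]

def spelling_variants(rows, column):
    """Group values that differ only by case or surrounding whitespace."""
    groups = {}
    for row in rows:
        groups.setdefault(row[column].strip().lower(), set()).add(row[column])
    return {key: sorted(found) for key, found in groups.items() if len(found) > 1}
```

These checks only flag candidates. Deciding whether a blank is truly missing, or whether two identical rows are a real duplicate, still requires knowing how the data was collected.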

The ProPublica Guide to Bulletproofing Data

A ProPublica guide that walks you through basic (and elaborate) checks for every dataset, from making sure you know how many records you should have to ensuring you have them all.

OpenRefine

Downloadable software that helps you sort and sift dirty data, cleaning it to the point where you can start your actual analysis.

I WANT TO

Watch all the videos

Read all the materials

Explore all the available tools