What do data journalists do?

Data journalism is at the core of PublicSource’s mission: to dig deeper and write in-depth stories. I’m Eric Holmberg, one of the newsroom’s newest members, and my job is to help our reporters obtain records and analyze data for their stories.

But what does that mean?

Federal, state and local governments, especially in Pennsylvania, don’t make their information as accessible as we would like it to be. Often, these agencies are still trying to manage paper records. When they do have electronic records, the information may not be kept in a format that makes it easy for the government agency to work with or for the public to dissect. That’s where data journalists come in. We help solve problems by turning messy records into clean data that can be used to reach meaningful conclusions.

For example, the Pittsburgh Bureau of Police publishes neighborhood crime data in its annual report, which is online as a PDF -- essentially a snapshot of the crime numbers and names of neighborhoods.

If you had a really sharp eye, maybe you could scroll through the pages and figure out which neighborhood had the most crime in 2012. But what about compared to the previous year? Where did crime increase or decrease? At this point, you’d have to break out a pencil and paper.

Instead, we used Tabula, a tool that extracts information from PDFs, and Microsoft Excel to turn the city’s 2010, 2011, and 2012 crime statistics into spreadsheets. We also calculated the crime rate per 100 residents in each neighborhood, so the comparison wasn’t skewed against the bigger neighborhoods. (Updated information on crime in 2013 will be available later this month at PublicSource.org.)
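For readers who would rather script this than use a spreadsheet, the per-100-residents calculation can be sketched in a few lines of Python. This is only an illustration, not the Excel workbook described above: the neighborhood names, populations and crime counts are invented, and the field names are assumptions.

```python
def rate_per_100(crimes, population):
    """Return the number of crimes per 100 residents."""
    return crimes * 100 / population


# Hypothetical figures for illustration only; NOT real Pittsburgh data.
neighborhoods = [
    {"name": "Neighborhood A", "population": 10000,
     "crimes_2011": 450, "crimes_2012": 500},
    {"name": "Neighborhood B", "population": 2000,
     "crimes_2011": 220, "crimes_2012": 200},
]

for n in neighborhoods:
    n["rate_2011"] = rate_per_100(n["crimes_2011"], n["population"])
    n["rate_2012"] = rate_per_100(n["crimes_2012"], n["population"])
    # Positive means the rate went up from 2011 to 2012.
    n["change"] = n["rate_2012"] - n["rate_2011"]

# Rank by 2012 rate, highest first.
ranked = sorted(neighborhoods, key=lambda n: n["rate_2012"], reverse=True)

for n in ranked:
    print(f'{n["name"]}: {n["rate_2012"]:.1f} per 100 residents '
          f'({n["change"]:+.1f} vs. 2011)')
```

In this made-up example, Neighborhood A has more crimes in total (500 versus 200), but Neighborhood B has the higher rate once population is taken into account. That is the reason for comparing rates rather than raw counts.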

We put our data online - free - and write guides for our readers so they aren’t lost when they look at it. Readers can go through our process for obtaining the data step-by-step and do it themselves. This is what it means to fulfill a public-service mission. If local or state governments aren’t willing or able to provide data, collected and maintained using taxpayer dollars, in an easy-to-use format, then we will do that work for them on behalf of our readers.

The Pittsburgh City Council uploads the invoices it approves each week, as PDFs, to the City’s website. You may begin to notice a pattern: PDFs are not the best way to make government data useful for taxpayers. If you wanted to answer a question such as who received the most money from these City Council approvals, you couldn’t. We ran the archive of PDFs through CometDocs, another conversion program, and combined the resulting Excel files into a single spreadsheet. Now we can share the data with you in a format that you can use and analyze.
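Once the invoices are in a tabular format, a question like "who received the most money?" comes down to combining the files and adding up the amounts for each payee. Here is a minimal Python sketch of that step. It assumes the converted files have been saved as CSV with columns named "vendor" and "amount"; those column names, the vendor names and the dollar figures are all invented for illustration and do not come from the City's records.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-ins for two weeks of converted invoice files.
# In practice each would be a file on disk; the data here is invented.
WEEK_1 = """vendor,amount
Acme Paving,1200.50
Example Office Supply,310.00
"""
WEEK_2 = """vendor,amount
Acme Paving,799.50
Sample Consulting,1500.00
"""


def total_by_vendor(sources):
    """Combine several CSV texts and sum the amount paid to each vendor."""
    totals = defaultdict(float)
    for text in sources:
        for row in csv.DictReader(io.StringIO(text)):
            totals[row["vendor"].strip()] += float(row["amount"])
    return dict(totals)


totals = total_by_vendor([WEEK_1, WEEK_2])

# Find the vendor with the largest combined total.
top_vendor, top_amount = max(totals.items(), key=lambda item: item[1])
print(f"{top_vendor}: ${top_amount:,.2f}")
```

Real invoice data is rarely this tidy. The same vendor is often spelled several ways across files, so the names usually need cleaning before the totals can be trusted.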

And we’re just getting started!

In these two short examples, you can see the value in having the tools and knowledge to unlock information that taxpayers have a right to know about in the first place. The information can lead us to interesting stories or provide context to larger issues. We can draw unseen connections between databases that don’t normally talk to each other and find the stories that haven’t been reported.

Despite all the new ventures into data journalism - Nate Silver’s 538, Ezra Klein’s Vox, The Upshot at The New York Times - what we’re doing isn’t new.

An excerpt from the Data Journalism Handbook:

“For example, Philip Meyer tried to debunk received readings of the 1967 riots in Detroit — to show that it was not just less-educated Southerners who were participating. Bill Dedman’s “The Color of Money” stories in the 1980s revealed systemic racial bias in lending policies of major financial institutions. In his “What Went Wrong,” Steve Doig sought to analyze the damage patterns from Hurricane Andrew in the early 1990s, to understand the effect of flawed urban development policies and practices.”

Data journalism isn’t new, but it is growing. Nearly 1,000 people attended The National Institute for Computer-Assisted Reporting conference in Baltimore this year, including six employees from PublicSource.

Across the newsroom, we’re gaining the data skills to unlock information for our readers. Our mission is to be a public service to news consumers in western Pennsylvania and across the state. Obtaining data and providing it to our readers is just one way we attempt to fulfill that promise.

Reach Eric Holmberg at 412-315-0266 or at eholmberg@publicsource.com to tell him what data stories you’d like to see.