Working with data at a small newsroom

Working in a small team has its ups and downs.

Fewer people could mean less potential for miscommunication. But it could also mean a greater potential for delays because on a small team, each person takes on multiple roles.

Here at PublicSource, in addition to working on our daily and in-depth stories, we’re each working on one or more investigative projects. We have no visuals team, no app development team, no data team. We have one team. And we do it all.

And while that doesn’t mean we are without structure, it does mean that our structure is fluid, changing based on project and daily workload.

Here are some of the ways we decide how to structure our projects:

Communication

Projects are like relationships. What is the one thing that is most crucial in a productive relationship? Communication. You might think your teammates know where you’re at, how you’re feeling or that you’re struggling, but if you don’t tell them, they have no way of knowing.

Roles

We don’t have the manpower to have different people for different teams; instead, we use more of a Venn diagram approach to role assignments. Based on the data involved and the end product desired, people generally fall into these roles:

  • data acquisition: usually assigned only to the lead reporter on the project

  • data conversion (paper to digital usually): usually assigned to our data reporter and lead reporter

  • data cleaning: all hands on deck

  • data processing: usually assigned to our data reporter and newsroom developer

  • data analysis: all hands on deck

  • app development: usually assigned to the newsroom developer

  • reporting: usually assigned to the lead reporter and data reporter

Sharing and organizing (data, files, notes, etc.)

Decide how you’re going to share and organize your data. There are several options here, and it really depends on what everyone is comfortable using. Personally, I think version-control platforms like GitHub are the way to go, but sometimes we need to be flexible because not everyone knows the same systems and there isn’t always time for them to learn.

And truthfully, it’s not all about the tools you’re using to get the thing done. It’s about consistency and understanding. As long as everyone on your team is on board with the process you’re using (and the process isn’t actually riddled with flaws), go with it.

Timelines, based on time not on calendars

Timelines that start out on a calendar are doomed to fail. Dates mean nothing when you’re talking about cleaning, processing and analyzing. Minutes, hours and days, however, are appropriate units of measure. “It will take me two days to clean this dataset” is much more concrete than “I will have this data clean by the 15th.”

Every part of a project depends on the process that comes before it. As soon as one part misses its deadline, your calendar-based timeline is broken.

In a time-based timeline, if one part takes longer than expected, the other parts remain whole and unbroken, capable of being shifted and moved.

So lay out your project in time first, and then toss it up on a calendar to see where you’re falling. If you see the project is going to take you more time than you have, think about what you really need out of this project and refine.
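Here is a minimal sketch of that idea in Python. The task names, the estimates and the start date are made up for illustration, and the sketch assumes a Monday-to-Friday work week with work beginning the working day after the start date. Each task is stored as a duration, and the calendar dates are only calculated at the end.

```python
from datetime import date, timedelta

# Hypothetical tasks and estimates in working days, listed in the order
# they depend on each other. These numbers are for illustration only.
TASKS = [
    ("clean dataset", 2),
    ("process data", 3),
    ("analyze data", 4),
]


def add_working_days(start, days):
    """Return the date reached after `days` working days (Mon-Fri) past `start`."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday through Friday
            remaining -= 1
    return current


def project_onto_calendar(tasks, start):
    """Turn ordered (name, duration) pairs into (name, finish_date) pairs."""
    schedule = []
    cursor = start
    for name, duration in tasks:
        cursor = add_working_days(cursor, duration)
        schedule.append((name, cursor))
    return schedule


# Work begins the working day after this (assumed) start date.
for name, finish in project_onto_calendar(TASKS, date(2024, 1, 1)):
    print(f"{name}: done by {finish.isoformat()}")
```

If cleaning ends up taking four days instead of two, change that one number and run it again. Every later date shifts on its own, and you can see right away whether the project still fits the time you have.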

Never process data that you don’t understand.

As soon as you get a dataset, look at it. If you don’t understand what you’re looking at, write down all the questions you can think to ask and talk with the person who gave you the data.

Create a term dictionary that everyone uses. Never assume that your teammates are using the same terms to describe the dataset as you. Here’s a template for the term dictionary we use.

Never make assumptions about column names. And never run processes on data that you’re not familiar with.
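One way to enforce this, sketched in Python: before any processing, compare the dataset's column names against the term dictionary and stop if any column has no agreed definition. The dictionary entries, column names and sample rows below are hypothetical, and the sketch assumes the data arrives as a CSV file with a header row.

```python
import csv
import io

# Hypothetical term dictionary: column name -> the meaning the team agreed on.
TERM_DICTIONARY = {
    "facility_id": "ID the agency assigns to each facility",
    "inspection_date": "Date of the inspector's visit, as reported by the agency",
}


def undefined_columns(csv_file, dictionary):
    """Return the header names that have no entry in the term dictionary."""
    reader = csv.reader(csv_file)
    header = next(reader)  # first row is assumed to hold the column names
    return [name for name in header if name.strip() not in dictionary]


# A made-up sample standing in for a real file opened with open(path, newline="").
sample = io.StringIO("facility_id,inspection_date,status\n101,2015-03-02,closed\n")
missing = undefined_columns(sample, TERM_DICTIONARY)
if missing:
    print("Ask the data provider about these columns before processing:", missing)
```

A check like this does not tell you what a column means. It only makes sure nobody runs a process on a column the team has not yet defined.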

If you don’t get answers to your data questions, think about whether you can, in good conscience, report any solid findings that may come from processing it.

Document everything you do ad nauseam.

Consider documentation a form of communication, which we’ve already established is the most crucial aspect of any project.

In addition to saving you time down the line (when you need to re-run the process you just took 12 hours figuring out), taking detailed notes about everything you do will make you think about the steps that you’re taking.

Someone once told me that if, at the end of a project, you don’t have page after page of process notes, you’ve done something wrong.
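Process notes do not need a special tool. As one possible sketch, assuming you are already working in Python, a small helper can append a timestamped entry to a plain-text file each time you change the data. The function name, the file name and the entry format here are all made up for illustration.

```python
from datetime import datetime
from pathlib import Path


def log_step(notes_path, what, why, now=None):
    """Append one timestamped entry to a plain-text process log and return it."""
    stamp = (now or datetime.now()).strftime("%Y-%m-%d %H:%M")
    entry = f"[{stamp}] {what}\n    why: {why}\n"
    # Mode "a" appends, so earlier notes are never overwritten.
    with Path(notes_path).open("a", encoding="utf-8") as notes:
        notes.write(entry)
    return entry
```

Calling `log_step("process_notes.txt", "Dropped rows with no inspection date", "Provider confirmed they are test records")` adds two lines to the notes file. Writing down the "why" is the part that forces you to think about the step you just took.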

Be language agnostic

There are hundreds of programming languages, frameworks and libraries available to you, not only in the app-building process, but also at every step along the way. Just because you know how to use Excel doesn’t mean you can’t also use R.

Alexandra Kanik is the web and interactive developer for PublicSource. You can reach her at akanik@publicsource.org or follow her on Twitter @act_rational.