Teaching Data Journalism logo

Click the image to read our entire series. Public domain image via DARPA.

Alexander B. Howard is a writer and editor based in Washington, D.C. He is currently a fellow at the Tow Center for Digital Journalism at Columbia University and recently produced a report on data journalism, “The Art and Science of Data-Driven Journalism” (PDF). EducationShift talked with Howard about the report and what educators should take away from it.

Q&A

In general, what can educators do to better prepare students to use and analyze data?

Alex Howard: I think it starts early on when giving people grounding in how science works — physical and biological sciences. I had a very fortunate experience to have some exposure to that fairly early in my education. My observation when comparing and contrasting the humanities and sciences — I was a double major — is that you often are using data in the context of experiments, in trying to understand the why of something and what data exists to help us understand. That gives you a different kind of pursuit of knowledge than the classic journalist approach, whereby you go out and talk to people and then you have a story, but it might not give you a broader perspective about the number of times this issue happens or where — in other words, context.

This is not a fresh or novel idea. Journalists have been looking for data about these things as part of their stories for many decades. Data journalism is not a novel idea to investigative reporters who have been diving into statistics to try to give readers context and understanding.

You can think about how a classroom might be thinking through the use of data. In any given question, does data exist to help discuss the topic in a more knowledgeable way with more context?

In any given field of study, there will be a body of knowledge that’s preexisting and buried in there will be data. I think the important thing for educators to consider is how the class and students are taught to think about this. It will have an impact on how much of an evidence-based approach students have to answering questions. The same way you ask them to find and research primary sources — what is or isn’t a good source — they will have to do the same for data. With a very powerful set of data, you can often explain the root causes for something in a way you might not ever be able to do just through interviewing witnesses.

Which new, low-cost digital tools can journalism classes incorporate to increase data literacy?

Howard: It starts with spreadsheets. There are tools now, such as Google Spreadsheets or OpenRefine, that enable people to manipulate large amounts of data fairly powerfully. Maps are quite powerful. Once you’ve cleaned up the data, you can use Datawrapper or Tableau to make maps that show differences in populations. If students have more computing experience (basic coding skills), making animations using other tools may be relevant.

Photo courtesy of Flickr user Anna Lena Schiller and used here by Creative Commons license.

How can educators connect more traditional journalism principles to this growing and changing field of data journalism?

Howard: I don’t think the principles are different. The field is changing dynamically, but the underlying ethics don’t change simply because we can collect large amounts of data, analyze them and put out a statistically driven application to describe something. The ethics of publishing are the same. Are they making sure to protect private information, not engage in libel or slander? The technology rushing forward does not mean that the underlying ethics that have guided people’s decisions to gather information and publish it are somehow changed. It’s critical that data journalism, even though it’s a hot, new topic, not be divorced from decades of computer-assisted reporting or investigative journalism. These are new tools, new techniques, new opportunities and there are new risks that go along with them, but the ethics of creating knowledge from data aren’t fundamentally divorced from the ethics of creating knowledge from talking to people as sources. You may need to protect the data — its provenance or its sourcing. You may need to secure it if it’s sensitive data.

I don’t think there are absolutes here, but these are issues that aren’t separate because they’re online. They are very much grounded in the same kind of decisions that editors have been making for a very long time. Social journalism, digital journalism, data journalism—it’s all journalism and journalism has long been associated with certain ways of doing things.

In your report, you mentioned that many journalists are learning data tools outside of their degree or formal education. Why do you think this is?

Howard: I think it has been driven by traditional institutions not offering the classes for getting those skills. It’s also grounded in not having people on staff that are practitioners in these areas. If you want to learn a given thing, it may be that the only resource is to go online to buy a book or go to a local chapter of Hacks & Hackers or some other meet-up where you can do peer-to-peer knowledge exchange.

In the report, I saw that journalism schools are starting to add more classes and hire more people who have these kinds of skills and this kind of mindset. It’s not just about the hard skills. It’s also about computational thinking — thinking about data as a strategic resource.

There’s a real challenge around digital literacy in traditional print journalism. If you look around journalism schools, there are a lot of people who are from that side or from broadcast journalism, but they may not have the grounding in how to go about doing these kinds of data stories. I do think that there are a lot of very talented people moving into the academy who are creating new classes and integrating this kind of work throughout the curricula, such as Meredith Broussard at Temple and Cheryl Phillips at Stanford. But there are many more journalism schools around the country, and just because the top 10 are moving forward on creating capacity to teach these kinds of things, that doesn’t mean it’s available elsewhere. And where it isn’t, if people want to get this experience, there are now options. There’s a gigantic data journalism MOOC that 21,000+ people are participating in. It’s an example of where people can go to get these kinds of skills.

Do you expect this to change? By what means?

Photo of Alexander B. Howard, Tow Center

Alex Howard

Howard: I think it’s going to happen. I don’t think there’s really any question about it. In the same way that a young person is expected to use a computer, they’ll also need to open up a spreadsheet and do basic statistical analysis. They’ll need to be able to understand the N value of a study and to know what someone is talking about with R values and regression. They’ll need to have some literacy around maps and charts and infographics and ways to present information and visualize data. Just in the same way young journalists are learning how to create basic webpages, how to take pictures, how to use mobile devices, shoot video and create basic apps, these are tools that are going to become part of the ways that 21st century journalists practice their craft. To not use one of the tools is to be unable to practice part of the craft, as it is currently being defined and expanded. It’s not to say that you can’t have a specialization in one area that means you are very good at a certain part of the process, but just being able to write isn’t going to be enough unless you have subject matter expertise in a given area, whereby the expertise gives you the ability to do analysis in that context.
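To make the statistical terms above concrete, here is a minimal sketch in Python of what N, R and a simple regression line refer to. The function name and any data passed to it are hypothetical illustrations, not something taken from the report.

```python
from math import sqrt

def describe_relationship(xs, ys):
    """Return sample size n, Pearson correlation r, and the
    least-squares regression line (slope, intercept) for paired data."""
    n = len(xs)  # the "N value": how many observations the study has
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    r = sxy / sqrt(sxx * syy)  # the "R value": strength of the linear relationship
    slope = sxy / sxx          # regression: change in y per unit change in x
    intercept = mean_y - slope * mean_x
    return n, r, slope, intercept
```

A small n or an r close to zero is a signal to be cautious about any claim that one number explains another. The sketch assumes both lists have the same length and that neither is constant.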

Do you see robo-journalism as a threat to the field of journalism?

Howard: It’s a complement. To be clear, robo-journalism is a buzzy term; it makes people think of robots. I like that kind of phrase if it attracts people to something and makes them engage with the underlying idea, which is an algorithm that is programmed to write a story when a given event occurs.

An example I gave is an earthquake bot where an alert from the U.S. Geological Survey is pulled into a simple, two-paragraph story.
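The earthquake bot idea can be sketched as a template that is filled in from one alert record. This is only an illustration: the field names (`magnitude`, `place`, `time_epoch`, `depth_km`) and the sample values are made up for the example and are not the actual format of a U.S. Geological Survey alert.

```python
from datetime import datetime, timezone

def earthquake_story(alert: dict) -> str:
    """Turn one alert record into a two-paragraph story draft."""
    when = datetime.fromtimestamp(alert["time_epoch"], tz=timezone.utc)
    first = (
        f"A magnitude {alert['magnitude']:.1f} earthquake struck "
        f"{alert['place']} on {when.strftime('%B %d, %Y')} "
        f"at {when.strftime('%H:%M')} UTC, "
        "according to the U.S. Geological Survey."
    )
    second = (
        f"The quake occurred at a depth of {alert['depth_km']:.1f} kilometers. "
        "This draft was generated automatically and needs review "
        "by an editor before publication."
    )
    return first + "\n\n" + second

# Hypothetical sample record; every value here is invented.
sample_alert = {
    "magnitude": 4.4,
    "place": "6 km northwest of Exampleville",
    "time_epoch": 1400000000,
    "depth_km": 8.0,
}

print(earthquake_story(sample_alert))
```

The second paragraph flags the draft for human review, which matches the point below about pairing this kind of commodity news generation with an editor.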

That kind of commodity news generation, along with a human editor or other check and balance, is going to become an increasing part of the news world. It’s going to enable people to cover things they might not otherwise.

As the software capacity gets better and there is more expertise around creating this kind of approach to news, it will steadily move up the ladder into more complex stories. That does not mean it is a threat to journalism, per se. It means that humans have to focus on creating value, writing stories and covering things in a way that the robot can’t. They must write better, find meaning in images and think about how they’re forming narratives that reach into what something really means. I think that actually might be good for journalism. If that puts pressure on humans to do better, then that’s probably good market pressure.

It’s pretty clear that a lot of technical means are encroaching on traditional news outlets, making it very important for people going into the field to think about why what they do is special if they want to stay in it.

What are three main data competencies students must have before they graduate?

Howard: Acquisition, cleaning and presentation. Finding the data is a core competency. Knowing where data exists and where you can download it is a great first step. The next step is cleaning. Most people I speak to in the field say this tends to be the most boring and least sexy thing, but to do this work well, you have to do it. You have to make sure you have a clean data set because otherwise, whatever comes out of it may not be of high quality. It may even be wrong.
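As a rough illustration of the cleaning step, the sketch below tidies a few rows of the kind a downloaded spreadsheet might contain. The column names (`city`, `population`) and the cleaning rules are assumptions chosen for the example; real data sets need rules worked out case by case.

```python
def clean_rows(rows):
    """Normalize names, parse numbers, drop exact duplicates.

    Returns (cleaned, rejected) so that unparseable rows can be
    checked by a person instead of silently disappearing.
    """
    cleaned, rejected, seen = [], [], set()
    for row in rows:
        # Collapse stray whitespace and standardize capitalization
        city = " ".join(row["city"].split()).title()
        # Remove thousands separators before parsing the number
        raw = row["population"].replace(",", "").strip()
        if not raw.isdigit():
            rejected.append(row)  # e.g. "n/a" or an empty cell
            continue
        record = (city, int(raw))
        if record in seen:  # skip rows that duplicate an earlier one
            continue
        seen.add(record)
        cleaned.append({"city": city, "population": int(raw)})
    return cleaned, rejected
```

Keeping the rejected rows is a deliberate choice here: a value that fails to parse is often a reporting question in its own right.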

 

Photo courtesy of Flickr user Mirko Lorenz and used here by Creative Commons license.

Lastly, there is presentation. You need to understand how to make data into something that humans will engage with. You must design it to be something that enables people to explore the data. Present it in a map or some other form that gives people a way to see it in context. Make it into an application or service, so that people may be a couple of layers away from the data itself but can still make decisions based on it.

Presentation may go quite tightly with personalization, in that if you are always thinking about the reader or user, if you can think about why they’re there and what they want to accomplish, you’ll make them happier and solve a problem.

Could you list one or two top tools they should be proficient in, as well?

Howard: Know how to use spreadsheets. Learn how to think computationally – not necessarily a tool, but a way of thinking, as described in Tasneem Raja’s article “Is Coding the New Literacy?” in Mother Jones.

Meagan Doll is a junior at the University of Wisconsin-Madison studying journalism. She is an intern at the EducationShift section at PBS MediaShift.