Distance Learning Module: From data to spreadsheets to visualizations
It isn’t hyperbole: Journalists today have access to more data than ever before. But exposing the stories buried in the numbers remains a challenge. From election results, budgets and census reports to Facebook updates and image uploads, journalists need to know how to find the important trends in data and shape them into compelling narratives.
Why spreadsheets are essential tools for understanding data
Here’s an example of a story emerging from data. Mc Nelly Torres, a journalist from the Florida Center for Investigative Reporting, pored over spreadsheets in front of her. They detailed Florida boating accidents from 2008 to 2011, using numbers provided by the Florida Fish and Wildlife Conservation Commission. While Torres already knew that her state ranked No. 1 in the nation for boating fatalities, she wanted to dig deeper.
Her analysis of the data uncovered an interesting pattern that led her to investigate not only boating deaths but also the multibillion-dollar recreational maritime industry, its campaign donations to members of the Florida Boating Safety Advisory Council and recent legislation that failed to address the dangerous reality behind the grim statistics. Her spreadsheets showed that the vast majority of deaths occurred among boaters 35 years and older–a group that was exempt from legislation mandating boating-safety instruction. The law that aimed to make boaters safer didn’t apply to those with the highest fatality rates. Here was a story that needed to be told, and Torres told it on NBC6 in Miami in 2013.
Torres’ discovery of this story in a spreadsheet is just one example of the utility of data-analysis skills. Given all the data available to us today, spreadsheet skills are especially critical for journalists. A spreadsheet is a simple application but a powerful tool. It can reveal a pattern that fights conventional wisdom. It can confirm a hunch. It can highlight outliers–sharp deviations from the norm–that demand further investigation. As Torres’ NBC6 report demonstrated, a complex story of death, money, and politics can emerge from a simple page of numbers.
For more about how spreadsheets are integral to reporting, head over to the Investigative Reporters and Editors site and browse their posts in Behind the Stories.
In the upcoming unit on data journalism, you’ll learn to work with data using spreadsheets. You’ll learn how to turn rows and columns of numbers into something that reveals important trends. We’ll cover the basics of spreadsheets, but the unit is not meant to be a comprehensive survey of every spreadsheet function. We’ll teach you only what you need to get up and running with your data. You’ll learn about data types and file formats. You’ll learn how to import numbers into spreadsheets and how to structure the data so you can manipulate and explore it. You’ll also learn how to look at and write about numbers responsibly. We’ll discuss normalization so that your comparisons make sense, and we’ll talk about percentage change, means and medians.
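To give a feel for two of these ideas before the unit begins, here is a minimal sketch in Python (a spreadsheet would do the same work with its built-in functions). Every number below is invented for illustration, and the helper name `rate_per_100k` is a hypothetical one, not something from the course materials. The sketch shows how one outlier pulls a mean away from the median, and how normalizing raw counts by population can reverse a comparison.

```python
from statistics import mean, median

# Hypothetical salaries, invented for illustration; one value is an outlier.
salaries = [30_000, 32_000, 35_000, 38_000, 250_000]

avg = mean(salaries)    # pulled upward by the single large value
mid = median(salaries)  # the middle value, unaffected by the outlier

print("mean:", avg)
print("median:", mid)


def rate_per_100k(count, population):
    """Normalize a raw count to a rate per 100,000 residents."""
    return count * 100_000 / population


# Hypothetical incident counts for two places of very different size.
big_city = rate_per_100k(500, 2_000_000)
small_town = rate_per_100k(50, 100_000)

# The big city has ten times as many incidents,
# but the small town has the higher rate once population is accounted for.
print("big city rate:", big_city)
print("small town rate:", small_town)
```

The raw counts suggest the big city is the story; the normalized rates suggest the opposite. That is the kind of comparison normalization is meant to protect.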
Don’t fret if you’re not good at math. Using spreadsheets doesn’t involve anything more than simple arithmetic and algebra. If you can add, subtract and use decimals, you’re in good shape.
We’ll also look at the sexier, front-facing side of data journalism: the visualizations. The charts, graphs, and maps are the finished products of your explorations and analyses of the data. Developing data visualizations isn’t something to leave to the graphics department; it is the journalist’s responsibility, because visualizations are how you communicate your story effectively.
It’s important to remember that “data journalism” has two sides: the exploratory side, where you use data to find patterns, trends, and stories, and the explanatory side, where you use or visualize the data to communicate with your readers.
What is Data?
We can’t analyze data unless we know what we’re dealing with. But it’s tough to come up with a single definition of data because it’s such a broad term. (Perhaps it’s not unlike the definition of art: You know it when you see it.) Difficult or not, we need to try.
We all can agree that data is information about the “real world,” collected and recorded. Writing on Poynter.org, Troy Thibodeaux, the former editor of interactive newsroom technology at the Associated Press, said this about the discipline of data journalism:
Real data journalism comes down to a couple of predilections: a tendency to look for what is categorizable, quantifiable and comparable in any news topic and a conviction that technology, properly applied to these aspects, can tell us something about the story that is both worth knowing and unknowable in any other way.
This turns out to be a great working definition for data, because it defines data in terms of what we want to do with it rather than what it is. We want to quantify data: How many Americans are unemployed? We also want to categorize data: How many of the unemployed are women? How many are men? Finally, we want to make comparisons: How does the number of women unemployed now compare with the number of women unemployed a year ago?
Looking at Visualizations
Let’s look at some data visualizations and ask critical questions of them to see if they are successful. Examine Bloomberg’s visualization of Trump’s budget proposal:
The key to any visualization is that it should make COMPARISONS. Always ask yourself, “Compared to what?” The Bloomberg budget visualization doesn’t simply present the quantities (so many million for this department and so many million for that department); it makes smart comparisons. What are those comparisons? The graphs compare the budget from one department to the next, but more importantly, they compare each department’s proposed budget with its current budget. As readers and citizens, we want to know which departments are getting more funding and which are getting less (shown here as green vs. red). Hence, the change is the most important metric to show. Critically, it’s not just the change in dollars but the change in percentage. The EPA is being defunded by 2.6 billion dollars, but the impact is clearer when we understand that the cut is 31.4% of its total budget. (Percent change is measured along the horizontal axis.) Which department’s budget is seeing the greatest percentage increase? Which department’s budget is increasing the most in terms of dollars? Does the visualization make answering these questions an easy task?
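The percent-change arithmetic behind the EPA example can be sketched in a few lines of Python. This is only an illustration: the function name `percent_change` is our own, and the baseline budget is back-calculated from the two figures quoted above (a 2.6 billion dollar cut equal to 31.4%), so treat it as an approximation rather than a number taken from the visualization.

```python
def percent_change(old, new):
    """Return the change from old to new as a percentage of old."""
    if old == 0:
        raise ValueError("percent change is undefined when the old value is 0")
    return (new - old) / old * 100


# Figures quoted in the text, in billions of dollars.
cut = 2.6
# Assumption: the baseline is implied by "2.6 billion is 31.4% of the budget".
implied_current = cut / 0.314
proposed = implied_current - cut

change = percent_change(implied_current, proposed)
print(f"implied current budget: {implied_current:.2f} billion")
print(f"percent change: {change:.1f}%")
```

The same dollar cut applied to a much larger department would produce a much smaller percentage, which is why the percentage often tells readers more about impact than the raw dollar amount does.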
The next visualization is from the Guardian, which presents the 2015 data for the Congressional representatives across the nation.
Successful visualizations don’t just let you compare; they also let you personalize the data. How does the data relate to me? We call this “navel gazing” because we assume readers are most interested in their own situation (that is, they like to look at their own belly buttons). The Guardian’s interactive lets you choose categories (gender, race, education, etc.) to reveal which members of Congress are “most like you.” Does this visualization encourage personal exploration of the data?
Finally, explore the 3D interactive visualization of arms imports and exports, created for Google.
The interactive is an amazing display, allowing you to spin the globe and choose any country to show its small arms imports from and exports to other countries. But step back from the bells and whistles and try to identify what the reader would most likely want to do with this information. We would want to make comparisons. Here’s a test: Compare the exports of China and the United States. Is it easy? Click on the U.S. and you see that the Export bar is halfway up at 0.61 billion. Now click on China and you see that its Export bar is all the way to the top but at 58.1 million. Does this make it easy to compare? Visually, the orange bars aren’t consistent because the scale varies from country to country (why does the higher number get the shorter bar?). We must instead rely on the actual numbers. But if the graphics in a data visualization don’t help the reader, what good are they?
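To see what a consistent scale would look like, here is a rough Python sketch that draws both countries’ export figures as text bars against the same maximum. It uses only the two numbers quoted above, converted to millions; the function name `bar_lengths` and the 40-character bar width are arbitrary choices made for this illustration.

```python
# Export figures quoted in the text, converted to a single unit (millions).
exports_millions = {
    "United States": 0.61 * 1000,  # 0.61 billion
    "China": 58.1,                 # 58.1 million
}


def bar_lengths(values, max_chars=40):
    """Scale every value against the same maximum so the bars are comparable."""
    top = max(values.values())
    return {name: round(v / top * max_chars) for name, v in values.items()}


lengths = bar_lengths(exports_millions)
for name, n in lengths.items():
    # Each bar is drawn to the shared scale, followed by the actual figure.
    print(f"{name:<14} {'#' * n} {exports_millions[name]:.1f}M")
```

On a shared scale the U.S. bar is roughly ten times the length of China’s, so the reader can make the comparison at a glance instead of reading the labels.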
Your distance learning class assignment is the same as your homework assignment, which is due next class: Pick one data visualization and critique it on your blog in a 300-word post. Is the graphical presentation effective at communicating the information? What is the takeaway message? Does it encourage exploration? Is it misleading? How are colors used? Does the form afford accurate comparisons?