It started with trying to predict the outcome of a US presidential election. More than six decades later, computer-assisted reporting is at the core of investigative reporting globally.
Many practitioners date the beginning of computer-assisted reporting and data journalism to 1952 when the CBS network in the United States tried to use experts with a mainframe computer to predict the outcome of the presidential election. That’s a bit of a stretch, or perhaps it was a false beginning because they never used the data. It really wasn’t until 1967 that data analysis started to catch on.
In that year, Philip Meyer at The Detroit Free Press used a mainframe to analyze a survey of Detroit residents in order to understand and explain the serious riots that erupted in the city that summer. (Decades later, The Guardian in the United Kingdom used some of the same approaches to examine racial riots there and cited Meyer’s work.)
Meyer went on to work in the 1970s with Philadelphia Inquirer reporters Donald Barlett and James Steele to analyze sentencing patterns in the local court system, and with Rich Morin at The Miami Herald to analyze property assessment records. Meyer also wrote a book called Precision Journalism that explained and advocated using database analysis and social research methods in reporting. (Several revisions of the book have been published since then.)
Still, only a few journalists used these techniques until the mid-1980s, when Elliot Jaspin in the U.S. received recognition at The Providence Journal Bulletin for analyzing databases for stories, including those on dangerous school bus drivers and a political scandal involving home loans. At the same time, about 50 other journalists across the U.S. in the late 1980s, often consulting with Meyer, Jaspin, or Steve Doig of the Miami Herald, began doing data analysis for their stories.
Aiding their efforts were improved personal computers and a program—Nine Track Express—that Jaspin and journalist-programmer Daniel Woods wrote to make it easier to transfer computer tapes (that contained nine “tracks” of information) to personal computers using a portable tape drive. This allowed journalists to circumvent the bureaucracies and delays involved in using mainframes at newspapers and universities.
In 1989, the U.S. journalism profession recognized the value of computer-assisted reporting when it gave a Pulitzer Prize to The Atlanta Journal-Constitution for its stories on racial disparities in home loan practices. During the same year, Jaspin established at the Missouri School of Journalism what is now known as the National Institute for Computer-Assisted Reporting (NICAR). Then, in 1990, Indiana University professor James Brown held the first computer-assisted reporting conference in Indianapolis.
From the 1990s into the early 21st century, the use of computer-assisted reporting blossomed, primarily due to the seminars conducted at Missouri and worldwide by Investigative Reporters and Editors (IRE) and NICAR, which is a joint program of IRE and the Missouri School of Journalism. This was aided by the 1996 publication of my book, the first on doing CAR, Computer-Assisted Reporting: A Practical Guide, now in its 4th edition.
The early years of the 21st century saw the Global Investigative Journalism Network begin to play a crucial part in the movement, starting with its first conference in 2001 in Copenhagen, which offered a strong computer-assisted reporting track and hands-on training.
NICAR Begins
In 1994 NICAR was created, and training director Jennifer LaFleur and I initiated an ambitious on-the-road program that eventually included up to 50 seminars a year. By 1996 word of the U.S. successes had reached other countries, and foreign journalists began attending the “boot camps” (intense, week-long seminars) at NICAR. In addition, IRE, with the support of the McCormick Foundation, had set up a program in Mexico City that oversaw data training in Latin America.
While journalists outside the U.S. at first doubted, in the 1990s, that they could obtain data in their own countries, the training showed them how international or U.S. data could initially be used for stories in their countries, how they could build their own datasets, and how they could find data closer to home.
As a result of the training efforts, by 1999 journalists had produced stories involving data analysis in Finland, Sweden, New Zealand, Venezuela, Argentina, the Netherlands, Norway, Brazil, Mexico, Russia, Bosnia and Canada.
Meanwhile, in London in 1997, journalism professor Milverton Wallace began holding an annual conference called NetMedia that offered sessions on the Internet and classes in computer-assisted reporting led by NICAR and Danish journalists. The classes covered the basic uses of the Internet, spreadsheets, and database managers, and they were well-attended by journalists from the UK, other European countries, and Africa.
In Denmark, journalists Nils Mulvad and Flemming Svith, who had attended a NICAR boot camp in Missouri in 1996, organized seminars with NICAR in 1997 and 1998 in Denmark. They also wrote a Danish handbook on computer-assisted reporting and, in 1998, created the Danish International Center for Analytical Reporting (DICAR), with Tommy Kaas as president. In 2001 they co-organized the first Global Investigative Journalism Conference with IRE.
CAR also became a staple of conferences in Sweden, Norway, Finland, and the Netherlands, with Helena Bengtsson from Sweden and John Bones from Norway among its leading proponents.
Through the global investigative conferences, the use of data also quickly spread across Eastern Europe, where Drew Sullivan (who formed the Organized Crime and Corruption Reporting Project) and Romanian journalist Paul Radu were strong proponents and organizers.
Seminars also were given initially in China through the University of Missouri and in India through the World Press Institute. During the same period Steve Doig, a pioneer in CAR and now the Knight Chair in Computer-Assisted Reporting at Arizona State University, traveled internationally to teach CAR, as did additional NICAR training directors (Jo Craven McGinty, Tom McGinty, Ron Nixon, Andy Lehren, and Sarah Cohen), all now journalists at either The New York Times or The Wall Street Journal.
Visualization of Data Increases
In 2005, the visualization of data for news stories got a big boost when U.S. programmer Adrian Holovaty created a Google Maps mash-up of Chicago crime data. The project spurred more interest in journalism among computer programmers and more interest in mapping. Holovaty then created the now-defunct EveryBlock in 2007, which used more local data for online maps in the U.S., but the project later drew criticism for not checking the accuracy of government data more thoroughly.
Also in 2007, the open data movement in the U.S. began in earnest, spawning similar efforts worldwide. The movement increased access to government data internationally, although freedom of information laws were still needed to obtain data that governments did not release.
By 2009, the increasing number of computer programmers and coders in journalism led to the creation of Hacks/Hackers, which encouraged more sharing between the two professions and eased some of the culture clash between the two groups.
Aron Pilhofer, then of The New York Times and now of The Guardian, and Rich Gordon of Northwestern University’s Medill School of Journalism, pushed for the creation of “a network of people interested in Web/digital application development and technology innovation supporting the mission and goals of journalism.” At the same time in Silicon Valley, Burt Herman brought journalists and technologists together. The three then joined to create Hacks/Hackers. The result has been a growing technological sophistication within newsrooms that has improved the ability to scrape data from websites and make it more manageable, visual, and interactive.
Another outcome of the journalist-programmer mashup was a new respect for how flawed databases can be, and for the importance of ensuring the integrity of the data.
As Marcos Vanetta, a Mozilla OpenNews fellow who worked at The Texas Tribune, put it: “Bugs are not optional… In software we are used to make mistakes and correct them later. We can always fix that later and in the worst case, we have a backup. In news, you can’t make mistakes — there is a reputation to take care of. The editorial team is not as used to failure as developers are.”
More Breakthroughs
The years 2009, 2010, and 2011 were also breakthrough years for using data in journalism. In Canada in 2009, Fred Vallance-Jones and David McKie published “Computer-Assisted Reporting: A Comprehensive Primer,” with a special emphasis on CAR in Canada. The European Journalism Centre began its data-driven journalism program, which has organized workshops throughout Europe. Journalist Paul Bradshaw became recognized as a pioneer in data journalism in the United Kingdom. WikiLeaks released its Afghan War Diaries, composed of secret documents, and then the Iraq War Logs, requiring journalists throughout the world to deal with enormous amounts of data in the form of text.
This was followed in 2011 by The Guardian’s impressive series on the racial riots in English cities and by the first Data Harvest conference, organized by Journalismfund.eu.
Also in the United Kingdom, the Centre for Investigative Journalism (led by Gavin MacFadyen), which teamed in its early days with IRE to offer classes in data journalism during its summer school, has continued to run a strong program on its own with the assistance of CAR veteran David Donald.
Meanwhile, at Wits University in South Africa, Anton Harber and Margaret Renn substantially increased the data sessions at the annual Power Reporting Conference, and data analysis has taken hold in Asia and Australia.
As of 2015, and after nearly 50 years of journalists using data, it is clear that data is not only a routine part of journalism, but also a driving force for stories. And the tools and methodology continue to expand.
The use of computers for journalism began by applying social science methods and statistical and data analysis to societal issues. It has expanded widely over the years: counting instances of incidents and accidents, using spreadsheets and database managers, matching apparently unrelated datasets, mapping data geographically and in social networks, scraping the web, cleaning data more efficiently, improving crowd-sourcing and audience interaction, producing multimedia, and mining text with algorithms.
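To make one of those techniques concrete, here is a minimal sketch of matching two apparently unrelated datasets, in the spirit of the school bus driver stories mentioned earlier. It uses Python with the pandas library; the file names, column names, and matching key are hypothetical illustrations, not drawn from any actual project described in this article.

```python
import pandas as pd

# Hypothetical files: a roster of school bus drivers and a table of traffic
# convictions, each containing a driver's license number.
drivers = pd.read_csv("bus_drivers.csv")        # columns: name, license_no, district
convictions = pd.read_csv("convictions.csv")    # columns: license_no, offense, date

# Normalize the matching key so formatting differences don't hide matches.
drivers["license_no"] = drivers["license_no"].str.strip().str.upper()
convictions["license_no"] = convictions["license_no"].str.strip().str.upper()

# Join the two datasets on the shared key; an inner join keeps only drivers
# who also appear in the convictions table.
matches = drivers.merge(convictions, on="license_no", how="inner")

# Count offenses per driver to see who warrants a closer look.
summary = (matches.groupby(["name", "license_no"])
                  .size()
                  .reset_index(name="offense_count")
                  .sort_values("offense_count", ascending=False))

print(summary.head(20))
```

The normalization step reflects the lesson noted above about flawed databases: government records are often inconsistently formatted, and a join on raw values can silently miss matches.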
There has been much discussion about what to call the use of data for high-quality journalism, along with various branding efforts. But whether it is called “precision journalism,” “computer-assisted reporting,” “data journalism,” “data-driven journalism,” or “computational journalism,” the good news is that it is here to stay.
Brant Houston is Knight Chair in Investigative Reporting at the University of Illinois and author of Computer-Assisted Reporting: A Practical Guide. Sections of this article first appeared in Computer-Assisted Reporting: A Practical Guide and in a 1999 issue of Nieman Reports.
This story originally appeared in the special magazine of the 9th Global Investigative Journalism Conference, published by SKUP, the Norwegian Association for a Critical and Investigative Press.