For GIJN’s My Favorite Tools series, we spoke with Indian journalist and information designer Gurman Bhatia. Formerly a data and graphics journalist at Reuters, Bhatia has taught data storytelling in the Lede Program at Columbia Journalism School in New York, and is one of nine international experts to sit on the pre-jury for the SIGMA data journalism awards.
Her own work has been recognized by the Malofiej International Infographic Awards, the Global Editors Network (GEN) data journalism awards, and the Society for News Design. A self-taught coder, Bhatia first encountered data visualization during her master’s degree at Columbia University, where she took a module on interactive design and storytelling. “I learned the basics of HTML, CSS, and JavaScript — our class was from six until nine, and I remember both my professors staying until after 10 because I wanted to learn to scrape,” says Bhatia. “They were also gracious enough to answer my out-of-syllabus questions. Had they not been, I don’t know if I would have continued on this path.”
After working at Poynter as a Google Journalism Fellow, Bhatia moved to West Palm Beach, in the southern US state of Florida, where she worked on the investigative team at the Palm Beach Post. “Some of the best investigative reporters in the state worked there,” Bhatia says. “My editor asked me what I wanted to achieve at the end of six months and I said I wanted to walk away with one investigative story.”
Bhatia wanted to develop a story from idea to publication. “I filed 36 public record requests to understand how different cities dealt with halfway houses [for drug and alcohol users]. We were able to build this non-existent database, which allowed us to know ownership patterns, who the kingpin of the halfway houses was, and who had shady practices.”
Bhatia adds: “It was very old-school investigative reporting — we relied heavily on DocumentCloud for that investigation because of the number of public record requests we were filing.”
After a stint at the Hindustan Times in Delhi, Bhatia joined the award-winning team at Reuters Graphics in 2018, where she reported and worked on visualizations for topics as diverse as election fraud in India, use of force during protests in Hong Kong, and bushfires in Australia. Now freelance, she works for an environmental think tank and runs training sessions for GIJN member International Center for Journalists (ICFJ), a Washington, DC-headquartered nonprofit, and the East-West Center. These are her favorite tools:
GitHub Actions
GitHub Actions run on a GitHub repository, which contains all the files and revisions for a project, and allow processes to be automated so that a developer does not have to trigger them manually.
Reuters’ comprehensive Coronavirus tracker — which was designed and developed by Bhatia and two colleagues — features a spinning globe where users can zoom in on specific countries to identify the vaccination rate. It also allows readers to compare countries. The site generates thousands of pages, and was designed with GitHub Actions. “This project was the most amount of code I’ve ever written in my life, and GitHub Actions was very good for organizing, tracking, and managing such a huge project,” says Bhatia.
Scraping in node.js
Node.js lets users run the JavaScript programming language outside of a browser, such as on a server or desktop. The open-source runtime allows users to perform operations that might otherwise require other languages, like Python. This means that a user doesn’t have to switch between different coding languages.
“I think scraping is great because it lets you collect data that is otherwise trapped in webpages or parsed data trapped in PDFs,” says Bhatia, who does most of her coding in JavaScript. “I’m a huge proponent of using computers for what they are best at, and using humans for making editorial judgments.”
Bhatia used scraping for a story she did on the Indian elections in 2019 to visualize its sheer scale: an election contested by over 8,000 candidates. “One thing which is very common in Indian elections is that certain parties will pay two or three people with the same name to contest the same seat so a voter is confused — they’re essentially dummy candidates,” says Bhatia. “At the time, the Indian government said they were going to put the faces of candidates on the ballot. My immediate reaction was that it means more data. Now I have faces as a data point.”
Using Node.js, Bhatia scraped data from India’s election commission, such as the images of candidates, as well as data from myneta.info, which included information on candidates who had criminal cases against them. “The trickiest thing was to match the two databases, so pairing criminal cases and the faces of candidates,” she explains.
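The matching step she describes can be pictured as a join on a normalized key. What follows is a minimal sketch under stated assumptions, not the code used for the story: the field names (`name`, `constituency`, `photoUrl`, `criminalCases`), the helper names, and the sample records are all hypothetical, and real candidate lists would likely need fuzzier matching than this.

```javascript
// Hypothetical records; field names and values are invented for illustration.
const commissionRows = [
  { name: "A. Kumar ", constituency: "Seat One", photoUrl: "photos/1.jpg" },
  { name: "B. Devi", constituency: "Seat Two", photoUrl: "photos/2.jpg" },
];
const casesRows = [
  { name: "a kumar", constituency: "seat one", criminalCases: 2 },
  { name: "C. Singh", constituency: "Seat Three", criminalCases: 0 },
];

// Lowercase, drop punctuation, and collapse whitespace so small formatting
// differences between sources do not block a match. This assumes names are
// written in unaccented Latin characters; other scripts would be stripped.
function normalize(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

// A candidate is identified by seat plus name, since names can repeat.
function makeKey(row) {
  return `${normalize(row.constituency)}|${normalize(row.name)}`;
}

// Attach the criminal-case count to each commission row that has a partner;
// rows without one are returned separately for manual review.
function matchCandidates(left, right) {
  const index = new Map(right.map((row) => [makeKey(row), row]));
  const matched = [];
  const unmatched = [];
  for (const row of left) {
    const hit = index.get(makeKey(row));
    if (hit) {
      matched.push({ ...row, criminalCases: hit.criminalCases });
    } else {
      unmatched.push(row);
    }
  }
  return { matched, unmatched };
}

const result = matchCandidates(commissionRows, casesRows);
console.log(result.matched.length, "matched;", result.unmatched.length, "left for manual review");
```

Keeping the unmatched rows in a separate list reflects the division of labor Bhatia describes: the computer handles the exact matches, and a person judges the rest.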
d3.js
D3.js is a powerful JavaScript library for creating custom interactive charts in a web browser. It gives users complete control to customize visualizations to create bespoke data-driven graphics [using HTML, CSS and SVG].
One of Bhatia’s first d3 visualizations was at the Hindustan Times for the 2016 Olympics. She plotted every country’s national record in the 100m sprint and created an interactive chart. “At the time it was a visual that blew everyone’s mind,” she says. “It was inspired by a New York Times piece.”
Referring to Reuters’ Coronavirus tracker, she adds: “The rotating globe on the COVID tracker was made in d3 and Canvas. We wanted to create a narrative around COVID-19 instead of having this dashboard of a huge data dump.”
FFmpeg and ImageMagick
FFmpeg is an open-source tool with different libraries for handling multimedia files — for example, it can be used to convert an audiovisual file’s format. ImageMagick is a suite of command-line tools used to manipulate digital images, such as animating, converting, and analyzing images.
Bhatia worked on a story with the team at Reuters looking at pollution in the Indian capital. “We set up a camera on the rooftop of our bureau in Delhi, which took a photo every hour for 35-40 days,” she recalls. “I would get periodic data dumps of photos, and I used ImageMagick and FFmpeg to stitch the photos together as a video. I also created a d3 chart with the same frame rate as the video, so they would line up.”
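One way to picture the stitching and syncing she describes is the rough sketch below. It is not the Reuters pipeline: the frame rate, file pattern, and output name are assumptions, and the helper names `buildFfmpegArgs` and `frameAt` are hypothetical. The FFmpeg options shown (`-framerate`, `-pattern_type glob`, `-i`, `-c:v`, `-pix_fmt`) are standard ones, though glob input patterns are not available on every platform build.

```javascript
// Assumed value: 12 hourly photos are shown per second of video.
const FRAME_RATE = 12;

// Build the argument list for an ffmpeg call that turns a folder of photos
// into a video. Nothing is executed here; the list could be passed to
// child_process.spawn("ffmpeg", args) in Node.js.
function buildFfmpegArgs(pattern, output, frameRate) {
  return [
    "-framerate", String(frameRate), // input photos per second of video
    "-pattern_type", "glob",         // treat the input as a wildcard pattern
    "-i", pattern,
    "-c:v", "libx264",               // widely supported H.264 encoder
    "-pix_fmt", "yuv420p",           // pixel format most players accept
    output,
  ];
}

// Map a playback time in seconds to the index of the photo on screen, so a
// chart can highlight the same hour the video is showing.
function frameAt(seconds, frameRate, totalFrames) {
  const index = Math.floor(seconds * frameRate);
  return Math.min(Math.max(index, 0), totalFrames - 1);
}

// One photo an hour for 35 days, the lower bound Bhatia mentions.
const totalFrames = 24 * 35;
const args = buildFfmpegArgs("photos/*.jpg", "timelapse.mp4", FRAME_RATE);
console.log("ffmpeg " + args.join(" "));
console.log("photo index at 2.5 seconds:", frameAt(2.5, FRAME_RATE, totalFrames));
```

Driving both the video and the chart from the same frame rate constant is what keeps them aligned: the chart only needs the current playback time to know which hourly reading to show.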
Google Sheets
“I use Google Sheets quite a lot, especially for collaborating,” says Bhatia. “The Reuters COVID tracker runs entirely from a massive Google Sheet, with reporters inputting numbers every day.”
For Bhatia, a major advantage of Google Sheets is the ability to create datasets. “At Reuters, we created a database for a story we did on the Hong Kong protests, where we collected data on what days there had been violence and what kind of tactics were used by each side. That data didn’t exist, so we had to collect it through multiple sources,” she says, adding that individuals would input the data on a single spreadsheet.
Notion
Notion is a note-taking application that allows users to manage and plan projects as well as create writing repositories and linked databases. “I use it to create databases,” says Bhatia. “One thing I need when teaching is concrete examples, so I create different databases in Notion and use a Chrome plugin to add links to a database I need.”
She adds: “One book that I recommend, which changed my dimension in terms of note-taking, is ‘How to Take Smart Notes’ by Sönke Ahrens. It brings out the philosophy that notes should be interlinked, because if I write something and never read it again those notes are not worth taking.”
University Data Hubs
The Trivedi Centre for Political Data at Ashoka University produces open access political data in India, from datasets on political parties that have contested elections to the composition of governments at the union and state level. “A lot of the accessible political data including historical election data is because of them,” says Bhatia.
Bhatia has also created her own resource of data sources from India. “The issue is always — especially in a regime such as the current one — availability of data can be questionable, but at the same time, there are a lot of things for which data is available, such as political and education data,” she says. “I recently led some training for Indian journalists and one of the things I did was to create a dataset of datasets just for India, which could be useful for journalists here.”
Bhatia says an absence of reliable data during the COVID-19 pandemic also saw a resurgence of data collection by reporters in the field.
“Reporters were sent to the banks of the Ganges [River] to count how many bodies were there; in one day they found more than 2,000,” she says. “That’s something to praise — putting yourself out there and collecting data when none is available.”
Additional Resources
GIJN on YouTube: Investigating with Data by Giannina Segnini
My Favorite Tools: Alberto Cairo on Data Visualization
How the COVID-19 Pandemic Has Shaped Data Journalism
Hanna Duggal is a freelance data journalist based in London. She has reported on issues such as policing, surveillance, and protests using data.