My Favorite Tools with Cuban Data Journalist Barbara Maseda


Proyecto Inventario visualized the types of accounts followed by the Cuban president’s official Twitter account. Photo: Screenshot

For GIJN’s My Favorite Tools series, this week we spoke with Barbara Maseda, the founder and editor of Proyecto Inventario, an open data platform for journalists reporting on her native Cuba.

Poor internet access and lack of transparency in the country make it very difficult — and sometimes impossible — for journalists to find even basic data for their reporting. Maseda, who studied journalism at the University of Havana, has been researching quantitative approaches to the news for several years, including at Birmingham City University in the United Kingdom.

Barbara Maseda founded Proyecto Inventario in 2018. Photo: Courtesy of Factual

She founded Proyecto Inventario in 2018, during her year at Stanford University as a John S. Knight Journalism Fellow. The project aims to provide independent journalists with easy access to data and documents about all aspects of life in Cuba.

Since the coronavirus outbreak began, Proyecto Inventario has become an essential source of data on the spread of the virus in Cuba. It has converted information about reported cases into structured data, and disaggregated it by date, geography, and patient characteristics. Proyecto Inventario’s visualizations, built with the Flourish platform, have been used by several independent Cuban media organizations.

Proyecto Inventario is visualizing the data on the spread of the coronavirus in Cuba. Photo: Screenshot

Maseda, who was recently chosen as a TED2020 Fellow, runs Proyecto Inventario herself, from managing social media to answering requests from journalists, and plans to continue building the initiative through crowdfunding and grants.

Here are some of Maseda’s favorite investigative tools, and how she uses them to overcome the difficulties of data reporting in and about Cuba.

Klaxon

“Of the many options out there to automate the detection of changes in websites, in Proyecto Inventario we use Klaxon, a tool created by The Marshall Project. Klaxon is very convenient, because we monitor many websites, and many different parts of different web pages. So, the level of customization and detail that Klaxon offers, allowing us to focus on very specific elements, works very well for us.

“This is very useful in journalism in general, but particularly in Cuba, where most official websites don’t offer a subscription service for people interested in learning if there’s new information or data available. Also, independent journalism is not legal in Cuba, so our relationship as reporters with media liaisons in official institutions is virtually non-existent. Therefore, automating the detection of website changes is certainly one of our best shots at learning, as fast as possible, that new official information has been posted.

“For instance, one of the websites that we monitor is the official repository of Cuban law, the Official Gazette. Thanks to Klaxon, we are notified within the hour that a new regulation has been published, and we use that information to alert journalists and editors who we know are working on stories or cover a beat for which the new regulation might be relevant.

“Another way in which we have used Klaxon is as an imperfect, but effective, substitute for a scraper that needs to run periodically in the cloud. If you don’t have the skills or the time to set up your own scraper, you can use Klaxon to collect and store data tables, or lists, or any other source of data from a given website as they are updated.”

Klaxon enables you to monitor changes to the names of officials on a government website in order to keep track of variations in the structure of the administration. Photo: Screenshot
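
The general idea behind this kind of monitoring can be sketched in a few lines of Python. This is not Klaxon's own code, only a minimal illustration of comparing two saved snapshots of one page element and listing what changed. The function names and the sample text are hypothetical, and fetching the page and sending the alert are left out.

```python
import difflib
import hashlib


def fingerprint(text: str) -> str:
    """Return a SHA-256 digest of the text with whitespace normalized."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def describe_change(old: str, new: str) -> list[str]:
    """Return removed ("- ") and added ("+ ") lines, or [] if nothing changed."""
    if fingerprint(old) == fingerprint(new):
        return []  # identical apart from whitespace: no alert needed
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    # Keep only real additions and removals, not context or hint lines.
    return [line for line in diff if line.startswith(("+ ", "- "))]


# Hypothetical snapshots of a list of officials, taken a day apart.
yesterday = "Minister of Energy: Name A\nMinister of Transport: Name B"
today = "Minister of Energy: Name A\nMinister of Transport: Name C"

for line in describe_change(yesterday, today):
    print(line)
```

Storing every snapshot, rather than only the latest one, is what lets this approach double as a rough substitute for a scheduled scraper.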

Sublime Text

“I love how Sublime Text can be used to create a sort of local search engine on your computer. If you have hundreds, or thousands, of documents stored locally as text files, you can use this powerful text editor to find matches that are going to help you focus on the files that are more relevant to your investigation. You can read Friedrich Lindenberg’s wonderful tutorial, A Poor Journalist’s Text Mining Toolkit, if you want to learn how to do this.

“As a journalist from a country where there’s not a lot of information online, we rely a lot on document dumps that change hands in flash drives — so this is a very useful tool. I would say it is useful even in cases when the information is online, but in formats that are not optimal for users and search engines.

“The Cuban law repository, to use the same example again, for a long time published new regulations as PDF files that were compressed as .RAR files. In cases like this, a good solution is to download the entire website, convert everything to more convenient file formats, and organize the information in ways that make it easier for journalists to search.”

Sublime Text allows you to search thousands of documents, for example for legislation that includes the keyword “telecommunications.” Photo: Screenshot
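
For readers who prefer a script to a text editor, the same kind of keyword search over a folder of text files can be approximated in Python. This is a minimal sketch, not part of Sublime Text or of the tutorial mentioned above. It assumes the documents have already been converted to UTF-8 `.txt` files, and the function name and folder path are hypothetical.

```python
from pathlib import Path


def search_folder(folder, keyword, pattern="*.txt"):
    """Return (file name, line number, line) for every case-insensitive match."""
    keyword_lower = keyword.lower()
    matches = []
    # rglob walks the folder and all of its subfolders.
    for path in sorted(Path(folder).rglob(pattern)):
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if keyword_lower in line.lower():
                matches.append((path.name, number, line.strip()))
    return matches


# Hypothetical usage, assuming a local folder of converted regulations:
# for name, number, line in search_folder("gazette_txt", "telecommunications"):
#     print(f"{name}:{number}: {line}")
```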

Wayback Machine

“Everybody loves the Wayback Machine, and of course so do we in a country where public record keeping is very deficient. Web pages and even entire websites become unavailable all the time in Cuba, so having a resource like the Wayback Machine to check older versions is extremely valuable.

“One of the problems that we face when it comes to data integrity is that sometimes some government institutions delete old records and/or entries when they release new versions of a given data set. For instance, the most recent version of the registry of non-agricultural co-ops doesn’t include any of the companies that have been eliminated from the record. We used the Wayback Machine to get all the previous versions of that registry and include all the inactive or extinct co-ops in a data set that is available to reporters, and anyone else interested in consulting it.

“The Wayback Machine can also be a great neutral third party to make copies of websites that you fear might disappear, or be altered, and that are key to your investigation. We built a data set of all the flights that Cuban doctors took from Brazil back to Cuba in late 2018, following cancellation of the Mais Médicos Program. We made sure that the news reports from which we took the figures of the medical personnel traveling in each flight were stored in the Wayback Machine. This is an important measure to take when you are a journalist who covers a government that likes to discredit critical coverage and question journalists’ commitment to truth.”
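
Collecting every archived version of a page, as described for the co-op registry, can be partly automated with the Wayback Machine's CDX index. The sketch below only builds the query address and turns the rows of a JSON response into snapshot links. It makes no network request, and it is not Proyecto Inventario's own workflow. The function names and the example address are hypothetical.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(page_url, year_from=None, year_to=None):
    """Build a CDX query listing captures of page_url, one per distinct content."""
    # collapse=digest skips consecutive captures whose content is identical.
    params = {"url": page_url, "output": "json", "collapse": "digest"}
    if year_from is not None:
        params["from"] = str(year_from)
    if year_to is not None:
        params["to"] = str(year_to)
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def snapshot_urls(cdx_rows):
    """Turn CDX JSON rows (the first row is the header) into snapshot links."""
    if not cdx_rows:
        return []
    header, *rows = cdx_rows
    ts = header.index("timestamp")
    original = header.index("original")
    return [f"https://web.archive.org/web/{row[ts]}/{row[original]}" for row in rows]


# Hypothetical page address, used only to show the shape of the query.
print(cdx_query_url("example.org/registry", 2018, 2019))
```

Each resulting link points to one archived capture, so older versions of a data set can be downloaded and compared one by one.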

Video Street View

Use Google My Maps to organize the videos as you find them. Photo: Screenshot

“This is not a tool strictly speaking, but it’s an idea that could be useful to reporters in countries where there is no Google Street View, like Cuba. Faced with the impossibility of using Google Street View as a geolocation tool for OSINT like people do in other countries, we have found an alternative in the hours and hours of footage of Cuban streets and neighborhoods that people post on social media platforms like YouTube, Facebook, Twitter, and others. Typically, these are videos recorded by Cubans living abroad who go visit their families and like to capture these nostalgic unedited scenes of their old neighborhood, or the route that they’d take to work.”

DocumentCloud

“A big part of the work that we do at Proyecto Inventario consists of structuring information contained in documents, so the fact that DocumentCloud puts in one place features that allow us to manage many key document processing steps, from OCR to entity extraction, makes our work easier.

“But in addition to all the features that I’m sure most reporters appreciate from DocumentCloud, I also like that it gives me access to documents shared by other users where sometimes I can find information about my country. Searching these public documents is a great way to find details about Cuba that were probably very secondary, or even not relevant at all, for the investigation conducted by the newsroom or reporter sharing the documents, but that can, of course, be very important for me.”

Mentions of Cuba’s telecoms company (ETECSA) as search results in DocumentCloud. Photo: Screenshot

Kumu.io

“Visualizing social media connections, company structures, and creating family trees, are some of the common tasks that Kumu.io makes very easy on a daily basis. We do a lot of social media analysis, and it’s convenient that Kumu.io’s network maps can be easily embedded and updated simply by updating the public Google Sheet document containing the data behind the map, if that’s the type of data source that you select from the available options.”

This story has been updated. In a previous version, we mistakenly described Proyecto Inventario as the winner of the 2019 Data Journalism Awards in the introduction.


Kristina Puga is a journalist based in New York. She writes for NBCNews.com, focusing on the Latino community in the US. She also started a site called WiserWithAge.com where she writes about inspirational people age 60 and up in order to pass on their wisdom to younger generations.
