GIJN Toolbox: SpyOnWeb, VirusTotal, and SpiderFoot HX


Image: SpiderFoot HX

Welcome to the reboot of The GIJN Toolbox, in which we survey the latest tips and tools for investigative journalists. In this edition, we’ll dive into hands-on examples of how to use SpyOnWeb, DNSlytics, VirusTotal, and SpiderFoot HX to map out and analyze networks of websites while maintaining your privacy.

Searching for Hidden Connections between Sites

To start off, we’ll demonstrate how reporters can use Google Analytics tags, which are embedded in the source code of websites, as markers to help identify networks of websites. Mapping networks in this way can help reporters uncover previously unseen connections between organizations that appear unrelated but are, in fact, connected.

To show how this works, let’s use GIJN’s website as an example and open the source code behind one of its pages to see whether it contains a Google Analytics tag. We’ll put this address into the URL bar in Chrome:

view-source:https://www.gijn.org/contact

This will display the source code for that page:

We can then search the page for “UA-”. UA stands for Universal Analytics, and this tag is what Google Analytics uses to identify our site and any domains associated with our Google Analytics account. You can also search the source code for “Pub-” tags, which are connected to Google’s AdSense product, and for “G-” tags, the newer GA4 format that Google rolled out for its Analytics product. It’s important to know all the different tags that can appear in source code so that you can search for them and then run them through the tools we’ll discuss below to check for connections. For now, let’s search the page for “UA-”:

I found: UA-25037912-1
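The same search can be scripted when you have many pages to check. Below is a minimal Python sketch (3.9+), not a tool GIJN uses: the regular expressions are our own heuristics for the three tag formats described above, so expect occasional false positives and misses. Keep in mind that the hypothetical helper `fetch_source` makes a real request to the site, so its server sees your IP address, just as it would if you opened the page in a browser.

```python
import re
import urllib.request

# Heuristic patterns (our assumptions, not an official Google specification):
#   UA-<account>-<property>   Universal Analytics
#   pub-<digits>              AdSense publisher ID (appears as "ca-pub-..." in ad code)
#   G-<letters/digits>        GA4 measurement ID
TAG_PATTERNS = {
    "ua": re.compile(r"\bUA-\d{4,10}-\d{1,4}\b"),
    "pub": re.compile(r"\bpub-\d{10,20}\b"),
    "ga4": re.compile(r"\bG-[A-Z0-9]{6,12}\b"),
}


def extract_tags(html: str) -> dict[str, list[str]]:
    """Return the unique tracking IDs of each type found in a page's source."""
    return {
        kind: sorted(set(pattern.findall(html)))
        for kind, pattern in TAG_PATTERNS.items()
    }


def fetch_source(url: str) -> str:
    """Download a page's HTML. This is a real visit: the server can log your IP."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")
```

For example, `extract_tags(fetch_source("https://www.gijn.org/contact"))` returns a dictionary with lists under `"ua"`, `"pub"`, and `"ga4"`; what it finds depends on the page as it is served today, which may differ from what we saw.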

Let’s copy and paste this into a website called SpyOnWeb, which searches for “websites that probably belong to the same owner.” Here are the results:

Notice that our UA tag is associated only with the domain of our own website (gijn.org). Reporters can use the same method, tracking a shared UA tag (or Pub tag or G tag), to map out which sites might be connected. If we discover that a particular site of interest is connected to other similar sites, we can then investigate further to see whether a funding source or a specific person or organization is behind them.
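Once you have collected tags from several sites, the comparison that tools like SpyOnWeb perform can be approximated by inverting the data: for each tag, list the sites that carry it. One detail worth knowing: in a Universal Analytics ID such as UA-715222-1, the middle number identifies the Analytics account and the final number identifies a property within it, so sites whose tags differ only in the last number sit in the same account. The Python (3.9+) sketch below groups on both the full tag and the account prefix; the site names and tags in the example are hypothetical placeholders, not real findings.

```python
from collections import defaultdict


def ua_account(tag: str) -> str:
    """'UA-715222-1' -> 'UA-715222' (the account part of a Universal Analytics ID)."""
    prefix, account, _property = tag.split("-")
    return f"{prefix}-{account}"


def shared_tag_index(site_tags: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert site -> tags into tag -> sites, keeping only tags found on 2+ sites.

    UA tags are also indexed by account, so sites using different properties
    of the same Analytics account end up grouped together.
    """
    index: dict[str, set[str]] = defaultdict(set)
    for site, tags in site_tags.items():
        for tag in tags:
            index[tag].add(site)
            if tag.startswith("UA-"):
                index[ua_account(tag)].add(site)
    return {tag: sites for tag, sites in index.items() if len(sites) > 1}


# Hypothetical input: every site name and tag here is a placeholder.
example = {
    "site-a.example": {"UA-1111111-1", "pub-0000000000000001"},
    "site-b.example": {"UA-1111111-3"},
    "site-c.example": {"pub-0000000000000001"},
    "site-d.example": {"UA-2222222-1"},
}
print(shared_tag_index(example))
```

Here site-a and site-b are grouped under the account UA-1111111 even though their full tags differ, while site-d drops out because nothing else shares its tags.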

Let’s try another example. By using reverse search tools from DNSlytics, we can see which other domains share an IP address or a Google Analytics tag with our site of interest.

Let’s take the website of the right-wing media group Breitbart, which is based in the United States, as an example. We’ll grab the source code first by navigating our browser to:

view-source:https://www.breitbart.com/

We’ll search for “UA-” on the page. Sure enough, we find a UA tag in the source code:

Let’s take a look at the UA tag UA-715222-1 to see what DNSlytics comes up with.

Looks like DNSlytics found 19 domains with that UA tag. I can’t view all 19 because I don’t have a premium account with DNSlytics, so let’s see what SpyOnWeb comes up with.

We’ll put the same UA tag of UA-715222-1 into SpyOnWeb:

SpyOnWeb found seven domains, which is a good start. Let’s try putting the Pub tag into SpyOnWeb and see what we come up with. Here’s the Pub tag in the source code of the Breitbart website:

When I put the Pub tag of pub-9229289037503472 into SpyOnWeb, here’s what we find:

SpyOnWeb found 17 domains using the same Pub tag as breitbart.com. (DNSlytics found some 30 domains using the same Pub tag, but I wasn’t able to view them without a premium account.) Some of these domains overlap with the domains we found by reverse searching Breitbart’s UA tag. The ones that overlap are:

biggovernment.com

bigpeace.com

breitbartchildrenstrust.com

bigjournalism.com

breitbart.tv

breitbart.com
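Finding that overlap by eye works for a handful of results, but it is easy to script once you copy each tool’s result list into a text file. Here is a minimal Python (3.9+) sketch, assuming one domain per list entry. Treating “www.” variants and capitalization as the same domain is our simplification, and the entries labeled hypothetical are placeholders.

```python
def normalize(domain: str) -> str:
    """Lowercase a domain and drop a leading 'www.' so variants compare equal."""
    return domain.strip().lower().removeprefix("www.")


def overlap(*result_lists: list[str]) -> list[str]:
    """Return the domains that appear in every result list, sorted."""
    if not result_lists:
        return []
    sets = [{normalize(d) for d in results} for results in result_lists]
    return sorted(set.intersection(*sets))


# Short illustrative lists; the "hypothetical-" entries are placeholders.
ua_results = ["breitbart.com", "www.bigpeace.com", "hypothetical-one.example"]
pub_results = ["Breitbart.com", "bigpeace.com", "hypothetical-two.example"]
print(overlap(ua_results, pub_results))  # ['bigpeace.com', 'breitbart.com']
```

Because `overlap` accepts any number of lists, the same call works if you add a third result set, such as one from DNSlytics.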

I want to research these domains further, but I don’t want to put myself at risk by visiting these unknown websites. First, I don’t know who is behind these websites yet, and I don’t want to tip anyone off that I might be looking into them. Also, if I’m not using a VPN (virtual private network), the owner of a website may be able to see the geographical location associated with my IP (internet protocol) address, as well as other information I might want to keep private. (We suggest that you use a VPN when conducting certain online investigations, depending on your specific context. For more on VPNs, see the GIJN articles 4 Digital Security Tips Every Journalist Needs to Know and Essential Reading: A Cheat Sheet for Open Source Digital Security Options.) Second, I don’t know whether malware or tracking software is installed on these websites.

One tool I can use to check the background of these sites, and to establish whether a website is safe to visit, is VirusTotal, which checks URLs for malware using dozens of antivirus engines and website-blocklisting services. You can enter any URL into VirusTotal’s search page and get results in seconds. An added benefit of VirusTotal is that you can see which domain a URL ultimately redirects to. For example, if you go to biggovernment.com, you won’t end up at biggovernment.com. I found that out by running the URL through VirusTotal, and this is what it showed:

Looks like when you go to biggovernment.com, you’ll actually end up on the politics page of breitbart.com. Breitbart does have a plethora of tracking software on its website, according to Blacklight, a web privacy tool from the nonprofit newsroom The Markup that looks for tracking software. But according to VirusTotal’s analysis, the site doesn’t appear to have any malware embedded in it. As an added bonus, VirusTotal also aggregates all the tracking information we found earlier, like the UA and Pub tags:

What’s more, VirusTotal features a network graph function where you can see the relationships among domains, URLs, IP addresses, and more. Here’s the network graph that VirusTotal generated for one of the sites we came across earlier, bigpeace.com:

See the big orange “B”? Using this network graph function, again without visiting the websites in question, I’m able to see that bigpeace.com is represented in the graph by the favicon (the small site icon) of breitbart.com. This is really useful for reporters who want to check the background of a website without ever going to it. VirusTotal also lets you see historical IP address and Whois records, as well as all the outgoing links associated with a specific URL. Here are the outgoing links from http://www.bigpeace.com, which redirects to the national security page of breitbart.com:

Credit to BuzzFeed News’ Jane Lytvynenko and Craig Silverman for tipping us off to these concepts during their master class training session on digital investigations at this year’s IRE conference. Similar tools you can try include DomainBigData, Whoisology, DomainTools, and BuiltWith.
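The VirusTotal checks described above can also be run through VirusTotal’s public API (version 3), which requires a free account and API key and is subject to request quotas. The Python (3.9+) sketch below retrieves the report VirusTotal has already stored for a URL, so you never contact the site yourself. The URL-identifier scheme (unpadded URL-safe base64) follows VirusTotal’s documentation. The attribute names read in `summarize` (`last_final_url`, `last_analysis_stats`, `trackers`, `outgoing_links`) and the shape of `trackers` are our reading of its URL-object documentation and are assumptions to verify against the current docs. If VirusTotal has never analyzed a URL, the lookup fails with an HTTP 404 error and the URL would need to be submitted for analysis first.

```python
import base64
import json
import urllib.request
from urllib.parse import urlsplit

VT_URL_ENDPOINT = "https://www.virustotal.com/api/v3/urls/"


def vt_url_id(url: str) -> str:
    """VirusTotal v3 identifies a URL by its URL-safe base64 encoding, minus '=' padding."""
    return base64.urlsafe_b64encode(url.encode()).decode().strip("=")


def fetch_vt_report(url: str, api_key: str) -> dict:
    """Fetch VirusTotal's stored report for a URL (raises HTTPError 404 if none exists)."""
    request = urllib.request.Request(
        VT_URL_ENDPOINT + vt_url_id(url), headers={"x-apikey": api_key}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def external_domains(page_url: str, links: list[str]) -> list[str]:
    """Reduce outgoing links to the distinct hostnames other than the page's own."""
    own = (urlsplit(page_url).hostname or "").removeprefix("www.")
    hosts = set()
    for link in links:
        host = (urlsplit(link).hostname or "").removeprefix("www.")
        if host and host != own:  # skips relative links and self-links
            hosts.add(host)
    return sorted(hosts)


def summarize(report: dict) -> dict:
    """Pick out the fields discussed above; missing fields come back empty."""
    attrs = report.get("data", {}).get("attributes", {})
    stats = attrs.get("last_analysis_stats", {})
    final_url = attrs.get("last_final_url")
    # Assumed shape: {"Google Analytics": [{"id": "UA-..."}, ...], ...}
    tracker_ids = sorted(
        {
            entry.get("id")
            for entries in attrs.get("trackers", {}).values()
            for entry in entries
            if entry.get("id")
        }
    )
    return {
        "final_url": final_url,
        "malicious": stats.get("malicious", 0),
        "suspicious": stats.get("suspicious", 0),
        "tracker_ids": tracker_ids,
        "external_domains": external_domains(
            final_url or "", attrs.get("outgoing_links", [])
        ),
    }
```

A typical call would be `summarize(fetch_vt_report("http://www.bigpeace.com", api_key))`, with the key read from an environment variable rather than written into the script. The `final_url` field corresponds to the redirect check above, and `tracker_ids` to the UA and Pub tags VirusTotal aggregates.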

Using SpiderFoot HX to Draw Network Graphs

Want to find all these connections automatically? Try version 6 of SpiderFoot HX, which came online in September. In a previous edition of The Toolbox, we wrote that reporters can use SpiderFoot HX, which has a tiered pricing structure and offers some elements for free, to draw connections between websites. Now, we’ll go into more depth by demonstrating a specific example.

Let’s take the example of Now8News, a well-documented fake news site. But is it connected to a network of other fake news sites? Let’s find out.

First, some background on Now8News. Snopes, the fact-checking group, featured the site in Snopes’ Field Guide to Fake News Sites and Hoax Purveyors.

Screenshot: Snopes.com

And here’s a screenshot from the Now8News entry on the website Media Bias/Fact Check:

Screenshot: mediabiasfactcheck.com

Now, let’s look to see if there is a network of sites connected to it. Click on the “Investigate” function at the top of SpiderFoot HX:

Name the investigation whatever name you want, then enter now8news.com in the search box, and click “Start Investigation.”

Once you do that, SpiderFoot HX will automatically generate a skeleton network graph, as seen below:

You have three nodes: an internal root node (with the picture of the spider on it), a domain name node, and an internet name node. Right-click on the domain name node. Then hold your mouse over “Investigate…” followed by “Passive DNS,” and click on “Mnemonic PassiveDNS.” This will run the Mnemonic PassiveDNS module, one of many “modules,” or tools, built into the SpiderFoot HX software. This module queries Mnemonic’s passive DNS database, a collection of historical DNS resolutions, which will allow us to see the domains connected to our site of interest. The module could take several minutes to run depending on how many connections it finds, so allow some time for the data to process. Once it is done, move on to the next step.
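Conceptually, a passive DNS database is a log of which names were seen resolving to which IP addresses, and “co-hosted sites” are other names that share an IP address with your target. The Python (3.9+) sketch below shows that join on a few hypothetical (hostname, IP address) rows; the row format is our simplification, not the schema used by Mnemonic’s service or by SpiderFoot HX. Real passive DNS rows also carry first-seen and last-seen dates, which matter because two names may have shared an address years apart; this sketch ignores them.

```python
from collections import defaultdict


def _clean(name: str) -> str:
    """DNS names are case-insensitive and may end with a root dot."""
    return name.lower().rstrip(".")


def co_hosted(target: str, records: list[tuple[str, str]]) -> dict[str, set[str]]:
    """For each IP address the target resolved to, return the other names seen there.

    `records` holds (hostname, ip) pairs: a simplified, hypothetical stand-in
    for passive DNS rows.
    """
    names_by_ip: dict[str, set[str]] = defaultdict(set)
    for hostname, ip in records:
        names_by_ip[ip].add(_clean(hostname))
    target = _clean(target)
    return {
        ip: names - {target}
        for ip, names in names_by_ip.items()
        if target in names
    }


# Hypothetical rows using documentation-only names and addresses.
rows = [
    ("target.example", "192.0.2.10"),
    ("Neighbor-One.example.", "192.0.2.10"),
    ("neighbor-two.example", "192.0.2.10"),
    ("unrelated.example", "198.51.100.7"),
]
print(co_hosted("target.example", rows))
```

The two neighbors are returned under 192.0.2.10, while unrelated.example is left out because it never shared an address with the target.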

Next, click on “Browse by …” and select “Data Type,” then select “Co-Hosted Site.”

This will give you a list of all the co-hosted sites that the Mnemonic PassiveDNS module picked up based on the IP address of your investigation focus, which in this case was now8news.com. It’s going to be a long list, so you’ll want to narrow down your targets of interest by highlighting them. Click on the tick box to the left of any sites of interest and mark them by selecting the star button on the top right as seen here:

Then click on “Starred” to see all the sites you selected.

Once you have your list of marked items of interest, select the “Toggle View” button:

And select “Node graph.” This will automatically generate a network graph based on the co-hosted sites that you starred. Here’s the graph I got:

Notice that all of these nodes, like www.news4ktla.com and www.abc4la.com, are centered around the IP address for now8news.com, which is 67.227.229.104. This means they’re all hosted on the same IP address. We can confirm this by running now8news.com through DNSlytics’ reverse IP tool:

There are many more modules in SpiderFoot HX to try out. Have a module you’d like to see taken for a test run in a future edition of The Toolbox? Let us know. For now, take a look at the tutorial by the open source investigator known as NixIntel on using SpiderFoot HX to investigate a cryptocurrency scam in which a UK-registered company persuaded people to hand over cash. NixIntel used SpiderFoot HX in the ways we described above, drawing on domain names, IP addresses, and Google Analytics tags, to map the network around the scam’s website. Visit SpiderFoot’s website and YouTube channel to learn more.
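If you want to check a shortlist of co-hosted domains, like those in the now8news.com graph, against current DNS and draw a similar IP-centered graph outside SpiderFoot HX, here is a minimal Python (3.9+) sketch. It uses the standard library’s `socket.gethostbyname`, which returns a single IPv4 address per name, so domains with several addresses may be grouped inconsistently. The output is Graphviz DOT text. Bear in mind that a shared IP address is a lead rather than proof: on shared hosting, many unrelated sites can sit on one server.

```python
import socket
from collections import defaultdict
from typing import Callable

UNRESOLVED = "unresolved"


def group_by_ip(
    domains: list[str], resolve: Callable[[str], str] = socket.gethostbyname
) -> dict[str, list[str]]:
    """Resolve each domain and group together the domains that share an address."""
    groups: dict[str, list[str]] = defaultdict(list)
    for domain in domains:
        try:
            ip = resolve(domain)
        except OSError:  # socket.gaierror (lookup failure) subclasses OSError
            ip = UNRESOLVED
        groups[ip].append(domain)
    return dict(groups)


def to_dot(groups: dict[str, list[str]]) -> str:
    """Render groups as a Graphviz graph: each IP node linked to its domains."""
    lines = ["graph cohosted {"]
    for ip, domains in sorted(groups.items()):
        if ip == UNRESOLVED:  # don't draw failed lookups as a shared "hub"
            continue
        lines.append(f'  "{ip}" [shape=box];')
        for domain in sorted(domains):
            lines.append(f'  "{ip}" -- "{domain}";')
    lines.append("}")
    return "\n".join(lines)
```

For example, `print(to_dot(group_by_ip(["now8news.com", "www.news4ktla.com", "www.abc4la.com"])))` prints a graph you can save as `cohosted.dot` and render with Graphviz (`dot -Tpng cohosted.dot -o cohosted.png`). Today’s DNS answers may differ from what SpiderFoot HX and DNSlytics showed when we ran them.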


Additional Reading

The Forensic Methods Reporters Are Using to Reveal Attacks by Security Forces

6 Tools and 6 Techniques Reporters Can Use to Unmask the Actors behind COVID-19 Disinformation

Investigating A Cyberwar


Brian Perlman is an assistant editor at GIJN. He specializes in human rights violations research using advanced digital forensics, data science, and open source techniques. He is a graduate of the UC Berkeley Graduate School of Journalism and a former manager at the Human Rights Center at Berkeley Law.
