Using NodeXL for Social Networking Investigations

Friday, March 4th, 2011

Mapping social network users is nothing particularly new. Social scientists use it to compare people’s networks online and offline, and thanks to tools like Loco Citato’s MySpace, Facebook and YouTube Visualizers, investigators have a valuable tool for finding criminals and their associates.

Complementing Loco Citato’s excellent tools is an open-source application called NodeXL, which maps Twitter, Flickr and YouTube users. A book about it from Elsevier, “Analyzing Social Media Networks with NodeXL: Insights from a Connected World,” talks about the tool’s social-science value. But whether law enforcement or corporate investigators are using NodeXL is unknown. (If you use NodeXL or have heard of other investigators using it, please let me know.)

Perhaps the most striking fact about NodeXL is that Microsoft made the tool. Licensed under the Microsoft Public License (Ms-PL), NodeXL is available on the open source download site CodePlex.

NodeXL stands for Network Overview, Discovery and Exploration for Excel – yes, that is correct, Excel, which is the engine that runs the graphing. NodeXL is a template for Excel 2007, and it also runs under Windows 7.

Crunching large datasets for social maps

Most of the information available online so far about NodeXL concerns its ability to easily graph data entered into the spreadsheet. As social researchers piece together relationships between users, the graphing ability lets them sift through large amounts of data from a social networking site and find associations that might otherwise have been missed.

For the few social networks it collects data from, it is quick and very powerful. Flickr, Twitter and YouTube are the only ones programmed directly into the template at this time. Some blogs, including Marc Smith’s (one of the authors of a book on NodeXL), mention that Facebook is in the works for inclusion with NodeXL. Hopefully other social media sites will be added as this tool matures.

To test what NodeXL can do with a Twitter account, I used my own, @Webcase. (Please note: you do not have to be logged into an account to use NodeXL.)

Very quickly, NodeXL collected a list of the Twitter users being followed by “@webcase”. For visual fun, Excel also makes a graph of the followers. It takes a few settings to get the pictures into the graph, which took me a little research to figure out, but once you know how, it is pretty easy.

Of interest are the number of followers each user has, how many they are following, the number of tweets they have posted, their time zone, when they joined Twitter and the link to their Twitter page.

Pulling information about videos posted on YouTube is one of NodeXL’s excellent features. Let’s say you have an investigation where a particular term or name is used. You can enter that name into the YouTube video selection and get a list of videos, with the links to those videos, in a usable spreadsheet. Flickr searches are similar: you can search for image tags as well as Flickr users.

The real power of NodeXL, and the reason (besides its price tag) it is so popular among researchers and academics, is its ability to graph associations. If, for instance, you select a Twitter user to download and choose the options to obtain data on both followers and following, along with any tweets that mention the user, you can collect a lot of data that can then be used to show associations. Associations for investigators = leads, witnesses or possibly even suspects.
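To make the idea of "associations" a little more concrete, here is a minimal Python sketch of the kind of edge list such a download boils down to. This is not NodeXL's code or its API, just an illustration under my own assumptions: the account names, the sample data and the helper functions `build_edges` and `association_counts` are all made up.

```python
from collections import Counter

# Hypothetical sample data; every account name here is invented.
# Each pair reads (source, destination).
follows = [
    ("alice", "target"),   # alice follows target
    ("bob", "target"),
    ("target", "carol"),   # target follows carol
    ("target", "alice"),
]
mentions = [
    ("bob", "target"),     # bob mentioned target in a tweet
    ("dave", "target"),
    ("bob", "target"),
]

def build_edges(follows, mentions):
    """Combine both relationship types into one labeled edge list."""
    edges = [(src, dst, "follows") for src, dst in follows]
    edges += [(src, dst, "mentions") for src, dst in mentions]
    return edges

def association_counts(edges, subject):
    """Count how many edges tie each other account to the subject."""
    counts = Counter()
    for src, dst, _relationship in edges:
        if src == subject and dst != subject:
            counts[dst] += 1
        elif dst == subject and src != subject:
            counts[src] += 1
    return counts

edges = build_edges(follows, mentions)
counts = association_counts(edges, "target")

# List the accounts most strongly tied to the subject first.
for account, n in counts.most_common():
    print(account, n)
```

An account that both follows the subject and mentions them repeatedly rises to the top of the list, which is roughly what a busy cluster in the graph is telling you visually.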

By using the dynamic filters within NodeXL, you can limit the graph’s view to fewer contacts by raising the minimum number of contacts (tweets, retweets) an association must have before it is shown.
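The logic behind that kind of filter is simple enough to sketch in a few lines of Python. Again, this is only an illustration of the concept and not how NodeXL implements it; the interaction log, the account names and the `min_count` parameter are assumptions of mine.

```python
from collections import Counter

# Hypothetical interaction log: one (source, destination) pair per
# tweet or retweet. All account names are invented.
interactions = [
    ("bob", "target"), ("bob", "target"), ("bob", "target"),
    ("alice", "target"), ("alice", "target"),
    ("dave", "target"),
]

def weighted_edges(interactions):
    """Collapse repeated pairs into one edge with an interaction count."""
    return Counter(interactions)

def filter_edges(weights, min_count):
    """Keep only edges with at least min_count interactions."""
    return {edge: n for edge, n in weights.items() if n >= min_count}

weights = weighted_edges(interactions)
strong = filter_edges(weights, min_count=2)

for (src, dst), n in strong.items():
    print(src, "->", dst, n)
```

Raising `min_count` drops the one-off contacts and leaves the repeat ones, which is usually where an investigator wants to look first.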

Another plus about NodeXL: it has an active community working on this open source tool, and updates come out regularly.

For more information

A great primer on analyzing social media networks with NodeXL, “Analyzing Social Media Networks: Learning by Doing with NodeXL,” is available from the University of Maryland. (The posted copy on the UMD website says “Draft” and “Please do not distribute”. What? Do they know what the Internet does in Maryland?) Despite that, it is a good guide to some of NodeXL’s more esoteric graphing uses. For our purposes, I’ll cover some of the quicker applications from an investigative standpoint.

