This post was originally featured on http://blog.sociomantic.com, published on April 20th, 2010. Since the website will be relaunched and the post removed, I have relocated the tutorial to my personal page so that the SNA community can continue to benefit from it.
If you’ve never heard of the operatic power-metal band Nightwish, I’d suggest you start below. Even if metal’s not your thing (can’t really say it’s mine), it’s worth watching just to know it’s out there…
I was first introduced to Nightwish in college, but I hadn’t thought about them in several years. I was reminded of this outrageous group when, a few afternoons ago, our managing director Tom sent me a link on Skype. Not such an unusual occurrence, no doubt, but this link was extra-awesome. Within a couple of minutes, I realized I wasn’t the only one who’d received the URL, because suddenly the air was full of exclamations.
For the next quarter hour, everyone in the office was poking around this interactive graph (see static representation below), which maps the relationships of similarity between the artists in the Last.fm Audioscrobbler API. (Last.fm is a website that allows you to log, or “scrobble,” all the music files that you play on your computer or mp3 player so you can see statistics about your own listening habits and those of your friends.) The graph, created by Dr. Tamás Nepusz (a postdoctoral research fellow at Royal Holloway, University of London), demonstrates the network of relationships between musical artists (or groups) based on the artist “similarity” algorithms from Last.fm. The different colors represent the various musical genres, the size of the circles is proportional to the number of listeners for the artist’s “top track” on Last.fm, and the little lines in between show the strength of the similarity between two artists. And how do our pals Nightwish fit into all this? Well, that’s the band that Nepusz started his data collection from, and once reminded of their unique musical offering, I just couldn’t pass up the opportunity to share.
I’ll let you read the details of his project over here, but here I want to use this awesome network map to help explain one of the research projects that my colleagues have been involved in. (Of course, I understand if you need a minute to plug your favorite bands or your Last.fm username into Nepusz’s graph.)
The Name Game
First things first, let’s use this Last.fm graph to help explain some of the network science terms that we use when talking about these sorts of visualizations (in our case, a social graph).
- Node or Vertex
A node is a connection point in a graph. In the Last.fm visualization, each artist/group is a node represented by a colored circle (with the artist name inside). In the social graph, each node is a different person in the network.
- Edge
The edges, represented as lines between the circles, show the actual connections between nodes – in this case between similar artists. Here, the darkness of the line indicates the betweenness of the connection, which basically describes the potential of the edge in terms of information flow between different groups. (An artist with high “betweenness” might be one with high similarity to artists in both “rock” and “hip hop” genres – someone with a lot of what the music industry calls “crossover potential.”) In a social graph visualization based on the web (like sociomantic’s), the edges are the links between people, so if my blog is listed on your blogroll, that’s an edge connecting us.
- Weight
Like a social graph, this Last.fm graph visualizes the relationships between nodes that carry various weights. Here, the “weight” (size of the circle) is based on the number of listeners for an artist’s top track on Last.fm, but in a social graph, the weight might be based on measures like degree centrality (number of connections), betweenness (how much the node connects separate clusters), or influence (determined by a combination of many measures).
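If it helps to see these terms in code, here is a minimal sketch of a tiny graph with nodes, weighted edges, and two of the measures mentioned above. Everything in it is an assumption made for illustration: the node names and similarity weights are invented, and the betweenness function is a plain-Python version of Brandes’ algorithm for node betweenness on an unweighted graph, not necessarily what Nepusz or anyone else used.

```python
from collections import deque

# Hypothetical similarity graph: node -> {neighbor: similarity weight}.
# Two tight groups (A, B, C) and (D, E, F) joined by a single edge C-D.
graph = {
    "A": {"B": 0.9, "C": 0.7},
    "B": {"A": 0.9, "C": 0.8},
    "C": {"A": 0.7, "B": 0.8, "D": 0.4},
    "D": {"C": 0.4, "E": 0.6, "F": 0.5},
    "E": {"D": 0.6, "F": 0.9},
    "F": {"D": 0.5, "E": 0.9},
}


def degree(graph):
    """Number of connections per node (the simplest centrality measure)."""
    return {node: len(neighbors) for node, neighbors in graph.items()}


def betweenness(graph):
    """Node betweenness via Brandes' algorithm.

    Treats the graph as undirected and unweighted: edge weights are ignored
    and every edge counts as one step.
    """
    score = {v: 0.0 for v in graph}
    for source in graph:
        order = []                          # nodes in order of discovery
        preds = {v: [] for v in graph}      # predecessors on shortest paths
        paths = {v: 0 for v in graph}       # number of shortest paths from source
        dist = {v: -1 for v in graph}
        paths[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:                        # breadth-first search from source
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        dependency = {v: 0.0 for v in graph}
        for w in reversed(order):           # accumulate from the farthest nodes back
            for v in preds[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Each unordered pair was counted from both ends, so halve the totals.
    return {v: s / 2 for v, s in score.items()}


print(degree(graph))
print(betweenness(graph))
```

In this toy graph, C and D are the only nodes that sit between the two groups, so they end up with all of the betweenness even though their degree is only slightly higher than everyone else’s.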
Connecting the Dots
I wanted to share this visualization because playing around with it can help someone who might be new to network graphs to get a grip on what I’m talking about. And while I hope you had as much fun with it as we did, what I really want to share is some info about a research project my colleagues have been working on with the guys over at the ARC Centre of Excellence for Creative Industries and Innovation at Australia’s Queensland University of Technology.
So what exactly are they studying?
Axel Bruns, one of the QUT researchers, briefly explains:
Building on our joint research into the Australian and French political blogospheres, we’ve embarked on a large-scale, three year research project to investigate the processes of online public communication in Australia and France as they unfold across major social media spaces including Twitter, YouTube, Flickr, and the wider blogosphere.
This particular project is only one part of their ongoing study of the flow of information between blogs and media websites that began in 2007. (Those interested can find the full research collection here).
Naturally, data collection is at the root of this research. Back in 2007, the research team compiled a list of known political blogs (at that time, only for Australia; they wouldn’t begin keeping tabs on the French blogs until 2009). Using web-crawling technology, the list of blogs was expanded based on the outbound links from these known blogs, then expanded again based on the new list, and so on, until there ceased to be new links or the links surfacing were irrelevant to the political blogosphere. The links used could be from the official blogroll, from links within individual blog posts, or from links listed in the comments.
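For the curious, the expand-and-repeat process described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the researchers’ actual crawler: the `LINKS` table stands in for the real web, and `get_outbound_links` and `is_relevant` are hypothetical placeholders for a real link scraper and a real relevance judgment.

```python
from collections import deque

# Hypothetical stand-in for the web: site -> outbound links found on it.
LINKS = {
    "blog-a.example": ["blog-b.example", "news.example"],
    "blog-b.example": ["blog-c.example", "blog-a.example"],
    "blog-c.example": ["shop.example"],
    "news.example": ["blog-a.example"],
    "shop.example": ["shop-partner.example"],
}


def get_outbound_links(site):
    """Placeholder for a real crawler that scrapes blogrolls, posts, and comments."""
    return LINKS.get(site, [])


def is_relevant(site):
    """Placeholder relevance check; here it simply rejects shop domains."""
    return not site.startswith("shop")


def snowball(seeds):
    """Expand a seed list by following outbound links until nothing new turns up."""
    known = set(seeds)
    edges = []                      # (source, target) pairs, one per link found
    queue = deque(seeds)
    while queue:
        site = queue.popleft()
        for target in get_outbound_links(site):
            if not is_relevant(target):
                continue            # stop expanding along irrelevant links
            edges.append((site, target))
            if target not in known:
                known.add(target)
                queue.append(target)
    return known, edges


known, edges = snowball(["blog-a.example"])
print(sorted(known))
print(edges)
```

Starting from a single seed, the sketch discovers the other two blogs and the news site, records five links between them, and never follows the shop link.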
After the initial blog list was compiled, they continuously crawled each and every new post in order to scrape for new linkages, so that by gathering links over the course of time they could begin to understand the blog network – which blogs were linking to which others, which blogs had the highest number of inbound links, which “clusters” of blogs could be identified by their interlinkage, etc. The researchers will continue to scrape these blogs for further links until the end of the three-year research period. They also gathered topical data from the blogs so they could get an understanding of how different events drove the flow of information within the network.
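To make those questions a bit more concrete, here is a small sketch of how inbound links could be counted and how rough “clusters” could be picked out of a list of scraped links. The link list is invented, and treating connected groups of blogs as clusters is my own simplification for illustration; the source does not say which clustering method the researchers use.

```python
from collections import Counter, defaultdict

# Hypothetical scraped links: (linking blog, linked blog).
links = [
    ("a", "b"), ("c", "b"), ("b", "a"),
    ("c", "a"), ("d", "e"), ("e", "d"),
]


def inbound_counts(links):
    """How many links point at each blog."""
    return Counter(target for _, target in links)


def clusters(links):
    """Group blogs that are connected by links in either direction."""
    neighbors = defaultdict(set)
    for source, target in links:
        neighbors[source].add(target)
        neighbors[target].add(source)
    seen = set()
    groups = []
    for start in sorted(neighbors):
        if start in seen:
            continue
        group = set()
        stack = [start]
        while stack:                    # depth-first walk over one group
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbors[node] - group)
        seen |= group
        groups.append(group)
    return groups


print(inbound_counts(links).most_common())
print(clusters(links))
```

Here blogs “a” and “b” each receive two inbound links, and the blogs fall into two groups: a, b, and c in one, d and e in the other.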
There have been many stages of analysis over the course of this research, but below you can see just a few examples of how these researchers are making sense of all the data gathered. In the following slideshow, researcher Tim Highfield offers us a nice overview of his initial findings.
(Scroll down to see Axel Bruns’s October 2009 presentation about the findings in the Australian blogosphere.)
Tim outlined some of his recent observations over on his blog. The images below are just a few of the network representations that Highfield has created by plugging the data gathered into a visualization program called Gephi.
Although the rainbow-colored visualization I showed before the slideshow might be the nicest to look at, it also shows how imprecise mapping can leave us with little more than what Nepusz labeled as an “ugly hairball.” For the two visualizations above, Tim groomed the data a bit by doing things like eliminating nodes with fewer than two incoming links and coloring blogs with known political affiliations.
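The “fewer than two incoming links” grooming step is easy to picture in code. This is a minimal sketch with made-up data and a hypothetical function name, not a description of what Tim actually did in Gephi. It filters in a single pass, so inbound links are counted on the original data and not recounted after nodes are dropped.

```python
from collections import Counter

# Hypothetical link list: (linking blog, linked blog).
raw_links = [
    ("a", "b"), ("c", "b"), ("b", "a"),
    ("c", "a"), ("d", "a"), ("b", "d"),
]


def prune_low_indegree(links, min_inbound=2):
    """Keep only links whose two endpoints both have enough inbound links."""
    inbound = Counter(target for _, target in links)
    keep = {node for node, count in inbound.items() if count >= min_inbound}
    return [(s, t) for s, t in links if s in keep and t in keep]


print(prune_low_indegree(raw_links))
```

With this data, only “a” and “b” have at least two incoming links, so the groomed graph keeps just the two links between them.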
Like Nepusz’s Last.fm visualization, Tim’s pictures show us how, with careful application and analysis, network graphs can be powerful learning tools for understanding the way information and influence move within a network. From a business perspective, hopefully it’s now a little clearer why it might be important to use a customer network graph to better understand the “big circles” in your web of clients and prospects.
What do you think the researchers will find in this data? Do you think it’s most likely that the information flows from the political blogs to the official news sites, or vice versa?