Kansas State University


IT News

The NodeXL Series: Extracting a User Network from Flickr (Part 11)

Content networks like Flickr also support social network extractions of user accounts.  Such an extraction captures the interrelationships among the individuals who have accounts on the site and shows the cross-references that link those accounts.  For this blog entry, we will use the NodeXL Excel Template to extract the “USDAgov” user network on Flickr, a peer content-sharing site for photos and videos.  The “USDAgov” network is likely fairly large, even for a content site, because of the account’s government mandate.

Setting the Parameters for the Flickr User Network Crawl

To begin this data extraction, open the NodeXL Excel Template.  Click on the NodeXL tab.  Click on the “Import” dropdown menu, and select “Import from Flickr User’s Network.”  Enter “USDAgov” in the screen name area.  Then enter your private Flickr API key.

In terms of vertices, this crawl will include both the contacts of the user and any persons who commented on the user’s photos, so select the “Both” radio button.  This network will be limited to a 1.5 degree crawl.  This type of crawl collects the 1-degree information (the ego neighborhood) and also the connections among the alters (the vertices or nodes in that ego neighborhood).  (The alter-to-alter connections added at 1.5 degrees reflect what is known as “transitivity.”)
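
To make the 1.5 degree idea concrete, here is a minimal Python sketch.  It is not part of NodeXL, which handles this through its import dialog.  It assumes an undirected edge list; the function name ego_network_1_5 and every account name other than “USDAgov” are hypothetical.

```python
from collections import defaultdict


def ego_network_1_5(edges, ego):
    """Return (alters, kept_edges) for a 1.5-degree ego network around `ego`.

    edges: iterable of (source, target) pairs, treated as undirected.
    """
    # Build an undirected adjacency map from the edge list.
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)

    alters = set(neighbors[ego])   # 1 degree: direct contacts of the ego
    members = alters | {ego}

    kept = set()
    for a, b in edges:
        # Keep ego-alter and alter-alter edges; drop anything that reaches
        # outside the ego neighborhood (that would be a 2-degree crawl).
        if a != b and a in members and b in members:
            kept.add(frozenset((a, b)))
    return alters, kept


# Hypothetical toy data: only "USDAgov" comes from the article.
edges = [
    ("USDAgov", "alice"), ("USDAgov", "bob"), ("USDAgov", "carol"),
    ("alice", "bob"),   # alter-alter tie: included at 1.5 degrees
    ("carol", "dave"),  # dave is not a contact of the ego: excluded
]
alters, kept = ego_network_1_5(edges, "USDAgov")
print(sorted(alters))  # ['alice', 'bob', 'carol']
print(len(kept))       # 4
```

A plain 1-degree crawl would keep only the three ego-alter edges; the extra alice-bob edge is what the additional half degree contributes.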

This crawl will include user information in the vertices worksheet.  No limit is set on the number of people in the crawl.  Click “OK.”

The crawl progress is shown at the bottom of the “Import from Flickr User’s Network” window.  Note that the photo information collected is represented in numbers, which enables a speedier capture of the data (but loses some potential textual descriptor information in the process).

When the data extraction is complete, accept the data into the workbook.  Save it using an informative naming protocol.  In this case, the file was named USDAGovUserNetworkonFlickr1.5DegUnlimited.  A descriptive file name makes the file easier to find in the future.  The detailed listing of files in MS Windows also shows the date of the most recent save.

The Downloaded Data and Post-Processing

The populated workbook looks like the following screenshot.

Extract the graph metrics per the prior directions.  The table for this extraction looks like the following screenshot.

This crawl of the USDAgov user network on Flickr found 207 vertices (understood as 207 accounts connected to one another through shared interests, following, and followership).  There were 201 unique edges found, but 20,195 edges with duplicates (given that the crawl captured the full range of connections).  The maximum geodesic distance (the diameter of the network) was 2, which means that the two most distant nodes were separated by a single intermediary node.  The average geodesic distance was 1.97 (which rounds up to 2), so the average degree of separation between nodes is nearly the same as the diameter of this user network on Flickr.  This makes sense since this was a 1.5 degree crawl: every alter is directly connected to the focal node, so any two alters are at most two steps apart.
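
For readers who want to see how figures like these can be derived, the following Python sketch computes them on a tiny made-up edge list.  It is an illustration only.  The function names are hypothetical; the counting convention for duplicates (an edge is unique if its vertex pair occurs once, and every occurrence of a repeated pair counts as an edge with duplicates) is an assumption; and averaging over distinct vertex pairs may differ from how NodeXL arrives at its own figure.

```python
from collections import Counter, deque


def edge_counts(edges):
    """Return (unique, with_duplicates) for an undirected edge list."""
    tally = Counter(frozenset(e) for e in edges)
    unique = sum(1 for n in tally.values() if n == 1)
    with_duplicates = sum(n for n in tally.values() if n > 1)
    return unique, with_duplicates


def geodesic_stats(edges):
    """Return (diameter, average geodesic distance) over distinct vertex
    pairs of a connected, undirected graph, using breadth-first search."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    longest, total, pairs = 0, 0, 0
    for start in adj:
        # Breadth-first search gives shortest path lengths from `start`.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        for node, d in dist.items():
            if node != start:
                longest = max(longest, d)
                total += d
                pairs += 1
    return longest, total / pairs


# Hypothetical toy network: one focal node, four alters, one alter-alter tie,
# and the ego-a connection recorded three times.
edges = [
    ("ego", "a"), ("ego", "a"), ("ego", "a"),
    ("ego", "b"), ("ego", "c"), ("ego", "d"),
    ("a", "b"),
]
unique, with_duplicates = edge_counts(edges)
diameter, average = geodesic_stats(edges)
print(unique, with_duplicates)  # 4 3
print(diameter, average)        # 2 1.5
```

In this toy network the diameter is 2 for the same reason as in the crawl: any two alters can reach each other through the focal node.  The average falls below 2 only because some pairs are directly connected.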

Finally, extract the groups (clusters) from the network per the earlier directions.  Four clusters were identified.

The metrics provide a basic view of this network. Now, let’s make a few visualizations to see what those might suggest about this user’s network on the Flickr content sharing site.

Some Graph Visualizations

A Harel-Koren Fast Multiscale layout of this data produces the following graph, with the USDAgov focal node in the middle.

This depiction shows the vertices as dots of different colors and the edges as lines.  It reveals only some interconnectivity, which suggests that duplicate edges were not depicted.

A Fruchterman-Reingold depiction looks very similar.

A ring lattice graph (circle) looks like the following.  The online version may be accessed here.  The interactive version may be accessed here.

Finally, a horizontal sine wave version of this same data looks like the following screenshot.
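
The circle and sine wave layouts are simple enough to sketch by hand.  The Python below is only an illustration of the geometry, not NodeXL’s implementation; the function names and the radius, width, amplitude, and cycles parameters are assumptions.

```python
import math


def circle_layout(vertices, radius=1.0):
    """Place vertices at equal angles around a circle centered on (0, 0)."""
    n = len(vertices)
    return {
        v: (radius * math.cos(2 * math.pi * i / n),
            radius * math.sin(2 * math.pi * i / n))
        for i, v in enumerate(vertices)
    }


def sine_wave_layout(vertices, width=1.0, amplitude=0.25, cycles=1.0):
    """Spread vertices evenly left to right along a horizontal sine wave."""
    n = len(vertices)
    coords = {}
    for i, v in enumerate(vertices):
        t = i / (n - 1) if n > 1 else 0.0  # position along the wave, 0..1
        coords[v] = (width * t,
                     amplitude * math.sin(2 * math.pi * cycles * t))
    return coords


# Hypothetical vertex names.
circle = circle_layout(["a", "b", "c", "d"])
wave = sine_wave_layout(["a", "b", "c", "d", "e"])
```

Neither layout uses the edges at all; the positions depend only on the order of the vertices.  This is why such views are better for inspecting edge density than for revealing clusters.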

Save the .xlsx file.

Final Note:  NodeXL is a free and open-source tool that is available from Microsoft’s CodePlex site (which is a space for project hosting for open-source software), and it is sponsored by the Social Media Research Foundation.