So you want to make your own network graphs and do your own network analysis, and you are looking for some software in which to do it? ANI has used a lot of them, and what follows is our review to help you make the right choice for your project.
Network analysis software falls into two categories: standalone GUI-based programs and command-line (CMD) software libraries that are used from within a programming language. GUI-based programs generally have a much gentler learning curve and are better suited to manipulating graphs by hand. Software libraries are more difficult to learn but are generally more powerful and allow for greater customization and integration. Below are some of the major players that we have had the chance to use or review.
Pajek is what many would consider to be the pioneer in network analysis and visualization software – it has been in development since the late 1990s and comes with a rich set of analytical tools and produces some of the best-looking network graphs available. However, it is an older program that was designed mainly for academic research and smaller-scale projects, so it may not be the go-to software if you have large graphs of over 5000 nodes or so.
Pajek has two other major drawbacks. The ability to manipulate the map manually is fairly weak, so the user must rely almost completely on the layout that the program calculates. For example, it is impossible to move more than one node at a time using Pajek, so large maps that need extensive manual tweaking to look right will take a long time. Pajek also requires a slightly arcane data format, the .net file, in order to graph a network properly. While other programs may be able to import multiple file types, Pajek is limited to this format, meaning that some manual manipulation of the data is needed before the user can even begin to use the software. Also, many visual parameters, such as the colors of the nodes, must be defined in the .net file since they can’t be easily changed within the GUI.
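To give a sense of the data preparation involved, here is a minimal Python sketch that turns a plain edge list into .net text. It assumes the basic layout of the format (a "*Vertices" block of numbered, quoted labels followed by an "*Edges" or "*Arcs" block); the function name to_pajek_net and the sample names are our own illustrations, not part of Pajek.

```python
def to_pajek_net(edges, directed=False):
    """Return Pajek .net text for a list of (source, target, weight) tuples."""
    # Pajek identifies vertices by consecutive integers starting at 1,
    # so give each label an id in order of first appearance.
    ids = {}
    for source, target, _ in edges:
        for label in (source, target):
            if label not in ids:
                ids[label] = len(ids) + 1

    lines = [f"*Vertices {len(ids)}"]
    for label, vertex_id in ids.items():
        lines.append(f'{vertex_id} "{label}"')

    # "*Arcs" introduces directed ties, "*Edges" undirected ones.
    lines.append("*Arcs" if directed else "*Edges")
    for source, target, weight in edges:
        lines.append(f"{ids[source]} {ids[target]} {weight}")
    return "\n".join(lines) + "\n"


# Hypothetical three-person network used only for illustration.
sample_edges = [("Ann", "Bob", 1), ("Bob", "Cat", 2), ("Ann", "Cat", 1)]
net_text = to_pajek_net(sample_edges)
print(net_text)
```

Saving net_text to a file with a .net extension should give Pajek something it can open, though real projects will usually need extra per-vertex parameters that this sketch leaves out.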
Still, Pajek has many features that make it perfectly adequate for many projects. It has a number of layout algorithms available, so the user can easily switch between multiple visualizations of the same social data. It is capable of exporting vector image maps, which can be edited or resized by the user after export. This means that the maps created with Pajek will never suffer from resolution problems as they can be resized indefinitely. Pajek includes a nice feature that allows the user to add a 3D effect to the nodes.
NodeXL is an open source template for Excel 2007/2010 that allows users to input a network edge list and easily output a customizable network graph. Within NodeXL, there are separate spreadsheets for edges, vertices, groups, group vertices, and overall metrics. Each of these categories has columns for color, shape, size, opacity, etc. These can be entered manually or auto-filled based on user-specified preferences. Data can be dynamically filtered based on date ranges, labels, degree cutoff, etc. NodeXL generates a variety of layouts, both force-directed (Fruchterman-Reingold, Harel-Koren) and geometrical (e.g. circular graphs). Force-directed layouts are the most common method for plotting networks because they try to place nodes that are closely tied in social network space near each other in the physical space of the page. These methods also attempt to avoid overlapping edges. Here is an example of the same network map drawn side-by-side in Pajek (K-K algorithm) and NodeXL (H-K algorithm).
Fig 1a (above): 3500-node map drawn in Pajek
Fig 1b (below): The same 3500-node map (from Fig 1a), drawn in NodeXL
The colors were changed for visibility reasons (yellow in NodeXL was difficult to see), but the data are the same. Both graphs were created using the “Sphere” option for node shape. We think the Pajek map clearly wins the aesthetics competition, but NodeXL isn’t all bad. NodeXL is a bit more user-friendly than Pajek and allows the user to customize node and edge attributes (color, size, shape) without having to re-load and re-map the network. This is one feature that is sorely lacking in Pajek. On the other hand, we have not found a way to save the map layout in NodeXL, which means that if you exit the spreadsheet you have to redraw the map and it won’t look the same. This is odd because the X-Y coordinate data is stored in the GraphML file, yet when you import the file the nodes do not appear at those coordinates on the map.
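For readers who want to keep a layout from one session to the next, one workaround is to compute the positions in a library and store them in the GraphML file yourself. The sketch below uses NetworkX, whose spring_layout function is a Fruchterman-Reingold implementation. The attribute names "x" and "y" are our own assumption and are not necessarily the names NodeXL uses for its coordinates.

```python
import os
import tempfile

import networkx as nx

# Small built-in example graph standing in for real data.
G = nx.karate_club_graph()

# spring_layout is NetworkX's Fruchterman-Reingold layout;
# a fixed seed makes the result reproducible.
pos = nx.spring_layout(G, seed=42)

# Store the coordinates as plain floats under hypothetical "x"/"y" keys.
for node, (x, y) in pos.items():
    G.nodes[node]["x"] = float(x)
    G.nodes[node]["y"] = float(y)

path = os.path.join(tempfile.mkdtemp(), "layout.graphml")
nx.write_graphml(G, path)

# Reading the file back recovers the saved positions.
H = nx.read_graphml(path, node_type=int)
restored = {n: (d["x"], d["y"]) for n, d in H.nodes(data=True)}
```

The restored dictionary can be passed straight back to a drawing routine, so the map looks the same every time it is redrawn.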
Gephi is an open-source software that offers a stand-alone GUI and a Java software package. For most network maps that Activate Networks has to make, Gephi is the software of choice. There are a host of visual options that allow the user to customize the aesthetic appeal of the map. These visual options can be tied to data, such as a lighter or darker shade based on some characteristic of each individual node. There are also a large number of layout algorithms that can be used to arrange the nodes in the graph in fundamentally different ways, even on the fly through the GUI. Gephi supports data import in a number of formats including many common ones, and this makes it very easy to get up and running with a social network graph quickly.
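As a rough illustration of getting data with node characteristics into Gephi, the sketch below writes a small graph to a GEXF file (a format Gephi can open) using NetworkX. The graph, the "score" attribute, and the file name are all hypothetical; inside Gephi, an attribute like this could then drive node shade or size.

```python
import os
import tempfile

import networkx as nx

G = nx.Graph()
# "score" is a hypothetical per-node characteristic to map to color.
G.add_node("a", score=0.2)
G.add_node("b", score=0.9)
G.add_node("c", score=0.5)
G.add_edges_from([("a", "b"), ("b", "c"), ("a", "c")])

gexf_path = os.path.join(tempfile.mkdtemp(), "example.gexf")
nx.write_gexf(G, gexf_path)

# Read the file back to confirm the attribute survived the export.
check = nx.read_gexf(gexf_path)
print(check.nodes["a"]["score"])
```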
If you move beyond the capabilities in its GUI, Gephi caters to the advanced user very well. A person familiar with Java can write programs and plugins for Gephi. This means that Gephi has an ever-expanding library of new layouts, statistical measures and visual effects, and they are all user-created. Gephi’s open-source structure also makes it possible to run large-scale processes such as automated mapping of thousands of maps. The community of developers and users is very active and continues to grow. We also like that Gephi can export animations of graphs that change and re-arrange themselves dynamically.
While we find Gephi’s shortcomings to be few and far between, there is definitely room for improvement. The statistical capabilities of the software are less robust than those of some other packages, and some analysis may need to be done before importing the data into Gephi. Certain software packages such as iGraph (see below for details) give the user many more options for manipulating and analyzing the network data. It is also important to recognize that Gephi is still a beta-stage program, which means that there are several bugs that interrupt the expected operation of the program.
Fig 2: Network map drawn with Gephi and showing node characteristics with size and color.
Overall, Gephi is the most flexible and broadly useful piece of network visualization software we have used. It is a freely available, open source program with an active community of users. Activate Networks has been using Gephi for nearly 2 years and has seen it improve continuously throughout that time. At least in terms of network visualization, we think Gephi would suit the needs of most users.
Fig 3: Small community map drawn in NetMiner
The above graph, rendered in NetMiner, shows only a small community because the trial version limits users to 100 nodes or fewer. We think this falls somewhere in between NodeXL (worst) and Pajek (best) on the aesthetic spectrum. NetMiner includes other useful features, such as allowing users to use Python libraries, create their own Python plugins, and conduct batch processing. We haven’t tried out these options with the software. One other major upside to NetMiner is that it is the only GUI-based analysis software we reviewed that can handle very large (10,000+ node) networks with reasonable processing time. While Gephi has a GUI and can handle large networks, its analytics capabilities are limited.
iGraph and NetworkX are both open-source software packages run through a command line. Both have been around for a number of years and are still being updated. iGraph is written in C and is available as an R package, a Python extension, and a Ruby extension. NetworkX is Python-only. In principle, one could use iGraph and NetworkX together in the same Python program. Both come with extensive documentation and have active online support communities. Both are quite popular in the network sciences, and both can output most useful formats, including SVG. One of the benefits of using a software library like iGraph or NetworkX is that network analyses can be combined pretty seamlessly with non-network statistical analysis performed on the same data set in Python, R, or Ruby.
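To make that last point concrete, here is a small sketch that computes a network measure (degree) with NetworkX and then correlates it with an ordinary, non-network variable in plain Python. The five-person network and the "tenure" values are invented for illustration only.

```python
import networkx as nx

# Hypothetical five-person network.
G = nx.Graph()
G.add_edges_from([("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("d", "e")])

# Hypothetical non-network attribute, e.g. years of tenure.
tenure = {"a": 9, "b": 5, "c": 4, "d": 6, "e": 1}

# Network statistic: number of ties per person.
degree = dict(G.degree())


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


nodes = sorted(G.nodes())
r = pearson([degree[n] for n in nodes], [tenure[n] for n in nodes])
print(f"correlation between degree and tenure: {r:.3f}")
```

In a real project the hand-written pearson function would typically be replaced by a statistics package, which is exactly the kind of integration a library-based workflow makes easy.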
In terms of visualization, NetworkX does not have its own drawing package. It uses Python’s matplotlib and Graphviz (http://www.graphviz.org/) to create graphs. Our impression is that NetworkX’s strengths lie in its customizability and network analysis features rather than its drawing capabilities, so it may not be the all-in-one solution if that is what you are looking for. Here is an example of a graph produced in NetworkX:
Fig 4: NetworkX drawing
Here is an informative report describing the limits of visualization in NetworkX: http://cs.unm.edu/~bedwards/data/gsoc1.pdf. NetworkX is probably not the method of choice for great visualizations, but it has robust network statistical capabilities. An additional upside is that NetworkX is a well-coded Python package with lots of documentation. Some users report that NetworkX runs slowly on larger networks, especially when running algorithms more complex than N^2, that is, algorithms whose running time grows faster than the square of the number of nodes. We didn’t experience this, however, in our particular tests.
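For readers who want to try the matplotlib route, the following is a minimal sketch of drawing a graph with NetworkX and saving it as SVG. It assumes matplotlib is installed; the example graph, the seed, and the sizing rule (node size and color scaled by degree) are our own choices for illustration.

```python
import io

import matplotlib

matplotlib.use("Agg")  # render off-screen, so no display is needed
import matplotlib.pyplot as plt
import networkx as nx

G = nx.karate_club_graph()
pos = nx.spring_layout(G, seed=1)
degrees = dict(G.degree())

fig, ax = plt.subplots(figsize=(6, 6))
nx.draw_networkx(
    G,
    pos=pos,
    ax=ax,
    # Size and color each node by its degree.
    node_size=[40 * degrees[n] for n in G.nodes()],
    node_color=[degrees[n] for n in G.nodes()],
    cmap=plt.cm.viridis,
    with_labels=False,
    edge_color="gray",
)
ax.set_axis_off()

# Write the figure as SVG into memory; a file name would work as well.
svg_buffer = io.BytesIO()
fig.savefig(svg_buffer, format="svg")
plt.close(fig)
svg_text = svg_buffer.getvalue().decode("utf-8")
```

Because the output is a vector format, the result can be resized or edited afterwards in the same way as the vector exports from the GUI programs.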
iGraph has its own set of built in drawing functions which appeared to us to be more robust and customizable than those found in NetworkX. It allows for features such as overlaying graphs with transparency. Below is an example:
Fig 5: iGraph drawing
Similar to NetworkX, iGraph supplies a lot of network analytics capabilities, and can be integrated well into other code written in Python, R, or Ruby. The network statistics in iGraph focus on classical approaches to networks of finding components, cliques, communities, and egocentric network metrics.
If you are working in R, another package that deserves particular mention is the ‘sna’ package. This package offers much of the same functionality as iGraph; in fact, within R, sna and iGraph share some of the same code for generating network map images. When iGraph and sna have comparable but differently coded functions, we’ve found that the iGraph routines are typically a bit faster, especially on large graphs. What comes in handy about sna, however, is that it has many routines for maximum likelihood and permutation tests that are run at a whole-network level. These types of routines are much more limited in iGraph. For example, in sna you can easily regress networks against one another or perform network autocorrelation analysis of a trait distributed on a network. Those statistical procedures are not included in iGraph, and they are pretty involved to write de novo. One caveat: tasks like network regression can become very computationally intensive with large sample sizes, and we have found that sna’s routines for these tasks start bogging down north of a couple thousand data points. Some of this computational slowdown could be avoided by coding your own procedures in a lower-level language like C rather than relying on the R code in the sna package.
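To illustrate what a whole-network permutation test involves, and why it gets expensive, here is a simplified Python sketch of a QAP-style correlation test between two networks on the same set of nodes. This is our own illustrative code, not the sna implementation; the function names and the two small matrices are hypothetical. Node labels of one matrix are shuffled many times, and the observed correlation is compared against the shuffled ones.

```python
import random


def offdiag(matrix):
    """Flatten a square matrix, skipping the diagonal (self-ties)."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(n) if i != j]


def pearson(xs, ys):
    """Pearson correlation; assumes both inputs have nonzero variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def qap_correlation(a, b, permutations=1000, seed=0):
    """Return (observed correlation, one-sided permutation p-value)."""
    rng = random.Random(seed)
    n = len(a)
    b_flat = offdiag(b)
    observed = pearson(offdiag(a), b_flat)
    hits = 0
    for _ in range(permutations):
        # Relabel the nodes of `a`: rows and columns move together.
        order = list(range(n))
        rng.shuffle(order)
        permuted = [[a[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(offdiag(permuted), b_flat) >= observed:
            hits += 1
    return observed, hits / permutations


# Hypothetical friendship and advice ties among the same five people.
friendship = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
advice = [
    [0, 1, 1, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]

observed, p_value = qap_correlation(friendship, advice)
print(f"observed r = {observed:.3f}, permutation p = {p_value:.3f}")
```

Each permutation touches every pair of nodes, so the cost grows with both the number of permutations and the square of the network size, which is consistent with the slowdown we saw on larger data sets.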