An Exploration of Power-Law Networks
Part II: Exploration Details

Owen Densmore
Sun Microsystems Laboratories
owen.densmore@sun.com

Overall Strategy: Two Power-Law Network Types

Networks observed in peer systems and the web exhibit a power-law distribution of edges per node.  This means that a plot of the number of edges (connections to the network) per node (host in the peer network) is approximated by an equation of the form Edges = Constant*Nodes^-gamma.  A sample program building such a power law produces the plot on the right; click for full size.

gamma = -2.1, bins = 100, K = 1
1:7819 2:1824 3:778 4:425 5:266 6:182 7:131 8:99 9:77 10:62 11:51 12:42 13:36 14:31 15:27 16:23 17:20 18:18 19:16 20:14 21:13 22:12 23:11 24:10 25:9 26:8 27:8 28:7 29:7 30:6 31:6 32:5 33:5 34:5 35:4 36:4 37:4 38:4 39:4 40:3 41:3 42:3 43:3 44:3 45:3 46:3 47:2 48:2 49:2 50:2 51:2 52:2 53:2 54:2 55:2 56:2 57:2 58:2 59:1 60:1 61:1 62:1 63:1 64:1 65:1 66:1 67:1 68:1 69:1 70:1 71:1 72:1 73:1 74:1 75:1 76:1 77:1 78:1 79:1 80:1 81:1 82:1 83:1 84:1 85:1 86:1 87:1 88:1 89:1 90:1 91:1 92:1 93:1 94:1 95:1 96:1 97:1 98:1 99:1 100:0
nodes = 12159, edges = 34014
 

This network has 12,159 nodes and 34,014 edges.  It is characterized by a "gamma" or power-law exponent of -2.1.  It is dominated by nodes with just 1 edge/connection: 7,819 of them. It decreases quickly: the 10th entry shows only 62 nodes with 10 connections, and decreases to 1 node w/ 59 connections, but continues with a "fat tail", out to 1 node w/ 99 connections.

Network Types: Power List and Generated

The exploration uses two general construction techniques to build these power-law networks. A small set (6) of Java classes were created to build these networks, and a large number of datasets were created for study; 1890 general sets and an additional 1000 sets specific to Small Worlds networks to check our results with existing published data.  Each dataset is distinguished by three or more parameters.  Each includes "N"; the number of nodes, and "K", the number of edges per node.  Additional parameters are unique to each type of network.  For example, the third parameter for Small Worlds nets is the rewiring probability.

Nomenclature

Thus a natural nomenclature evolved for naming the classes of graphs: This gives a total of 11 graph "classes": Bag RdE+ RdE- RdN+ RdN- SmW+ SmW- TdL+ TdL- TdS+ TdS-

The complete name of a dataset includes the third parameter.  For all power list graphs, the third parameter is "gamma", the power-law exponent.  For Bag graphs it is N0, the initially empty set of nodes.  And for SmW graphs, it is P, the rewiring probability.  Here are examples

Graphs

For each style of network, a set of image pairs were generated for N=25, 50 and 100.  The image pairs are 1) the entire graph, along with 2) the same graph with only "parent", root edges drawn.  Some examples are given here; the "family" will be explained below under "Indices".
Click on the image for full size.  Click on the label for a family of related graphs.

graphs: Bag25-2-2

graphs:RdE-50-2-1.8

graphs: TdS+100-2-1.8

graphs: SmW+50-4-0.1

roots: Bag25-2-2

roots: RdE-50-2-1.8

roots: TdS+100-2-1.8

roots: SmW+50-4-0.1
The graph images are color-coded as follows: The ordering of the graphs around the root, or most connected, node provides interesting upper bound information about the underlying network.  For example, if the maximum "level" (distance from root) is N, then the largest distance between any two nodes must be 2*N or less .. it is the distance between the two nodes traversing through the root.

The "dot/neato" graph drawing program from Bell Labs was used to construct the images.

Plots

Three plots were created for each dataset: Two sets of these were made for each of the graph types, one for N=1000, one for N=2000.  Examples for TdS-2000-3-2.2 are shown:
Histogram, Level, and Search plots for TdS-2000-3-2.2; click for full size

hist: TdS-2000-3-2.2

level TdS-2000-3-2.2

search: TdS-2000-3-2.2

Indices

To help navigate the data, several "indices" were made.

First, several datasets are combined into a "family" differing only in the number of nodes in each dataset.  The node values are 25, 50, 100, 250, 500, 1000 and 2000, thus 7 datasets are combined per family page.  There are 270 such families/pages.
 
Each page has three full graph images and the three corresponding root/parent-link graph images as above.

These are followed by two sets of 3 plots, for N=1000 and 2000.  The plots are the histogram, level and search plots discussed above.

The page concludes with a digest of the data for the 7 datasets. The digest includes:

  • Edges: The number of edges for the graph
  • Level Data: The Max, Avg and Median distance of the nodes from the root node.
  • Cluster Data: The Cluster Coefficient, Characteristic Path and Diameter of the graph.
  • Sample Search Data: The values for the three searches (Power Law, Depth 1st, Breadth 1st) along with the shortest path; averaged over 100 samples.
  • Sample Levels: The Level Max, Avg, Median averaged over 100 samples.  The samples are 100 random points (or N if N<100) chosen as root for breadth first traversals and used for searches above and gathering levels data.

Click for full size
The digest for the sample page is reproduced here:

Name Edges L:Mx/Av/Md ClC/CPath/D S:Pl/D1/B1/S P:Mx/Av/Md
TdS-25-3-2.2 60 3/1.96/2 0.78/2.76/4 1.76/4.32/3.04/2.52 3.56/2.5/2.76
TdS-50-3-2.2 135 9/4.51/5 0.75/4.72/10 7.54/18.92/13.06/4.28 8.26/4.58/4.72
TdS-100-3-2.2 291 11/4.24/3 0.77/5.24/15 15.65/41.09/34.42/5.88 11.55/5.64/5.24
TdS-250-3-2.2 814 16/6.51/6 0.8/8.31/24 43.74/121.03/88.8/8.52 18.74/8.9/8.31
TdS-500-3-2.2 1745 20/7.84/7 0.79/10.81/37 93.91/245.18/210.57/12.2 26.84/11.47/10.81
TdS-1000-3-2.2 3697 24/10.91/9 0.81/14.62/43 237.84/612.4/436.28/15.93 31.67/15.54/14.62
TdS-2000-3-2.2 7823 52/20.3/19 0.87/24.75/75 260.64/945.04/877.96/25.06 58.24/26.08/24.75

Class Summaries

There are 5 summaries of the data given by each of the 11 classes of graph (Bag RdE+ RdE- RdN+ RdN- SmW+ SmW- TdL+ TdL- TdS+ TdS-).  These are In addition, each of the three plot summaries are given for both N=1000 and 2000.  There are therefore a total of 88 summary pages.  These are a very convenient view onto a large set of data.  The pages can be large, however, so may take time to load.  Here are examples for the SmW- class:
Class Summaries for SmW-; click for full size

Class/Family Index

Finally, there is a single overall Index page. 

It is organized by Class, and within Class, by family.  Each Class has a section with one line pointing to the various Class summaries, and several lines containing pointers to 5 families each.

Below is the Bag section of the Index page, to the right is a link to the Index itself.
 

Graphs indexed by Class and Families w/in Classes

Bag [Class summaries: graphsroots histogram 1000/2000 levels1000/2000search1000/2000]
BagX-1-1 BagX-1-2 BagX-2-2 BagX-2-4 BagX-3-3
BagX-3-6 BagX-4-4 BagX-4-8 BagX-8-16 BagX-8-8


Click for Index Page

Data

The 1890 datasets are contained in the data/ directory.  Each data file is named using the nomenclature discussed above, followed by ".dat".  Thus the data for the SmW-50-2-0.2 dataset  is stored as data/SmW-50-2-0.2.dat.  In addition, the histogram gnuplot fit log is kept in the hist/log/ directory, i.e. data/fit/TdL+1000-3-2.2.fit.  All data for graphs and plots are taken from these files and are created by a variety of Unix shell scripts using gnuplot and neato for graphics.  The source files are listed in Files.html.

Two summary files are created, digest.txt and summary.txt.  The digest file has the four summary lines from each of the data files, while the summary file has a one line condensed version of this, and is the data presented in tabular form on the family pages.
Digest file entry for RdE+25-3-2.2
RdE+25-3-2.2: Random-Edges-Complete N=25 K=3 G=-2.2
Level Totals: Max/Avg/Med=2/1.54/2; Nodes=25, Edges=62
Averages[25]: PL/DFst/BFst/Min=1.32/1.36/1.32/1.88; Path-Max/Avg/Med=3.12/2.06/2.0
Small World Averages[25]: cCoef/charPath/Diam=0.26/2.0/4
Summary file entry for RdE+25-3-2.2
RdE+25-3-2.2 E=62 L=2/1.54/2 C=0.26/2.0/4 S=1.32/1.36/1.32/1.88 P=3.12/2.06/2.0

These summary files are the basis of intra-and inter- family studies, and studies over all data.

As an example of an intra-family study, the following plots the Power-Law search values for the Bag series, both before separation by family and after:

Family Plot: Bag Series Power-Law Search; click for full size

Before separation

Family Plot
These "family plots" break a class of network, Bag in this case, into its various "families", which vary in N, keeping other parameters constant.  Thus in the Power-Law search above, the family plot on the right shows the 10 varieties of Bag datasets.

An example of inter-family plot is the "class plot" which collects the 11 families together in a single plot.  Continuing with the Power-Law search example, the following plots the Power-Law data for all families, both for all datasets, and for the 2000 node datasets only:

Class Plot: Power-Law Search; click for full size

All datasets
2000 node datasets

As an example of an overall study, here is the search data for the 270 2000-node graphs.  The data contains the search data over 100 samples of all three searches (power-law, depth first, breadth first) for each graph, sorted.  Thus the total data represented is 270x100x3=81,000 searches as 3 vertical entries over 270 X axis values.  The X axis is simply the rank order within the sorted dataset.  The Y axis is the number of searches required for the sample node to find a random node, averaged over the samples within the graph dataset (The "S=" entry in the Summary line above).  It emphasizes the superiority of the power-law style of search.

2000-Node Graph Search Data; click for full size

linear scale

log-log scale

An additional 1000 datasets were created for Small World's graphs.  This was done by obtaining 500 random number's between 0 & 0.50 (0-50%) and using these for rewiring probabilities for 300 node, K=6, SmW+ and SmW- graphs.  Both Digest and Summary files, as described above, were obtained for these datasets.

The SmW datasets confirm the relationship between Clustering Coefficient and Characteristic Path as the rewiring probability increases: The path decreases faster than the clustering.  Similarly, the interplay between the + & - rewiring strategies (adding edges, detaching edges) is as expected with added edges decreasing clustering less rapidly than detached, and decreasing path length more rapidly than detached.  The path length is much less dependent on strategy, however, showing only slight difference in path length change with probability.

Small Worlds Clustering vs. PathLength; Click for full size

linear scale

log-X scale

Discussion

Now that we've seen the details of the study, lets go on to an overall discussion of the data.