An Exploration of Power-Law Networks
Part III: Discussion

Owen Densmore
Sun Microsystems Laboratories
owen.densmore@sun.com

Network Family Characteristics

To study the data, we use three styles of plot: A Class plot, a Family plot, and a Summary plot.

The Class plot is a composite plot including each of the 11 major network classes.  We do this in two forms: over all sizes of the networks, or limited to the large, 2000 node networks.  The Class plot shows a single property, shown in rank order, i.e. sorted by the property value.  The Edge plots just below are Class plots, showing edges in rank order.  Because each class has a different number of nodes, we plot by rank percent.  This spreads the points so that each property covers the whole X axis.

A single line of the class plot can be broken out into a Family plot which focuses on all the families of a single network class, Bag for example.  It plots a separate line, by node count, for each set of parameters for that class.  See the Bag Family below.

We also use customized "Summary plots" which sum all the network types together, for a global, study wide view.

Edges

Each style of network can produce a different number of edges, based on the technique for creating the network and the parameters used in the network creation.  The initial parameter sets were chosen to yield similar edge counts between the various network types.  To show this, we examine the edge Class plots:

Edge Class Plot Linear and Log-Y; Click for Full Size

This shows all the edges for each of the 11 classes of networks, sorted by number of edges and graphed by rank order.  Because this is for all node counts, we are mixing in a very wide variety of networks.  To correct for this we do the same plots, but limiting ourselves to the 2000 node members only:

Edge Class/2000 Plot; Linear and Log-Y. Click for Full Size

The edge counts remain reasonably tightly clustered, with divergence occurring after 50%.  This is predictable: the Bag and SmW- classes are entirely deterministic.  Given their N, K and family parameter (N0 and P [the rewiring probability]), one can exactly calculate the number of edges.  The SmW+ edge count is estimated by adding P*nodes edges to the SmW- case due to that style's adding re-wired edges rather than detaching them.  Finally the remaining classes simply increase based on K (the minimal degree) and gamma (the sharpness of the power-law descent).

Power-Law Search

Searching is our primary interest in investigating power-law graphs, so we next look at the search results.

First, a very broad look at the three search types (Power-Law, Depth First and Breadth First) using a summary plot.  The following plots the search data for the 270 2000-node graphs.  The data contains the search data of 100 samples of the three search types for each dataset, sorted by search length.  Thus the total data represented is 270x100=27,000 instances of the 3 searches as 3 vertical 270 X axis values.  The X axis is simply the rank order within the sorted dataset.  The Y axis is the number of searches required for the sample node to find a random node, averaged over the samples within the graph dataset, with three points at each X value representing the three search types..

Three Search Types.  Summary Graph, Linear and Log-Y.  Click for larger view.
As predicted by the Search in Power-Law Networks paper, we see the Power-Law search performing well, dominating the other two.

The same data is plotted again, normalized by the shortest distance between the two nodes of the search.  This gives a better feel for the optimality of the searches.  Roughly 25% of the Power-Law searches are within 2X the shortest path length, 50% within 4X, and 70% within 10X.  Note that a small number of searches are better than the shortest distance!  This is possible due to the radius of  the search being greater than one.  Thus a node two away from the searching node would be discovered with a search number of one, a normalized value of 0.5.

Three Search Types, Summary Graph, normalized by Shortest Path, Linear and Log-Y.  Click for larger view.

Next, some details on the Power-Law searches, arranged by network class (a Class plot), using the 2000 node datasets.

Power-Law Search.  Class Graph, Linear and Log-Y.  Click for larger view.
Here we note that two of the classes perform quite badly over the entire class: the TdL+/- pair.  This is not too surprising due to their forming long braids (TdL-) optionally with the head/tail connected (TdL+):
TdL+/TdL- Examples: Click image for larger image, text for family page.

TdL+50-3-1.8

TdL-50-3-1.8
One other class, the SmW- (detached edge rewiring) networks, perform badly over the last quartile.  This is due to the tendency of these graphs to remain "stringy" over the first few rewiring probabilities, and to the "corner case" of 0.0 probability included in the SmW sets for illustration purposes:
SmW- Examples: Click image for larger image, text for family page.

SmW-50-8-0.0

SmW-50-8-0.05

SmW-50-8-0.1
On the other extreme, the three very good performers are Bag, RdE+, RdE-; with RdN+/- very close behind.  These graphs types are all characterized by being "bushy", observable in their graphs having little observable structure, and there level charts being quite short and tall.  Compare the Td-levels (very poor), SmW-levels (poor last quartile) with the RdE-levels (good). Only the "corner case" of K=1 for RdE- has large level values while all the Td- levels are high.  The SmW- levels span a wider set of values.

Next we normalize the Power-Law Class plot by Shortest Path, as in the Three Search plot above.  With the classes separated out, we see slightly different results than before.  Now, rather than 25% of the searches within 2X the shortest path, we see a wider divergence: three of the classes (Bag, RdE+/-) stay within 2X out to nearly 60% while others fair much poorer.  We also see an anomaly introduced by the normalization: Bag goes from excellent performance to poor performance in the last datapoints.

Power-Law Search; normalized by Shortest Path, Linear and Log-Y.  Click for larger view.
Bag goes from excellent performance to poor performance in the last datapoints.  This is understood by looking at the Shortest Path values, and observing that Bag has very low values for the Shortest Path (it is the most "bushy" of the network classes, with very shallow level plots).  This causes Bag normalized searches to have relatively higher normalized search values due to the small size of the divisor.
Shortest Path, Linear and Log-Y.  Click for larger view.

Just as the Class plots increased detail over the summary plots, we can further break-out the classes of networks into their families, based on parameters such as K, the node degree, G, the Power-List degree, P0, the Rewiring probability for Small Worlds families.  The Bag family of Power-Law, both with and without normalization, is shown here:

Bag Family Plot, Power-Law Search Log-Log, standard and Normalized.  Click for full size.

 
The normalized plot reinforces the earlier finding that the Bag class has a large number of networks with searches within 2X Shortest Path. 

The two networks with smallest K (K=1) diverge from the cluster.  On the right we show the 50 node Bag networks with K=1, and K=2 (both with initial nodes N0=2).  The K=1 network is less "bushy" than the K=2 version, leading to larger search values seen in the two graphs.

The additional, 1000-dataset SmW tests (500 SmW+, 500 SmW-, N=300, K=6) shows greater detail.  The Power-Law search data shows:

Small Worlds Detailed Study Power-Law Search Linear, standard and normalized.  Click for full size.
Small Worlds graphs have their Characteristic Path Length decrease quickly with increasing Rewiring Probability.  Searching follows the same trend, with a quick drop in search length.  Although the search length is over 10%(30) of the node count (300) for P<5%, it quickly goes to 20 or less at P=15%.

When observed normalized against Shortest Path, the same trend, although less dramatic, appears.  Note that the mid-point (P0=0.25; 50% rank) is above 2X the Shortest Path, thus Small Worlds networks have somewhat worse search values, as expected from their greater clustering and structure, and as explored above.

As expected, the search length is smaller for the SmW+ networks due to the increase in edges and the corresponding decrease in path length.

Small Worlds Metrics

The Watts/Strogatz Small Worlds paper pointed out the interesting interplay between structure, as measured by the Clustering Coefficient, and searchability, as measured by the Characteristic Path.  We next consider their characteristics in our datasets.

The more detailed Small World dataset (1000 datasets: 500 SmW+, 500 SmW-, N=300, K=6, Prob random between 0-50%)  exhibits the Characteristic Path/Clustering Coefficient behavior introduced by Watts and Strogatz: both decrease with Rewiring Probability, with the Clustering Coefficient descending more slowly and more linearly.  Compare these with the plots from the earlier Watts and Strogatz findings.

Small Worlds Clustering vs. Path; Click for full size
The interplay between the + & - rewiring strategies (adding edges, detaching edges) shows the added edges strategy results in clustering decreasing less rapidly than the detached strategy.  This is due to leaving the highly clustered edge in place, thus decreasing the overall clustering less.

The added edge strategy also decreases path length more rapidly than detached simply by adding more edges, thus a chance for a shorter path.  The path length is much less dependent on strategy, however, showing only slight difference in path length with probability.

It is tempting to look for the same correlation among the other network types.  This is difficult for two reasons.

The first is simply that the main study datasets are much broader than the detailed SmW dataset above, thus does not provide a large number of similar networks with varying CC/CP.  The customized SmW study, on the other hand, keeps the two primary parameters, N and K, constant, while collecting 1000 datasets with varying Probability, which keeps the size (edges) of the networks relatively close .. and indeed exactly the same for the SmW- set.
 

To see this, we extract data for the above detailed study from the main dataset.  On the right, the first plot shows N=2000, K=6, SmW+/-, (P=0, .05, .1, .2, .3, .5), while the second eliminates the P=0 extreme case.  This considerably reduces the classic CP vs CC form above, although we do see its vague outlines.

The second reason is simply that the third parameter in the other cases (Gamma for the 8 power lists, N0 for Bag) does not provide a large number of similarly sized graphs with closely varying CP/CC.  In neither case did we attempt to go from a structured network to an unstructured one by varying a parameter.  Thus the datasets do not provide an analogous CP/CC study.

That said, we can look at the Small Worlds parameters on their own.  We first look at Class plots, below, for CC and CP.  Note we reverse the sort order to descending to match the CP/CC studies above.

Looking at all our networks, regardless of size, we see considerable variation in Clustering.   SmW+/- occupy the center, with all the Td networks (TdS+/-, TdL+/-) being high and the Bag and Rd networks (RdE+/-, RdN+/-) being low.  This agrees with the idea that Clustering indicates how structured graphs are.  We also see that the networks cover most of the possible values between 0 and one.  We note that SmW+/- remain fairly linear, as noted in the detailed study.  The unstructured group (Rd/Bag) suite drop off quickly, showing a faster loss of regularity, while the Td networks remain linear over a large span, similar to SmW.  [Node: This suggests that, should we devise further detailed studies to find CP/CC behavior similar to SmW, we should first consider the Td families.]

The Characteristic Path plot also shows three tendencies: The TdL+/- classes have high values, Bag quite low values, and the rest fairly tightly clustered together in between.

Class Plot, Clustering Coefficient and Characteristic Path; Click for full size

Looking at just the large 2000 node networks we see greater detail.

The Clustering plot below left shows the structured networks (Td, SmW) retaining the same overall shape as above left, while the less structured networks (Rd, Bag) change significantly, all starting at 0.1 or below.  This requires the (Td, SmW) 2000 node entries to be uniformly distributed among the rest, while the 2000 node entries for (Rd, Bag) are at the end of their groups.  (Note: TdL+/- are identical; found by inspecting the data files!)

The Path plots (above and below, right) maintain the three groups: Td high, Bag low, the rest clustered in between.  Overall, the values are higher in the 2000 node plots, indicating a growth in path with node count.  The growth is low for Bag, high for the Td+/- pair, and moderate for the rest.  The 2000 node plot begins to show Sm+/- being a 4th group.

Class 2000 Plot, Clustering Coefficient and Characteristic Path; Click for full size


 
The CC behavior between the 2000 node plot and the complete plot (Td, SmW similar, Rd, Bag different) is an instance of a property of Clustering: the CC is independent of node count for highly structured topologies.  This is independent of the CC value itself.  In Watts' Small Worlds book, pg. 102, the example of extreme cliques was groups of isolated complete subgraphs.  A complete graph (all possible edges drawn) has maximum clustering (CC=1.0) because all neighbors are connected to each other.  Increasing the size of the graph (number of isolated complete subgraphs) does not change this clustering. 

A far simpler example is for lattice graphs like in the Watts/Strogatz example.  We can see that increasing the number of nodes of a uniform lattice structure retains a constant CC value.

Therefore the highly structured network types tend to grow in size with CC nearly independent of the size.  We can illustrate this with family plots and family pages which vary the node count keeping other parameters fixed.  As an extreme case, consider the SmW*8-0.0 series which vary the node count but keep the edge degree at 8 and rewiring probability zero.  These form a perfect lattice.  Notice that the CC's remain constant for all node counts in the plot below, even though the CCs vary from 0 (K=2, a minimal ring graph) to just under 0.7 for the K=10 case.  Click the K=8 network on the right to go to the family page.  We reproduce the data digest area here to show the CC=0.64 for all the node counts.

Small Worlds Lattice Networks; P=0.0.  Click Plot for full size, Graph for Family Page

SmW+100-8-0.0
Data Digest for N=25, 50, 100, 250, 1000, 2000
Name Edges L:Mx/Av/Md ClC/CPath/D S:Pl/D1/B1/S P:Mx/Av/Md
SmW+25-8-0.0 100 3/2.0/2 0.64/2.0/3 1.48/1.76/2.44/1.88 3.0/2.0/2.0
SmW+50-8-0.0 200 7/3.57/4 0.64/4.0/7 3.8/8.06/8.5/2.92 7.0/3.57/4.0
SmW+100-8-0.0 400 13/6.7/7 0.64/7.0/13 9.68/32.15/37.12/6.91 13.0/6.7/7.0
SmW+250-8-0.0 1000 32/16.06/16 0.64/16.0/32 28.3/92.36/108.22/15.89 32.0/16.06/16.0
SmW+500-8-0.0 2000 63/31.69/32 0.64/32.0/63 52.42/226.97/229.57/31.08 63.0/31.69/32.0
SmW+1000-8-0.0 4000 125/62.94/63 0.64/63.0/125 122.74/448.52/485.94/63.15 125.0/62.94/63.0
SmW+2000-8-0.0 8000 250/125.44/125 0.64/125.0/250 240.18/945.17/936.34/119.43 250.0/125.44/125.0

From the extreme case of SmW lattice graphs we now look at two tendencies: highly structured networks retain their CC over change in node count, while the highly unstructured graphs show a decrease in CC with increasing node count.

For the structured case, we look at the family plots for SmW- and TdS+ which exhibited the same plot structure between the plots of all classes and the 2000 node restricted set above. The family plots below validate the observation that the CC should be fairly insensitive to node count.  We see the plots are fairly level.  (Note the SmW flat lines which are the extreme lattice cases shown above.)

Family Plot, Clustering Coefficient, SmW- and TdS+; Click for full size

For the unstructured case, we look at RdE+ and Bag, both of which do not maintain plot structure between the two class plots.  As expected, they show a marked change in CC with increased node count.

Family Plot, Clustering Coefficient, RdE+ and Bag; Click for full size

Turning now to the Characteristic Path, we explore the tendency noted above: The Path plots maintain three groupings between the complete class plot and the 2000 node class plot: TdL+/- high, Bag low, the rest clustered in between.  Overall, the values are higher in the 2000 node plots, indicating a growth in path with node count.  The growth is low for Bag, high for the TdL+/- pair, and moderate for the rest. We'll again use Family Plots to illustrate these, using TdL-, RdE+ and Bag.  (The thumbnails below are slightly clipped to fit three across)

Family Plots, Characteristic Path, TdL- RdE+ Bag. Click for full size.

The TdL- family plot (left above) shows strong linear growth in CP with node count.  Note the Y axis goes to 600, a large percentage of the largest network (2000).  The middle plot, for RdE+, shows a few networks with strong growth (although with a Y axis max of 250 rather than TdL-'s 600), but the majority being very much smaller, clustered lying on the X axis.  A logy plot shows that the CP for the majority of these RdE+ networks falls below 10.  Finally, the Bag plot shows very small (less than 5 for all K>1 networks) CP values.  A logx plot suggests CP ~ log(N) for Bag.

Finally, we explore the SmW+/- pair, due to their slight separation from the main middle group.  The family plot on the right below appears similar to both TdL- and RdE+ above: strong linear growth, but with several networks clustered tightly to the X axis.  In the middle plot below, we extract the Rewiring Probability=0.3 set to see a mainstream set.  This appears to have strongly Log(N) structure, as supported by the LogX plot to the right.  This log relationship is found in the Small Worlds book for some variants studied there; see pg. 56 for example.

Family Plot, Characteristic Path SmW+, 0.3 Subgroup Linear, and LogX. Click for full size.


 
 

Summary

The final page of the exploration summarizes the study.