The Internet is not a scale-free network

Danny Calegari linked me in to a recent article by Willinger, Anderson, and Doyle, arguing that the Internet is not a scale-free network; that is, one in which the distribution of node link degrees follows a power law, with a small number of extremely highly-connected hub nodes, and the great majority of nodes having very few connections. Scale-free networks are very fashionable at the moment.

The article is a pretty scathing attack upon the original work by Faloutsos, Faloutsos, and Faloutsos which asserted that the router topology of the Internet is a scale-free network, with its concomittant claim that it is acutely vulnerable to malicious attacks on the small number of crucial hub nodes. Willinger et al. point out that the traceroute utility used to sample the network topology is unable to recognize either one router with multiple interfaces (such as an aggregator connecting low-speed clients to the high-speed backbone), or a virtual IP node built upon multiple physical devices and networks (such as is used in the Internet backbone). So in fact, the high-degree nodes are not hubs, but the connectors feeding from ISPs into the internet backbone, whereas the true hub nodes are relatively low-degree but high-throughput.

More broadly, Willinger et al. attack the practice of picking simplistic models that take no account either of the underlying nature of the phenomenon being modelled or the unreliability of the data being analysed, and then engaging in an exercise of parameter-fiddling data fitting to make the model fit the data. With sufficient parameterization, exercises in data-fitting can be made to work, but lead to non-informative results. The authors argue that instead, modelling should be based on domain knowledge (at least for this domain). A model of the Internet topology should recognize the intentional nature of the system design, and the language of the model should be that of constrained optimization, not data fitting.

My personal research contribution is an investigation into why on earth scale-free network are called "scale-free". The name suggests that you can zoom in or out to any scale, without changing the appearance of the network, and indeed theWikipedia article derives the name from the supposedly self-similar properties of the network. But a scale-free network is far from self-similar: the hubs are massively connected, while the outer nodes have very few edges. In fact, the term comes from nothing so sophisticated, as its inventor Barabasi explains:

When we began to map the Web, we expected the nodes to follow a bell-shaped distribution, as do people's heights. Instead we discovered certain nodes that defied explanation, almost as if we had stumbled on a significant number of people who were 100 feet tall, thus prompting us to coin the term "scale-free".

Even in academic writing, we have to beware of false etymologies.

Postscript: on reflection, I suspect that the simplifying derivation of "scale-free" provided by Barabasi above is due to its popular-science context (he was writing for the Scientific American). In the original paper, the term "scale-free" is used without definition; presumably it pre-exists. An article by Eyal Sivan on "Scale-Free Thinking" connects scale-free with self-similar, and described the power-law distribution as scale-free. Anyone want to clarify matters?

Update: please see the comment by Eyal Sivan, who clarifies why "scale-free networks" are "scale-free", and what self-similarity means in this context. I've slightly modified the wording of my link to Eyal's original article to try to improve its anchortext clarity.

2 Responses to “The Internet is not a scale-free network”

  1. Eyal Sivan says:

    You write: "A scale-free network is far from self-similar: the hubs are massively connected, while the outer nodes have very few edges."

    Scale-free networks are scale-free because the same distribution of relationships exist at any scale (forming a power law). This does not mean that the nodes are self-similar, just that the distribution is self- similar. Zoom in on a hub or on an outer node, and you would find the same relative distribution of winners and losers at any given scale.

    However, I think the jury is still out as to whether the Internet is truly a scale-free network; or more to the point, whether Barabasi's model is an accurate model of the Internet. One of his most significant omissions, for example, is not including aging of links. Great to see the debate is still going strong.

    Thanks for linking to me, and for the other great links too.

  2. admin says:

    Ah, thanks for the clarification; I'll alert readers to it. And thanks for your original article; it deserves a higher Google rank for the "scale-free" query!

Leave a Reply