One advantage commonly cited by startup incubators is the connections they offer to people who work in them, or who simply “hang out” in their space.
In the early 2000s, I was working for a company that was a tenant in the newly built Toronto uber-incubator space called “MaRS.” I recall the CEO of MaRS describing its role with a billiard-ball analogy: place like-minded people close together and there will be more collisions.
This description alludes to a network. A network is a set of interconnected entities that can communicate or interact with each other. These entities could be companies and institutions, but most often they are people, which makes it a social network. Each entity is a “node,” and the connection between two nodes is a “link.”
Figure 1 shows an example. The node called Alice is very active in the network. People like Alice occupy advantaged positions in networks and are very good connectors to find when we seek to reach new people. However, in this example, her direct connections (one degree away) are only to others in her own “clique.” This term is a basic concept in graph theory, meaning a group of nodes that are all linked to one another, and defining and identifying cliques is important. To reach other cliques, Alice has to go through Raphael. Raphael's importance comes from the cliques he connects.
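Identifying cliques can be automated. The sketch below uses the Bron-Kerbosch algorithm with pivoting, a standard way to list every maximal clique in the strict graph-theory sense. It assumes a hypothetical toy network in the spirit of Figure 1: only the names Alice and Raphael come from the text; every other name and every link is invented for illustration.

```python
from collections import defaultdict

# Hypothetical toy network. Only "Alice" and "Raphael" come from the text;
# the other names and all links are invented for illustration.
EDGES = [
    ("Alice", "Bob"), ("Alice", "Carol"), ("Alice", "Dan"),
    ("Bob", "Carol"), ("Bob", "Dan"), ("Carol", "Dan"),
    ("Alice", "Raphael"), ("Raphael", "Eve"),
    ("Eve", "Frank"), ("Eve", "Grace"), ("Frank", "Grace"),
]


def build_adjacency(edges):
    """Turn an undirected edge list into {node: set_of_neighbours}."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)


def maximal_cliques(adj):
    """Bron-Kerbosch with pivoting: returns every maximal clique exactly once."""
    cliques = []

    def expand(r, p, x):
        # r: clique built so far; p: nodes that could still extend it;
        # x: nodes already explored (prevents reporting the same clique twice)
        if not p and not x:
            cliques.append(r)
            return
        # Pivot on the node with the most neighbours in p to prune branches
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


adj = build_adjacency(EDGES)
for clique in maximal_cliques(adj):
    print(sorted(clique))
```

On this toy graph the search finds Alice's four-person clique, the three-person clique around Eve, and two two-person cliques that run through Raphael, which is exactly his bridging role. The informal “cliques” in a real social network are usually looser groups, so practical tools often use community detection rather than strict cliques.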
In a connected world, we are all linked to each other in some way. The number of degrees of separation between any two nodes turns out to be quite small: around 6 on average. To my knowledge, no figure higher than 7 has ever been found. The exact number depends on the network, and it has been trending downwards in more constructed social systems, such as online social networks. The lowest figure reported recently is an average of 4.57 links separating any two people on Facebook.
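Separation can be measured directly: a breadth-first search from each node gives the number of links on the shortest path to every other node, and averaging over all pairs gives the network's typical separation. This is a minimal sketch, assuming the same hypothetical toy network (names other than Alice and Raphael are invented) and counting separation as links on the shortest path.

```python
from collections import deque

# Hypothetical toy network; names other than Alice and Raphael are invented.
EDGES = [
    ("Alice", "Bob"), ("Alice", "Carol"), ("Alice", "Dan"),
    ("Bob", "Carol"), ("Bob", "Dan"), ("Carol", "Dan"),
    ("Alice", "Raphael"), ("Raphael", "Eve"),
    ("Eve", "Frank"), ("Eve", "Grace"), ("Frank", "Grace"),
]


def build_adjacency(edges):
    """Undirected edge list -> {node: set_of_neighbours}."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def hops_from(adj, source):
    """Breadth-first search: number of links from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def separation_stats(adj):
    """Average and maximum shortest-path length over all reachable pairs."""
    total, pairs, longest = 0, 0, 0
    for source in adj:
        for target, d in hops_from(adj, source).items():
            if target != source:  # unreachable pairs are simply skipped
                total += d
                pairs += 1
                longest = max(longest, d)
    return total / pairs, longest


average, maximum = separation_stats(build_adjacency(EDGES))
print(f"average: {average:.2f} links, maximum: {maximum} links")
# → average: 2.21 links, maximum: 4 links
```

Running one search per node is fine for small networks; for networks the size of Facebook, researchers estimate the average from samples rather than computing every pair.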
However, a small number of degrees of separation does not mean we can all reach the important person we need. Geographical and social differences are still relevant. This is where a node like Raphael matters. Such a node has high “betweenness centrality,” a measure of how often a node lies on the shortest paths between other pairs of nodes in the network.
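Betweenness centrality can be computed with Brandes' algorithm, which counts shortest paths from every node and then credits each intermediate node with its share of them. The sketch below assumes an unweighted, undirected network and reuses the hypothetical toy graph (names other than Alice and Raphael are invented); scores are left unnormalized so each one reads as “number of node pairs whose shortest path runs through this node.”

```python
from collections import deque

# Hypothetical toy network; names other than Alice and Raphael are invented.
EDGES = [
    ("Alice", "Bob"), ("Alice", "Carol"), ("Alice", "Dan"),
    ("Bob", "Carol"), ("Bob", "Dan"), ("Carol", "Dan"),
    ("Alice", "Raphael"), ("Raphael", "Eve"),
    ("Eve", "Frank"), ("Eve", "Grace"), ("Frank", "Grace"),
]


def build_adjacency(edges):
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph (unnormalized)."""
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Phase 1: BFS from s, counting shortest paths (sigma) and predecessors
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: walk back from the farthest nodes, accumulating dependencies
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Undirected graph: every pair was counted once from each end
    return {v: c / 2 for v, c in score.items()}


scores = betweenness(build_adjacency(EDGES))
for name, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:8s} {value:.1f}")
```

In this toy graph Raphael and Alice each sit on the shortest paths of 12 pairs, and the members of Alice's clique score zero, even though some of them have as many links as Raphael.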
The importance of betweenness is illustrated in Figure 2. Individuals and entire industries have built businesses out of brokering between cliques and networks. Social and economic disruption can happen when these traditional bridges are bypassed, or dis-intermediated.
In a startup incubator, the idea is that connections can happen between nodes without having to go through Raphael or some other broker. An incubator can also be described as a social network in which the degrees of separation are reduced, making connections easier to form. In this context, the incubator model aims to use social dis-intermediation to accelerate entrepreneurship.
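The effect of a single new “collision” can be quantified. This sketch assumes a hypothetical chain of six people, each of whom knows only the next, and measures the average separation before and after one direct link is added between the two ends of the chain.

```python
from collections import deque


def average_separation(edges):
    """Mean number of links on shortest paths, over all reachable ordered pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total += sum(dist.values())  # the source's own distance is 0
        pairs += len(dist) - 1
    return total / pairs


# Hypothetical chain: P1 knows P2, P2 knows P3, and so on
chain = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P4", "P5"), ("P5", "P6")]
print(round(average_separation(chain), 2))          # → 2.33
# One incubator-style "collision": the two ends of the chain meet directly
with_shortcut = chain + [("P6", "P1")]
print(round(average_separation(with_shortcut), 2))  # → 1.8
```

One extra link cuts the average separation by roughly a fifth here, and no intermediary is needed for the new introduction.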
How to analyze and visualize networks
Networks are hard to describe because they are mathematical constructs. Their analysis is based on graph theory and network theory.
In day-to-day practice, when we are networking or trying to find collaborators or partners, we are navigating these networks mentally and intuitively.
With the growth of social media over the last 15 years, the network has become a mainstream concept. It is now easy to map nodes and links by analyzing databases, yet we are only beginning to grasp how these maps can be used. The results can provide insights for new business strategies or new economic development strategies.
As a start, it is highly useful to visualize a network. This allows concepts to be communicated to audiences who are not experts in data science or network analysis. How is this done?
The first step is to gather the data, which can be put in a Microsoft Excel sheet or another database. The next step is the analysis.
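A common way to hold network data in a spreadsheet is an edge list: one row per link, with the two nodes and an optional weight. This sketch reads such a sheet after it has been saved as CSV. The column names `source`, `target` and `weight` are assumptions for illustration, not a requirement of any particular tool, and the sample rows are invented.

```python
import csv
import io

# Hypothetical spreadsheet export saved as CSV; column names are assumptions.
SAMPLE_CSV = """source,target,weight
Alice,Bob,3
Alice,Raphael,1
Raphael,Eve,2
Bob,Alice,1
"""


def load_edge_list(rows):
    """Read an undirected, weighted edge list; repeated pairs add their weights."""
    graph = {}
    for row in csv.DictReader(rows):
        a = (row.get("source") or "").strip()
        b = (row.get("target") or "").strip()
        if not a or not b or a == b:
            continue  # skip incomplete rows and self-loops
        weight = float(row.get("weight") or 1)  # a missing weight counts as 1
        for u, v in ((a, b), (b, a)):
            graph.setdefault(u, {})
            graph[u][v] = graph[u].get(v, 0.0) + weight
    return graph


graph = load_edge_list(io.StringIO(SAMPLE_CSV))
print(graph["Alice"])  # → {'Bob': 4.0, 'Raphael': 1.0}
```

With a real export, the same function accepts an open file, for example `with open("edges.csv", newline="") as f: graph = load_edge_list(f)`, where `edges.csv` is a hypothetical filename.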
Many software packages are available to perform the network analysis and to present the results visually. Spotfire and Tableau are two giants in the business intelligence market that have tools for this purpose.
A simple one to use is NodeXL. Launched in 2008, it is an open-source add-in for Microsoft Excel for social network analysis and content analysis. The Basic version is free. The Pro version is a fee-based, full-featured version that includes social media network data importers, advanced network metrics, and automation.
Here are some examples of what is possible, excerpted from Professor Ben Shneiderman’s 2013 computer science class at the University of Maryland.
Student Joshua Brule obtained filmography data from Wikipedia for actors from the television series Firefly to show where their film and television collaborations intersected over the course of their careers (Figure 3). He found that while they had few collaborations before working on Firefly, their collaborations increased afterwards.
This next example shows how network analysis gives insights into research and innovation networks. Student Ruofei Du tracked the connections between institutions that participated in a technology conference between 1988 and 2013 (Figure 4). Data on the coauthors of 1,033 scientific papers were obtained from DBLP (a computer science bibliography website), and their institutional affiliations were obtained from Google Scholar.
We see two powerhouses active in this technology field: Carnegie Mellon University and Microsoft. We also see the collaboration links between the institutions.
Note the islands that are not connected: Queen’s University, Brown University and University of Bristol. Depending on your perspective, these institutions may need to strengthen their work in this area, or they may not be your first pick if you were seeking new talent in it.
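Islands like these are the connected components outside the network's main component. This sketch finds them with a simple graph traversal. The institutions and links are hypothetical placeholders, not data from Du's study.

```python
def connected_components(nodes, edges):
    """Group nodes into components; a node with no links is its own component."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], []
        while stack:  # depth-first traversal of everything reachable from start
            node = stack.pop()
            members.append(node)
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        components.append(sorted(members))
    # Largest component first; sorted() is stable, so ties keep input order
    return sorted(components, key=len, reverse=True)


# Hypothetical institutions and co-authorship links
institutions = ["Inst A", "Inst B", "Inst C", "Inst D", "Inst E", "Inst F"]
links = [("Inst A", "Inst B"), ("Inst B", "Inst C"), ("Inst A", "Inst C"),
         ("Inst E", "Inst F")]

components = connected_components(institutions, links)
main, islands = components[0], components[1:]
print("main:", main)        # → main: ['Inst A', 'Inst B', 'Inst C']
print("islands:", islands)  # → islands: [['Inst E', 'Inst F'], ['Inst D']]
```

The list of nodes is passed in separately because an institution with no collaborations never appears in the link list, yet it is exactly the kind of island worth spotting.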
Figure 5 zooms in on the most prolific contributors within the network. The size of each node is proportional to the author's total number of publications, and the width of each link corresponds to the number of publications the two authors co-authored.
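Both visual encodings come from simple counts over the papers' author lists: each author's paper count sets the node size, and each pair's shared-paper count sets the link width. This sketch uses hypothetical author lists, and the pixel scale factors are arbitrary choices for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical author lists, one per paper
PAPERS = [
    ["Author 1", "Author 2"],
    ["Author 1", "Author 2", "Author 3"],
    ["Author 3"],
    ["Author 4", "Author 1"],
]


def coauthorship_weights(papers):
    """Node weight = papers per author; link weight = papers per author pair."""
    publications = Counter()
    links = Counter()
    for authors in papers:
        unique = sorted(set(authors))  # guard against a name listed twice
        publications.update(unique)
        for pair in combinations(unique, 2):  # sorted, so each pair has one key
            links[pair] += 1
    return publications, links


pubs, links = coauthorship_weights(PAPERS)
# Arbitrary drawing scales: 10 px per paper for nodes, 1.5 px per shared paper for links
node_size = {author: 10 * n for author, n in pubs.items()}
link_width = {pair: 1.5 * n for pair, n in links.items()}
print(pubs.most_common(1))              # → [('Author 1', 3)]
print(links[("Author 1", "Author 2")])  # → 2
```

Keeping each pair's names in sorted order means "A with B" and "B with A" land on the same link, so a collaboration is never split across two edges.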
The biggest nodes are “Scott E Hudson” and “Ravin Balakrishnan,” but they do not collaborate with each other. The strongest collaboration is between “Tovi Grossman” and “George W Fitzmaurice.” Why does this occur?
Figure 6 shows the same graph labeled with affiliations. The clique on the left consists of professors from Carnegie Mellon, Columbia University and MIT. The component on the right includes two researchers from Microsoft and two from Autodesk. This suggests that Grossman and Fitzmaurice collaborate more with industry, and that Balakrishnan links to Grossman and Fitzmaurice via Microsoft Research.
Du’s analysis of this research network and his findings make a wonderful read, which can be found here.
These examples make clear that getting access to suitable data is important. This requires creativity to find the right sources and a lot of grunt work to collect and curate the data.
Du’s write-up also demonstrates the curiosity and data mining involved in making sense of the data and how new insights are developed.
Network mapping of innovation hubs
The previous example analyzed connections between people as co-authors of scientific publications. Co-inventors on patents are also frequently analyzed. In an earlier post, I noted that Vanguard Group’s research team had identified emerging technology fields, and I was asked how they did this. The answer is very much this form of network analysis of publications and patents.
Scott Dempwolf takes this approach into the economic development realm by modeling innovation based on patent ties, federal and state funding, and physical locations in his graduate research. His 2012 Ph.D. dissertation is titled Network Models of Regional Innovation Clusters and their Impact on Economic Growth.
Here is an example of his work. He sought to identify technology and talent clusters in the state of Pennsylvania which could be positively influenced by economic development policy.
The raw data is shown in the image below.
It is hard to identify any patterns among the large number of nodes. This is where network analysis software uses algorithms to identify clusters of nodes that link to each other more frequently than to nodes outside the cluster.
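One widely used yardstick for “links more inside the cluster than outside” is modularity, Q, which compares the share of links inside clusters with the share expected if links were placed at random. The sketch below is a deliberately naive greedy modularity maximization: start with every node alone and keep applying the merge that raises Q most. It is a swapped-in illustration of the general idea, not the method Dempwolf or any particular software package used, and the two-group network is hypothetical. Production tools use much faster methods (Louvain, for example).

```python
from collections import Counter
from itertools import combinations

# Hypothetical network: two tight groups (A*, B*) joined by a single link
EDGES = [
    ("A1", "A2"), ("A1", "A3"), ("A2", "A3"),
    ("B1", "B2"), ("B1", "B3"), ("B2", "B3"),
    ("A3", "B1"),
]


def modularity(edges, communities):
    """Q = sum over clusters of (inside links / m) - (cluster degree / 2m)^2."""
    m = len(edges)
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    q = 0.0
    for com in communities:
        inside = sum(1 for a, b in edges if a in com and b in com)
        total_degree = sum(degree[n] for n in com)
        q += inside / m - (total_degree / (2 * m)) ** 2
    return q


def greedy_communities(edges):
    """Start from singletons; repeatedly apply the merge that raises Q the most."""
    nodes = sorted({n for edge in edges for n in edge})
    communities = [frozenset([n]) for n in nodes]
    current_q = modularity(edges, communities)
    while len(communities) > 1:
        best_q, best_split = None, None
        for i, j in combinations(range(len(communities)), 2):
            trial = [c for k, c in enumerate(communities) if k not in (i, j)]
            trial.append(communities[i] | communities[j])
            q = modularity(edges, trial)
            if best_q is None or q > best_q:
                best_q, best_split = q, trial
        if best_q <= current_q + 1e-12:
            break  # no merge improves modularity, so stop
        current_q, communities = best_q, best_split
    return communities, current_q


clusters, q = greedy_communities(EDGES)
print(sorted(sorted(c) for c in clusters), round(q, 3))
# → [['A1', 'A2', 'A3'], ['B1', 'B2', 'B3']] 0.357
```

Merging the two groups would drop Q to 0, so the search stops with two clusters: the single bridging link is not enough to justify treating them as one.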
He discovered over 80 clusters around specific Pennsylvania counties and local enterprises (Figure 8). One cluster is the Pittsburgh metro area, anchored by Westinghouse. Clusters like this are easy to identify from their anchor companies, but this visualization reveals direct connections from the cluster to other counties and sub-industries that are not ordinarily apparent.
Another illustration (Figure 9) shows the specific links within the pharmaceutical and medical cluster, which is composed of several companies, universities, and a major government department. It also shows how strong or weak those links are. An interesting arrangement of inventors in several connected fans shows which of these institutions are more inventive.
Dempwolf is now applying this method of analysis for economic development departments across the U.S. The state of Illinois recently produced a science and technology roadmap based on his analyses, which revealed the status of the state’s different clusters: alloys and metals, polymers, batteries and energy storage, biofuels, biomedical, and nanotechnology. The final report with some illustrative maps is here: link, link. An example of the polymers cluster map is below.
I worked in economic development a decade ago. Back then, asset maps were popular. These were compilations of tables and reports on institutions, companies, and people. In that form, the overwhelming barrage of information made it impossible to identify the relative importance of these various “assets” and their strategic context.
These network graphs provide an easy-to-understand visual summary of these “assets,” along with quantitative displays of the strength of their links and their economic impact.