Saturday, January 24, 2009

Biology Department Collaboration Graph

At a recent scientific staff meeting, I made a short presentation about the way we collaborate with each other in the Biology Department at WHOI. There seemed to be some interest in having some of the figures posted, and this blog seems as good a place as any. So, here they are.

My presentation focussed on the scientific staff collaboration graph. The image above is a picture of the collaboration graph that I constructed using data from curricula vitae submitted in October 2008. Each of the 29 dots in the picture represents one person, either a member of the scientific staff or a senior research specialist. If two scientists have coauthored a peer-reviewed publication, their dots are connected by a line. I'll call two scientists who are connected in this way "collaborators." (Of course the forms of collaboration are multifarious, and many significant interactions between scientists are obscured when collaboration is equated with coauthorship, but I had to start somewhere!) In mathematical graph theory, the dots are called "vertices" and the lines are called "edges."
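For readers who want to try this on their own department, here is a minimal sketch of how a coauthorship graph could be assembled from publication lists. It is not the script used for the figures; the function name `build_collaboration_graph` and the toy staff and paper lists are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def build_collaboration_graph(papers, staff):
    """Return {frozenset({a, b}): number of coauthored papers} for staff pairs."""
    weights = Counter()
    for authors in papers:
        # Keep only department members; outside coauthors are not vertices.
        members = sorted(set(authors) & set(staff))
        # Every pair of department members on the same paper shares an edge.
        for a, b in combinations(members, 2):
            weights[frozenset((a, b))] += 1
    return weights


# Hypothetical toy data, NOT the October 2008 CV data.
staff = {"A", "B", "C", "D"}
papers = [["A", "B", "X"], ["A", "B"], ["B", "C"], ["D", "Y"]]

edges = build_collaboration_graph(papers, staff)
for pair, n_papers in edges.items():
    print(sorted(pair), n_papers)
```

In this toy example, A and B end up joined by an edge of weight 2, B and C by an edge of weight 1, and D has no collaborators inside the department.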

The circular representation of the graph above masks some of its structure. It also makes it impossible to tell who is who! So here is another representation, with the vertices labelled:

Click on the image to enlarge it; you still might need a magnifying glass.

A collaboration graph like this is a kind of "social network." Social scientists specializing in organizational behavior have been studying how the structure of an organization's social network affects both its effectiveness as a whole and the social capital---roughly speaking, the benefits that accrue as a result of holding a particular location in the graph---of its members. (Burt [2000] has written an interesting review paper on the subject.)  One purpose of my presentation was to see if we could use some of the ideas in social network theory to help us think about the kinds of people we want to hire into the department.   

Another purpose was to offer a snapshot of the state of collaboration in the department. To that end, I showed some basic properties and descriptive statistics of the Biology Department collaboration graph. Some of these I found surprising:
  • There are 43 edges in the graph.
  • The graph is connected; i.e., you can find a path from any vertex to any other vertex by traversing edges of the graph. Put another way, you can connect any two scientists in the graph by an unbroken chain of coauthored publications.
  • The diameter of the graph, i.e., the number of edges on the shortest path between the two scientists who are furthest apart (McDowell and Tyack), is 11.
  • Most of the scientists (20) have 3 or fewer collaborators in the department. One scientist (Wiebe) has 7 collaborators.
  • One can weight the edges of the graph by the number of papers on which the two collaborators appear as coauthors. Of the 43 edges, a large majority (31) have weight less than or equal to 3. The strongest collaborations are Olson-Sosik (13 papers), Caswell-Neubert (14 papers), Davis-Gallager and Moore-Stegeman (21 papers each) and Hahn-Stegeman (27 papers).
  • Finally, one can break the graph into "communities," or groups of vertices such that the number of links within the group is higher than the number between the groups. (The detection of communities in large graphs is a mathematical challenge.  Santo Fortunato has given a fascinating lecture on the subject. Ok, maybe it's just fascinating to me.) The weighted version of our collaboration graph turns out to have statistically significant community structure. Here's a picture with the nodes color-coded by community:
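The quantities in the list above are easy to compute for a small graph. The sketch below uses a made-up toy graph (two triangles joined by a single edge), not the department data. The modularity score at the end is one common way to measure community structure; I am assuming it here for illustration, and it is not necessarily the method behind the picture.

```python
from collections import deque


def adjacency(edges):
    """Build {vertex: set of neighbours} from an iterable of (a, b) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def distances_from(adj, source):
    """Breadth-first search: edge counts of shortest paths from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def is_connected(adj):
    """True if every vertex can be reached from an arbitrary start vertex."""
    start = next(iter(adj))
    return len(distances_from(adj, start)) == len(adj)


def diameter(adj):
    """Longest shortest path, in edges (assumes the graph is connected)."""
    return max(max(distances_from(adj, s).values()) for s in adj)


def modularity(weighted_edges, community):
    """Weighted modularity of a given assignment of vertices to communities."""
    total = sum(weighted_edges.values())
    inside, strength = {}, {}
    for (a, b), w in weighted_edges.items():
        ca, cb = community[a], community[b]
        # Each edge adds its weight to the strength of both endpoints' groups.
        strength[ca] = strength.get(ca, 0) + w
        strength[cb] = strength.get(cb, 0) + w
        if ca == cb:
            inside[ca] = inside.get(ca, 0) + w
    return sum(inside.get(c, 0) / total - (s / (2 * total)) ** 2
               for c, s in strength.items())


# Hypothetical toy graph: keys are edges, values are numbers of joint papers.
toy = {("A", "B"): 1, ("B", "C"): 1, ("A", "C"): 1,
       ("D", "E"): 1, ("E", "F"): 1, ("D", "F"): 1,
       ("C", "D"): 1}
groups = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}

adj = adjacency(toy)
degrees = {v: len(nbrs) for v, nbrs in adj.items()}  # collaborators per person
print(len(toy), is_connected(adj), diameter(adj), degrees)
print(modularity(toy, groups))
```

Note that a vertex with no edges would never appear in `adj`, so this sketch only makes sense for graphs without isolated vertices (ours is connected, so that is fine). A modularity well above zero suggests more within-group links than one would expect by chance; deciding whether it is statistically significant takes more work than shown here.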



Make what you will of the graphs and statistics.  Here's what I think: the state of collaboration in the department is pretty darn good, but could be better. In particular, it would be really interesting to see what might come from collaboration between remotely connected parts of the graph.  As John Stuart Mill put it in his Principles of Political Economy (1891):

It is hardly possible to overrate the value … of placing human beings in contact with persons dissimilar to themselves, and with modes of thought and action unlike those with which they are familiar.  Such communication has always been, and is peculiarly in the present age, one of the primary sources of progress.

1 comment:

  1. One of the interesting things about this little project is how imposing a mathematical structure on this little system not only makes it clearer, but makes it possible to see things you might not have thought of. Who knew that communities of collaborators would come out that way? It's a good example, in miniature, of what we all do when we try to impose order on a complicated part of the world.
