One of the interesting aspects of being involved in a necessarily interdisciplinary endeavor such as NULab is understanding the degree to which we are already working with one another in the field. As a whole, does the NULab community engage with one another personally and professionally, beyond what each of us might assume from the behavior we see from our own view of the world? Every individual in the NULab could certainly bring forward valid evidence of current interdisciplinary activity in their work, particularly with other NULab members. Indeed, it would also be possible to deliberately collect statistics on when people work with other NULab members in some fashion. These approaches would require some form of field work: NULab fellows could survey the faculty, for example, and record the interactions that they reported. NULab members could also start recording every interaction they have with other NULab members in some sort of journal. This data could then be synthesized into a formal set of relationships, which we can envision as a web of relationships, or in short, a network.
One of the most interesting questions surrounding much of the work that NULab members engage in is the provenance of data. In the above version of events, our casual research question (“To what extent do NULab members already engage with one another’s work, and with one another in the creation of their work?”) allows us to imagine different methodological approaches (such as participant interviews or encouraging our participants to log their interactions directly). These types of data would largely be considered “self-report” data, which is distinct from “trace” or “behavioral” data. In a digitally-mediated environment, many of our interactions may be recorded on some form of social media, in some record of attendance, and so forth. While we could rely solely on self-report data, behavioral data allows us to step aside from the myriad problems that arise from self-report data. To be sure, behavioral data is flawed as well. Perhaps the deepest criticism one could level is that it tends to measure only one form of interaction between people, or one dimension of interaction, at a time. This is in contrast to self-report information, where a participant could potentially list the many kinds of interactions (a multiplex set of ties) that they have had with someone, both digitally and physically. Still, let’s explore the behavioral approach (particularly since it allows us to design some colorful graphs!).
In our initial discussions about mapping out the relationships between members of the NULab, several of us involved in the conversation landed fairly quickly on two different data sources: Google Scholar and Twitter. Each carries its own provenance, strengths, weaknesses, and interpretation as a dataset.
Google Scholar, for the last few people unaware of the service, is a search engine tailored to academia. Along with several other systems, Google Scholar allows for a peek into the various forms of the citation network formally linking much of the world’s research together. In the case of Google Scholar, academics are given the opportunity to create their own “page”, not unlike a profile on a social network. This lets them curate, specify, and add all of the research articles they have written. In turn, these articles are parsed by Google to identify which other articles were cited in that work (and articles that are new to the discovery system are then searched for and brought into the network as well). In our case, we asked all NULab members to set up, curate, and verify their Google Scholar profiles. After that step, I visited each of their profiles programmatically and collected all of the works they had produced, along with their co-authors. For every work, I added a link between the NULab member and each co-author (who could also be a NULab member), as well as between all of the co-author pairs for that work. By stitching together NULab members and all their co-authors, we were able to get a web of who works with whom in the NULab community and its periphery (according to Google Scholar’s data, of course).
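As a rough illustration of the linking step described above, here is a minimal sketch in Python. It assumes the profile data has already been collected into a list of author lists, one per work; the names, the sample data, and the `build_coauthor_edges` helper are all hypothetical, and the step of actually collecting profiles is left out entirely. Counting repeated collaborations as an edge weight is also an assumption, not necessarily how the original network was built.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample data: each inner list holds the authors of one work.
works = [
    ["Member A", "Member B", "Outside C"],
    ["Member A", "Outside C"],
    ["Member B"],  # solo-authored work: contributes no edges
]


def build_coauthor_edges(works):
    """Count how often each unordered pair of authors shares a work."""
    edges = Counter()
    for authors in works:
        # Deduplicate and sort so every pair has one canonical ordering.
        for a, b in combinations(sorted(set(authors)), 2):
            edges[(a, b)] += 1
    return edges


edges = build_coauthor_edges(works)
for (a, b), weight in sorted(edges.items()):
    print(f"{a} -- {b}: {weight} shared work(s)")
```

Note that the solo-authored work adds nothing to the edge list, which is one way a prolific but solitary author can end up invisible in a graph like this.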
The data has a few strengths and a few weaknesses. For brevity, and as a comparator, I want to focus in on tie strength. For many fields of research, if you have written a paper with someone, chances are very high that you know this person quite well. You may even be personal friends beyond professional colleagues that spend significant time on overlapping research questions. In this case, when we see an edge tying two people, it’s very likely that those relationships are strong relationships, with both sides of the relationship capable of recalling the other.
The biggest weakness, however, lies in a few different biases in the data. First and foremost, when someone is cited as “J. Smith”, does it refer to Justin, Jonathan, Jerry, or Julia? Nothing inherently provides further clarity. In short, citation networks name individuals ambiguously, and so identifying every individual’s contribution to the network is exceedingly difficult. Additionally, people may change their names; consider the most obvious case of marriage, for example. Finally, research traditions vary in how often one writes a research article, and in whether one typically publishes alone (which would leave an author out of this graph, even though they likely work with others in other forms), in small groups, or in very large groups, as is seen in some of the sciences.
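The “J. Smith” problem is easy to reproduce. The sketch below, with a hypothetical `citation_key` helper and made-up full names, collapses each name to a first initial plus surname and then groups the names that become indistinguishable. It is only an illustration of the ambiguity, not a description of how Google Scholar handles names.

```python
from collections import defaultdict


def citation_key(full_name):
    """Reduce a name like 'Julia Smith' to the abbreviated form 'J. Smith'."""
    parts = full_name.split()
    return f"{parts[0][0]}. {parts[-1]}"


# Hypothetical authors, reusing the first names from the example above.
authors = [
    "Justin Smith",
    "Jonathan Smith",
    "Jerry Smith",
    "Julia Smith",
    "Ana Lopez",
]

# Group full names by the abbreviated form they collapse to.
collisions = defaultdict(list)
for name in authors:
    collisions[citation_key(name)].append(name)

# Keep only the abbreviated forms shared by more than one person.
ambiguous = {key: names for key, names in collisions.items() if len(names) > 1}
for key, names in ambiguous.items():
    print(f"{key} could refer to: {', '.join(names)}")
```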
Nevertheless, the data shows some interesting relationships, and the ties between individuals, while potentially fraught with different forms of error, provide an interesting and enjoyable first peek into the question of how well tied the NULab may be. A visualization of that network is below.
Twitter is one of the better-known datasets for investigating social ties between people. It has been described as anything from a social network to a broadcast channel to an online community; for now, I would like to define it simply as a directed network of accounts following one another, where accounts may represent individuals, institutions, bots, or other types of actors. On Twitter, following an account means that when you log in to look at your “feed” (one of the primary facets of Twitter), that account’s content will be displayed as it is created. By following someone, then, we in effect give them some of our “attention”; conversely, highly followed accounts garner a great deal of attention, and so are popular in some sense. Following an account also necessarily presupposes that you are aware of the account’s existence. While institutions, bots, and other types of actors are certainly present, the vast majority of accounts generally belong to individuals, and this is certainly the case for the NULab members, whom we likely know to be neither institution nor bot.
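To make the “directed network” framing concrete, here is a small Python sketch. It assumes the follow relationships are already available as (follower, followed) pairs; the account names and data are invented for illustration, and nothing here touches the Twitter API. Follower counts stand in for the “attention” an account receives, and mutual follows are pulled out separately, since a one-way follow and a reciprocated one are different kinds of tie.

```python
from collections import Counter

# Hypothetical follow relationships, written as (follower, followed).
follows = [
    ("alice", "bob"),
    ("carol", "bob"),
    ("bob", "alice"),
    ("carol", "alice"),
    ("alice", "lab_account"),
]

follow_set = set(follows)  # ignore any duplicate records

# In-degree: how many accounts follow each account.
followers = Counter(followed for _, followed in follow_set)

# Mutual ties: pairs where each account follows the other.
mutual = sorted(
    {tuple(sorted(pair)) for pair in follow_set if (pair[1], pair[0]) in follow_set}
)

for account, count in followers.most_common():
    print(f"{account} is followed by {count} account(s)")
print("mutual follows:", mutual)
```

In this toy data, `carol` follows two accounts but is followed by none, so she directs attention outward without appearing in the follower counts at all.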
The data from Twitter also has a few strengths and weaknesses. In comparison to a co-authorship tie on Google Scholar, a tie on Twitter is cheap. While it’s possible to intimately know all of the accounts you follow, or all of the accounts that follow you, it is much less likely, largely because of the ease with which an account can be followed. As a result, we would expect to see biases in the opposite direction from Google Scholar: there are friendships stated in the Twitter graph that don’t necessarily exist in the real world, and that may not even be meaningful on Twitter itself. Additionally, we have other biases around use: did someone follow another person today or four years ago? Temporal data of this kind is lost on Twitter, unlike on Google Scholar, and it is exactly the kind of information that may help in understanding the richness or depth of a relationship.
Twitter doesn’t suffer from some of the issues around Google Scholar, however. People may use Twitter in their own distinct ways, but that diversity likely shows up far less in the network structure than is the case in the Google Scholar network. The biases around publication schedules by discipline also melt away: one person’s tweet is as valid as another person’s tweet. Finally, a Google Scholar tie is a professional or academic one, whereas a Twitter tie may be professional, personal, or purely social.
In short, Twitter provides us with a much less anemic (perhaps to a fault) graph with which to explore the interactions between NULab members and the people they regularly interact with. Below is a version of that graph for you to explore.
The project was, of course, a bit of a toy project to get myself acquainted with the NULab community. By no means does any of the casual analysis here imply anything serious about the NULab, and indeed any claims have been largely avoided. The more interesting topic at this stage is the data itself. What data we collect, what biases it contains, and what it can or can’t tell us are all questions facing NULab members, whether they find themselves in computational social science or the digital humanities.
The data here also tells another interesting story: the two networks, in several different ways, complement each other’s biases. In writing this, I found myself thinking, “In what ways can the issues that each of these datasets has actually be mitigated by the other?” I would like to put that question to you as well, and I think that the general dynamics of this line of questioning would also apply broadly to members of the NULab community and the challenges they may face.