Know Your Data! Know Your Methods!

Assistant Research Professor for Societal Computing, School of Computer Science, Carnegie Mellon University
In this talk I want to focus on two fundamental issues related to Computational Social Science (CSS). First, CSS data are almost always secondary data, and researchers often have only limited information about how the data were collected, stored, manipulated, and filtered. In comparative or over-time analyses, interesting results can be produced by data artifacts rather than by behavioral differences. Second, most social science methods were developed in the context of small groups. Applying the same methods to thousands or millions of actors raises the question of whether their algorithmic assumptions, and the interpretation of the resulting metrics, are still valid. What do some of the metrics we apply every day really do? Do all of my metrics fit all of my data? Issues related to data and methods call for greater awareness, which might lead to less spectacular results.