Research in the “eHumanities” needs social and computer scientists

Last week I attended “Get going: The Nijmegen Spring School in eHumanities”. The school focused on three programs and skill sets: Python, R and Gephi. I thoroughly enjoyed getting my hands dirty with Python, and can’t believe I have lived my scientific life, up ’til now, without it! I would recommend learning this language to all social scientists and humanities scholars, particularly those working with Big data, as it is a fairly straightforward language with a growing number of online tutorials. Stop doing everything manually or within the boundaries of Excel, and learn a tool that will speed up processes such as variable recoding!
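As a small illustration of the kind of variable recoding mentioned above, here is a minimal Python sketch using only the standard library. The survey field `education`, its answer categories and the numeric coding scheme are hypothetical, chosen only for the example; answers that are not in the coding scheme become `None` (treated as missing) instead of being silently kept.

```python
# Hypothetical survey records; field names and answer categories are illustrative only.
records = [
    {"id": 1, "education": "primary"},
    {"id": 2, "education": "secondary"},
    {"id": 3, "education": "university"},
    {"id": 4, "education": "unknown"},
]

# An assumed coding scheme mapping raw answers to ordinal codes.
EDUCATION_CODES = {"primary": 1, "secondary": 2, "university": 3}


def recode(rows, field, mapping, missing=None):
    """Return copies of rows with `field` recoded; values not in `mapping` become `missing`."""
    return [{**row, field: mapping.get(row[field], missing)} for row in rows]


recoded = recode(records, "education", EDUCATION_CODES)
for row in recoded:
    print(row["id"], row["education"])
```

The original records are left untouched, so the raw answers stay available for checking. If your data already lives in a pandas DataFrame, `Series.map` with the same dictionary does the equivalent job on a whole column, turning unmapped values into missing values.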

Beyond learning these skills, I also learned a lot from participants about the Humanities, as most attendees came from the Humanities, with the exception of a few social scientists and computer scientists. I myself am a trained social scientist and had little knowledge about the Humanities and the giant push for so-called eHumanities, digital Humanities and the like. This surge in funding and interest within the academic community seemed to be coming from two sources: the increasing digitalization of sources used in the Humanities, combined with a lack of training (of course there are exceptions) in how to conceptualize, operationalize, and analyze such data. Let me make this clear: I am NOT saying that pre-digital or non-digital work in the Humanities is or was fruitless, but rather that the field is challenged to formulate a new methodology and thus a new skill set for researchers. This became apparent within this small group of mainly Humanities scholars during the workshop: few had experience in statistics, in operationalizing data in network terms, or in thinking in such terms and schemas. I am not criticizing the attendees (this was the goal of the workshop, after all), but as someone versed in these techniques, I was struck by how helpful they could be and thus by the truly great need to fill this knowledge gap.
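To make “operationalizing data in network terms” a bit more concrete, here is a minimal Python sketch that turns a list of publications into a co-authorship network: authors become nodes, and two authors are linked each time they appear on the same paper. The author names and the `coauthorship_edges` and `degree` helpers are hypothetical, for illustration only.

```python
from collections import Counter
from itertools import combinations

# Hypothetical publication data: each inner list holds the authors of one paper.
papers = [
    ["Alice", "Bob"],
    ["Alice", "Carol", "Dave"],
    ["Bob", "Alice"],
]


def coauthorship_edges(author_lists):
    """Count how many papers each unordered pair of authors has written together."""
    edges = Counter()
    for authors in author_lists:
        # sorted(set(...)) drops duplicate names and gives every pair one canonical order,
        # so ("Bob", "Alice") and ("Alice", "Bob") count as the same edge.
        for a, b in combinations(sorted(set(authors)), 2):
            edges[(a, b)] += 1
    return edges


def degree(edges):
    """Number of distinct co-authors per author (unweighted degree)."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


edges = coauthorship_edges(papers)
for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
print(dict(degree(edges)))
```

The weighted edge list (pairs of authors plus a count of shared papers) is the kind of table that can be written to CSV and loaded into a network tool such as Gephi, whose spreadsheet import reads Source and Target columns and an optional Weight column.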

Often, the way to bridge this gap is to bring in computer scientists: experts in automating everything, organizing data, analyzing large data sets and modelling problems. On paper, this is certainly the most logical step to aid Humanities scholars. But after this three-day workshop I see a missing piece that I think may be essential to bridge this gap even further, and that is integrating social scientists as well. Of course, you will probably say this is self-promotion, as it has certainly sparked my interest, but hear me out. Social Science as a field has a long tradition of discerning valid and reliable methodologies for analyzing all sorts of data types, origins, sample sizes and the like. Quantitative social scientists are traditionally trained in various statistical methods that allow them to question causal mechanisms (relationships between sets of variables). Social Science is also faced with the increasing availability of Big data, and social scientists are thus also teaming up with computer scientists to expand applications. It seems quite obvious to me that the three disciplines need to address this e-fying of the Humanities together: to redefine its boundaries by looking at different (multidisciplinary) research questions, and to develop frameworks for integrating methodologies for using such data.

These discussions also challenged me to reflect on how I think about my data and thus my research questions. Although I actively try to look beyond my disciplinary blinders, through work with computer scientists in particular, I now certainly see the advantages of also considering a Humanities approach, particularly for brainstorming about and exploring the increasing amount of data produced with the Humanities in mind.

Any suggestions about where to start?


Studying Large Social Networks

In my research I investigate dynamics in large social networks, that is, networks of more than 1,000 nodes. The most commonly used models have a number of limitations for studying the combined effects of network and social parameters on network evolution, so a year ago I started working with Rena Bakhshi, a very talented researcher within the Network Institute. Rena had a model, the mean-field model, which she had used to investigate the dynamics of large communication networks, and she wanted to experiment with social network data. This resulted in our first application of the mean-field model to large social networks, entitled “Scalable Analysis for Large Social Networks: The Data-Aware Mean-Field Approach”, recently published in Social Informatics. See the abstract below.


Studies on social networks have proved that endogenous and exogenous factors influence dynamics. Two streams of modeling exist on explaining the dynamics of social networks: 1) models predicting links through network properties, and 2) models considering the effects of social attributes. In this interdisciplinary study we work to overcome a number of computational limitations within these current models. We employ a mean-field model which allows for the construction of a population-specific model informed from empirical research for predicting links from both network and social properties in large social networks. The model is tested on a population of conference coauthorship behavior, considering a number of parameters from available Web data. We address how large social networks can be modeled preserving both network and social parameters. We prove that the mean-field model, using a data-aware approach, allows us to overcome computational burdens and thus scalability issues in modeling large social networks in terms of both network and social parameters. Additionally, we confirm that large social networks evolve through both network and social-selection decisions; asserting that the dynamics of networks cannot singly be studied from a single perspective but must consider effects of social parameters.
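To give an intuition for why a mean-field approach scales, here is a toy Python sketch. It is NOT the data-aware model from the paper, only a generic illustration under simple assumptions. Instead of simulating every pair of N individuals, nodes are grouped by a social attribute and only the expected (mean) degree of each group is tracked. Each node is assumed to start `m` new links per step, choosing a target group in proportion to the group's size, its current degree (a stand-in for a network parameter, in the spirit of preferential attachment) and a `homophily` factor favouring its own group (a stand-in for a social parameter). The function name, parameters and group sizes are hypothetical.

```python
def mean_field_degrees(sizes, homophily, m=1.0, steps=10, k0=0.0):
    """Toy mean-field update of the expected degree per social group.

    sizes: group name -> number of nodes in that group
    homophily: multiplier applied when the target is in one's own group (1.0 = no preference)
    m: new links each node initiates per step
    Returns a list of {group: mean degree}, one entry per time step (including t = 0).
    """
    groups = list(sizes)
    k = {g: float(k0) for g in groups}
    history = [dict(k)]
    for _ in range(steps):
        received = {h: 0.0 for h in groups}
        for g in groups:
            # Attractiveness of each target group for a node in group g:
            # size x (degree + 1) for the network effect, times homophily for the social effect.
            weights = {
                h: sizes[h] * (k[h] + 1.0) * (homophily if h == g else 1.0)
                for h in groups
            }
            total = sum(weights.values())
            for h in groups:
                # Expected number of links that all nodes in g send to group h this step.
                received[h] += sizes[g] * m * weights[h] / total
        # Each node gains the m links it initiated plus its share of links its group received.
        k = {g: k[g] + m + received[g] / sizes[g] for g in groups}
        history.append(dict(k))
    return history


# Hypothetical population: 800 nodes in group A, 200 in group B.
history = mean_field_degrees({"A": 800, "B": 200}, homophily=2.0, m=1.0, steps=5)
for t, k in enumerate(history):
    print(t, round(k["A"], 2), round(k["B"], 2))
```

Because each step only loops over pairs of groups, its cost grows with the square of the number of groups and does not depend on N, whereas simulating individual nodes grows with the size of the network. The sketch ignores details such as excluding self-links, and setting `homophily=1.0` removes the social effect so that all groups grow at the same rate.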

Scientists’ Use of the Web

Scientists are increasingly using the Web to exchange, share, and accumulate or identify knowledge, and scientists’ use of the Web is a field of growing interest. This made us ask: who is using these Web platforms? All scientists, or specific groups, ages or disciplines of science? With a group of computer scientists within the Network Institute, we developed a method and tool to identify a set of known scientists, so as to be able to reflect on the representativeness of Web studies of scientists online. This work was recently presented at the Sixth Chinese Semantic Web Symposium (CSWS2012) and the First Chinese Web Science Conference (CWSC2012) in Shenzhen, China, and will be published shortly in the conference proceedings; for now, you can find the publication here.


Yesterday, Times Higher Education published an interesting article by Matthew Gamble, a computer scientist working on Web science questions. Gamble’s article addresses the need for Web 2.0 scholarship, that is, the use of online metrics for evaluating science, piggybacking on other discussions in the field such as altmetrics (which Gamble also mentions).

This discussion opens doors to a number of questions about knowledge production processes, as well as about what is valued in science and what should or could be measured as impact. These questions were also the topic of the recent Altmetrics workshop at the Web Science Conference in Koblenz, Germany, in June 2011 (which I attended). The Altmetrics workshop itself was a first step towards building a recognized community of researchers studying alternative metrics for science. The workshop brought together researchers from multiple disciplines and facilitated great discussions on a wide range of topics, covering not only the Web behaviors of scientists but also the problems of collecting and disambiguating Web data, and how to understand the implications of science and knowledge production on the Web. Overall, it was one of the best workshops I have attended yet, and it fit my area of growing expertise perfectly.

I presented some exploratory research on the validity of online metrics in science. The work was completed with my colleague Shenghui Wang, a talented computer scientist, with whom I developed a crawler (she did the actual building, I informed the design) to investigate a community of scientists online. The title was “Who are we talking about?: the validity of online metrics for commenting on science”. You can find the complete abstract here; the paper is in the works.
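A crawler that matches known scientists to online profiles can be checked against a hand-verified sample using precision (how many of the matches it found are correct) and recall (how many of the true profiles it found). Below is a minimal Python sketch of that evaluation; the scientist IDs, profile labels and the `precision_recall` helper are hypothetical, and this is not the evaluation procedure actually used for our crawler.

```python
def precision_recall(predicted, gold):
    """Compare predicted (scientist, profile) matches with a hand-checked gold set.

    Returns (precision, recall, f1); empty inputs give 0.0 instead of dividing by zero.
    """
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical example: the crawler proposes 3 matches, 2 of them correct,
# while the gold standard contains 4 true profiles.
predicted = {("s1", "linkedin-a"), ("s2", "blog-b"), ("s3", "linkedin-x")}
gold = {("s1", "linkedin-a"), ("s2", "blog-b"), ("s4", "slideshare-d"), ("s5", "linkedin-e")}

p, r, f1 = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Representing matches as (scientist, profile) pairs means a correct person linked to the wrong profile counts as an error, which is the disambiguation problem that makes this kind of Web data hard to collect.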

Preliminary, exploratory results indicated that, in the sample of Dutch computer scientists and their co-authors from 2007 to March 2011, the higher a scientist’s h-index (a measure of performance), the more likely they were to be found on LinkedIn and SlideShare and to have a blog. Additionally, the higher the citation score (a measure of tenure and performance), the more likely they were to be on LinkedIn and to have a blog. This suggests that, within this community, measures of scientists’ Web behaviors around their own enterprise are representative of the dynamics of scientists with both longer tenure and higher performance. Thus, when talking about the implications of altmetrics or analyzing behavior on these social media sites, we need to be explicit about whom we can generalize to and how these behaviors reflect broader dynamics in science; for this sample, we can only reflect on the behaviors of high-performing, tenured scientists.

Further research is needed to test this on other research communities and to further develop the precision and recall of the techniques the Web crawler uses to obtain data on scientists’ presence on these sites. Still, if this holds true for other communities, altmetrics would provide a unique avenue for analyzing those leading the pack in their respective fields, allowing more immediate measures of impact that overcome the delay inherent in citation-based measures.
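Since these results lean on the h-index, here is a minimal Python sketch of how it is computed from a scientist’s per-paper citation counts: the h-index is the largest h such that h of the papers have at least h citations each. The citation counts below are made up for illustration.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    # Rank papers from most to least cited; h grows while the paper at rank r has >= r citations.
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Made-up citation counts for two hypothetical scientists.
print(h_index([10, 8, 5, 4, 3]))  # 4
print(h_index([25, 8, 5, 3, 3]))  # 3
```

The second example shows why the h-index is read as a combined measure: one very highly cited paper does not raise it much, whereas a steady body of cited work does.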

Semantically Mapping Science Project @ VU Amsterdam

Since I started my PhD I have been part of the SMS project, the Semantically Mapping Science Project. This project aligned well with my own research on using Web data (when possible) to investigate collaboration in science, as the researchers involved in the project are all working on different aspects of tracing Web behaviors. The project includes a number of computer scientists interested in implementing Semantic Web techniques and a group of social scientists who are analyzing traceable Web behaviors to understand a combination of mechanisms within science, ranging from the structures of science (publication practices) to the dynamics of online behaviors in e-infrastructures. The underlying question that brings it all together is: can we use Semantic Web techniques to meaningfully detect, retrieve and manipulate such Web traces of scientists’ activities in order to improve Scientometrics studies?

A number of concrete projects have already emerged from within this idea with lots of others in the works, including:

Work on science blogging: Paul Groth and Thomas Gurney (2010). Studying Scientific Discourse on the Web using Bibliometrics: A Chemistry Blogging Case Study. In WebSci10: Extending the Frontiers of Society On-Line (in press).

A number of projects on altmetrics, including Julie M. Birkholz and Shenghui Wang (2011), “Who are we talking about?: the validity of online metrics for commenting on science”, and Daphne Duin and Peter van den Besselaar, “The search for alternative metrics for taxonomy”, both presented at the Altmetrics workshop at the WebSci11 conference, Koblenz, Germany.

I am looking forward to the upcoming academic semester for further expanding on these projects and getting some other ones out the door!



Open data everywhere

This afternoon, during a break from re-reading Karin D. Knorr-Cetina’s “The Manufacture of Knowledge” (a nice read about the sociology of science) to help conceptualize some processes in science for the first draft of the first chapter of my dissertation, I was reading through my tweets AND saw this one: “Submitted to Slovenian Ministry of Higher Education, Science and Technology”. eIFL and SPARC, two organizations working to advocate change in different research processes, released a statement supporting the Slovenian Ministry of Higher Education, Science and Technology’s proposal to create “a national open data and open publication infrastructure and mandatory deposition of publicly funded data and publications”. What great news!

Open data is popping up everywhere :). What is so interesting about open data? Check out this video from the World Wide Web Foundation about open data. To give you an idea of what is happening with open data, here are some examples (not an exhaustive list, just samples collected from links tweeted by my colleagues at VU Amsterdam, Frank van Harmelen and Paul Groth):

Open Knowledge Foundation’s European-level data registry, and some technical info about it.

The UK’s open data on government spending.

Europeana making metadata available.

The NYTimes also has linked open data.

India is also going “open”.

Wondering what data linking is? Check out a short intro here.