interdisciplinary research

Last week, I attended ACM CHI 2013 and Web Science 2013 in Paris. I had a great time and wanted to give a recap of both conferences, which were co-located.



This was my first time at CHI – the main computer-human interaction conference. It’s not my main field of study, but I was there to Data DJ. I had an interactivity submission accepted with Ayman from Yahoo! Research on using turntables to manipulate data. Here’s the abstract:

Spinning Data: Remixing live data like a music DJ

This demonstration investigates data visualization as a performance through the use of disc jockey (DJs) mixing boards. We assert that the tools DJs use in-situ can deeply inform the creation of data mixing interfaces and performances. We present a prototype system, DMix, which allows one to filter and summarize information from social streams using a audio mixing deck. It enables the Data DJ to distill multiple feeds of information in order to give an overview of a live event.

Paul Groth and David A. Shamma. 2013. Spinning data: remixing live data like a music dj. In CHI ’13 Extended Abstracts on Human Factors in Computing Systems (CHI EA ’13). ACM, New York, NY, USA, 3063-3066. DOI=10.1145/2468356.2479611 (PDF)

It was a fun experience… although it was a lot of demo giving (reception + all coffee breaks). The reactions were really positive. Essentially, once a person touched the deck they really got the interaction. Plus, a couple of notable people stopped by who seemed to like the interaction: Jakob Nielsen and @kristw from Twitter data science. The kind of response I got made me really want to pursue the project more. I also learned how we can make the interaction better.

The whole prototype system is available on GitHub. I wrote the whole thing using Node.js and JavaScript in a web browser. Warning: this is very ugly code.

In addition to my demo, I was impressed with the cool stuff on display (e.g. traceable skateboards) as well as the number of companies there looking for talent. The conference itself was huge with 3500 people and it was the first conference I attended where they had multiple sponsored parties.


Web Science was after CHI and is more in my area of research.

What we presented


I was pleased that the VU had 8 publications at the conference, which is a really strong showing. Also, two of our papers were nominated for the best paper award.

The two papers I had in the conference were very interdisciplinary.

These papers were chiefly done by the first authors, both students at the VU. Anca attended Web Science and did a great job presenting our poster on using Google Scholar to measure academic independence. There was a lot of interest and we got quite a few ideas on how to improve the paper (bigger sample!).

The other paper, by Fabian Eikelboom, was very well received. It compared online and offline prayer cards and tried to see how the web modified this form of communication.

Conference thoughts

I found quite a few things that I really liked at this year’s Web Science. A couple of pointers:

  • Henry S Thompson, Jonathan A Rees and Jeni Tennison: URIs in data: for entities, or for descriptions of entities: A critical analysis – talked about httpRange-14 and the problem of unintended extensibility points within standards. I think a critical area of Web Science is how the social construction of technical standards impacts the Web and its development. This is an example of this kind of research.
  • Catherine C. Marshall and Frank M. Shipman: Experiences Surveying the Crowd: Reflections on methods, participation, and reliability – really got me thinking about the notion of hypotheticals in law and how this relates to provenance on the web.
  • Panagiotis Metaxas and Eni Mustafaraj: The Rise and the Fall of a Citizen Reporter – a compelling example of how Twitter influences the Mexican drug war and how trust is difficult to determine online. The subsequent Trust Trails project looks interesting.
  • The folks over at the UvA are doing a lot of fun work with respect to studying the web as a social object. It’s worth looking at their work.
  • Jérôme Kunegis, Marcel Blattner and Christine Moser. Preferential Attachment in Online Networks: Measurement and Explanations – interesting discussion of how good our standard network models are. Check out their collection of networks to download and analyze!
  • Sebastien Heymann and Benedicte Le Grand. Towards A Redefinition of Time in Information Networks?

Unfortunately, there were some things that I hope will improve for next year. First, as you can tell above, the papers were not available online during the conference. This is really a bummer when you’re trying to tweet about things you see and follow up later. Secondly, I thought there were a few too many philosophy papers. In particular, it worries me when a computer scientist is presenting a philosophy paper at a science conference. I think the program committee needs to watch out for spreading too thinly in the name of interdisciplinarity. Finally, the pecha kucha session was a real success – short, succinct presentations that really raised interest in the work. This, however, didn’t carry over into the main sessions, which often ran too long.

Overall, both CHI and Web Science were well worth the time – I made a bunch of connections and saw some good research that will influence some of my work. Oh, and it turns out Paris has some amazing coffee.


Beyond the PDF - drawn notes day 1

Wow! The last three days have been crazy, hectic, awesome and inspiring. We just finished putting on The Future of Research Communication and e-Scholarship (FORCE11)’s Beyond the PDF 2 conference here in Amsterdam. (I was chair of the organizing committee and in charge of local arrangements.) The idea behind Beyond the PDF was to bring together a diverse set of people (scholars, technologists, policy experts, librarians, start-ups, publishers, …) all interested in making scholarly and research communication better. In that respect, I think we achieved our goal. We had 210 attendees from across the spectrum. Below are two charts: one of the types of organizations the attendees came from and one of the domains they are from.


The program of the conference was varied. We covered new tools, business models, the context of the approach, research evaluation, visions for the future and how to move forward. I won’t go over the entire conference here. We’ll have a complete video online soon (thanks Elsevier). I just wanted to call out some personal highlights.


We had two great keynotes: one from Kathleen Fitzpatrick of the Modern Language Association and the other from Carol Tenopir (Chancellor’s Professor at the School of Information Sciences at the University of Tennessee, Knoxville). Kathleen discussed how it is essential for the humanities to embrace new forms of scholarly communication, as they allow for faster dissemination of their work. Carol discussed the practice of reading for academics. She’s done in-depth tracking of how scientists read. Some interesting tidbits: successful scientists read more, and so far social media use has not decreased the amount of reading that scientists do. The keynotes were really a sign of how much more the humanities were present at this conference than at Beyond the PDF 1.


Kathleen Fitzpatrick (@kfitz), Director of Scholarly Communication, Modern Language Association

The tools are there

Jason Priem compares online journals to horses

Just two years ago at the first Beyond the PDF, there were mainly initial ideas and drafts for next generation research communication tools. At this year’s conference, there were really a huge number of tools that are ready to be used. Figshare, PDFX, Authorea, Mendeley, IsaTools, StemBook, Commons in a Box, IPython, ImpactStory and on…

Furthermore, there are different ways of publishing, ranging from PeerJ to just posting to a blog. Probably the most interesting idea of the conference was the use of GitHub to essentially publish.

This made me think it’s time to look at my own scientific workflow and figure out how to update it to better use these tools in practice.

People made connections

At the end of the conference, I asked if people had made a new connection. Almost every hand went up. It was great to see publishers, technologists and librarians all talking together. The Twitter back channel at the conference was great. We saw a lot of conversations that kept going on #btpdf2 and also people commenting while watching the live stream. Check out a great Storify of the social media stream of the conference done by Graham Steel.

Creative Commons License
Beyond the PDF 2 photographs by Maurice Vanderfeesten is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Based on a work at

Making it happen

We gave a challenge to the community: “What would you do with 1k today that would change scholarly communication for the better?” The challenge was well received and we had a bunch of different ideas, from sponsoring viewing parties to encouraging the adoption of DOIs in the developing world and by small publishers.

The Challenge of Evaluation

We had a great discussion around the role of evaluation. Carole Goble ran the evaluation session as a role play, with participants representing the key players in the evaluation of research and researchers. I think that format really highlighted the fact that we have a first mover problem: none of the roles feels that “they should go first”. It was unclear how to push past that challenge.

Various Roles in Science Evaluation


Personally, I had a great time. FORCE11 is a unique community and I think it brings together people who need to talk to change the way we communicate scholarship. These were my quick thoughts on the event. There’s a lot more to come. We will have the video of the event up soon. We will also post drawn notes provided by Jongens van de Tekeningen, and we will award a series of 1k grants to support ongoing work. Finally, I hope to see many more blog posts documenting the different views of attendees.


We had many great sponsors that helped make a great event. Things like live streaming, student scholarships, a professional set-up, demos & dinner ensure that an event like this works.

From November 1 – 3, 2012, I attended the PLOS Article Level Metrics Workshop in San Francisco.

PLOS is a major open-access online publisher and the publisher of the leading mega-journal PLOS One. A mega-journal is one that accepts any scientifically sound manuscript. This means there is no decision on novelty, just a decision on whether the work was done in a scientifically sound way. The consequence is that much more science gets published, with a corresponding need for even better filters and search systems for science.
As an online publisher, PLOS tracks many of what are termed article level metrics – these metrics go beyond traditional scientific citations and include things like page views, PDF downloads, mentions on Twitter, etc. Article level metrics are, to my mind, altmetrics aggregated at the article level.
PLOS provides a comprehensive API to obtain these metrics and wants to encourage their broader adoption and usage. Thus, they organized this workshop. There were a variety of people attending, from publishers (including open access ones and the traditional big ones) and funders to librarians and technologists. I was a bit disappointed not to see more social scientists there but I think the push here has been primarily from the representative communities. The goal was to outline key challenges for altmetrics and then corresponding concrete actions that could take place in the next 6 months to help address these challenges. It was an unconference, so no presentations and lots of discussion. I found it to be quite intense as we often broke up into small groups where one had to be fully engaged. The organizers are putting together a report that digests the work that was done. I’m excited to see the results.

Me actively contributing 🙂 Thanks Ian Mulvany!


  • Launch of the PLOS Altmetrics Collection. This was really exciting for me as I was one of the organizers of getting this collection produced. We also wrote an editorial for it. This collection provides a nice home for future articles on altmetrics.
  • I was impressed by the availability of APIs. In just a bit of time, several aggregators and good sources of altmetrics have appeared, including ImpactStory, the PLOS ALM APIs, Mendeley and Microsoft Academic Search.
  • rOpenSci is a cool project that provides R APIs to many of these altmetric and other sources for analyzing data.
  • There’s quite a bit of interest in services to do these metrics. For example, Plum Analytics has a test being done at the University of Pittsburgh. I also talked to other people who were getting interest in using these alternative impact measures and also heard a number of companies are now providing this sort of analytics service.
  • I talked a lot to Mark Hahnel about the Data2Semantics LinkItUp service. He is super excited about it and loved the demo. I’m really excited about this collaboration.
  • Microsoft Academic Search is getting better, they are really turning it into a production product with better and more comprehensive data. I’m expecting a really solid service in the next couple of months.
  • I learned from Ian Mulvany of eLife that Graph theory is mathematically “the same as” statistical mechanics in physics.
  • Context, Context, Context – there was a ton of discussion about the importance of context to the numbers one gets from altmetrics. For example, being able to quickly compare to some baseline, or knowing the population to which the number applies.

    White board thoughts on context! thanks Ian Mulvany

  • Related to context was the need for simple semantics – there was a notion that, for example, we need to know if a retweet on Twitter was positive or negative and what kind of person retweeted the paper (i.e. a scientist, a member of the public, a journalist, etc.). This is because, unlike with citations, the population that altmetrics draws on is not clearly defined: it exists in a communication medium that doesn’t just contain scholarly communication.
  • I had a nice discussion with Elizabeth Iorns, whose company is doing cool stuff around building markets for performing and replicating experiments.
  • Independent of the conference, I met up with some people I know from the natural language processing community and one of the things that they were excited about is computational semantics but using statistical approaches. It seems like this is very hot in that community and something we in the knowledge representation & reasoning community should pay attention to.
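To make the point about context above a bit more concrete, here is a minimal sketch of comparing a single altmetric count against a baseline population by turning it into a percentile rank. The function name and the example counts are assumptions made up for illustration; they do not come from the workshop or from any of the APIs mentioned.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(value, baseline):
    """Return the percentile (0-100) of `value` within `baseline`.

    Ties count as half below and half above, which is one common
    convention; other conventions exist. `baseline` must be non-empty.
    """
    ordered = sorted(baseline)
    below = bisect_left(ordered, value)
    equal = bisect_right(ordered, value) - below
    return 100.0 * (below + 0.5 * equal) / len(ordered)


if __name__ == "__main__":
    # Hypothetical tweet counts for papers from the same journal and year.
    same_journal_same_year = [0, 2, 5, 10, 40, 300]
    print(percentile_rank(40, same_journal_same_year))
```

The raw number 40 means little on its own; the percentile only becomes meaningful once the baseline population (same journal, same year, same field) is stated alongside it.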


Associated with the workshop was a hackathon held at the PLOS offices. I worked in a group that built a quick demo: a bookmarklet that would highlight papers in PubMed search results based on their online impact according to ImpactStory. So you would get different color-coded results based on altmetric scores. This only took a day’s worth of work and really showed me how far these APIs have come in allowing applications to be built. It was a fun environment and I was really impressed with the other work that came out.
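The color-coding idea behind the bookmarklet can be sketched in a few lines. This is not the hackathon code (which ran as JavaScript in the browser); it is a Python illustration of the mapping step only, and the thresholds, colors, `SearchResult` type and score dictionary are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical thresholds and colors, ordered from highest to lowest.
# The values used in the actual demo are not given in the post.
BUCKETS = [(100.0, "red"), (10.0, "orange"), (1.0, "yellow")]
DEFAULT_COLOR = "none"


@dataclass
class SearchResult:
    pmid: str   # identifier of the paper in the result list
    title: str


def color_for(score: float) -> str:
    """Pick the color of the first bucket whose threshold the score reaches."""
    for threshold, color in BUCKETS:
        if score >= threshold:
            return color
    return DEFAULT_COLOR


def highlight(results, scores):
    """Map each result to a highlight color; missing scores count as 0."""
    return {r.pmid: color_for(scores.get(r.pmid, 0.0)) for r in results}


if __name__ == "__main__":
    results = [SearchResult("1", "Paper A"), SearchResult("2", "Paper B")]
    print(highlight(results, {"1": 150.0}))
```

In a real bookmarklet the scores would be fetched from an altmetrics API and the colors applied to the page elements; both steps are left out here.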

Random thoughts on San Francisco

  • Four Barrel coffee serves really really nice coffee – but get there early before the influx of ultra cool locals
  • The guys at Goody Cafe are really nice and also serve good coffee
  • If you’re in the touristy Fisherman’s Wharf area, walk to Fort Mason for fantastic views of the Golden Gate Bridge. The hostel there also looks cool.

Here’s an interesting TED talk by cognitive psychologist Paul Bloom about the origins of pleasure. What’s cool to me is he uses the same anecdotes (Han van Meegeren, Joshua Bell) that I’ve used previously to illustrate the need for provenance. I often make a technical case for provenance for automated systems. He makes a compelling case that provenance is fundamental for people. Check out the video below… and let me know what you think.

Thanks to Shiyong Lu for the pointer.

It’s been about two weeks since we had the altmetrics11 Workshop at Web Science 2011, but I was swamped with the ISWC conference deadline so I just got around to posting about this now.

The aim of the workshop was to gather together the group of people working on next generation measures of science based on the Web. Importantly, as organizers, Jason, Dario and I wanted to encourage the growth of the scientific side of altmetrics.

The workshop turned out to be way better than I expected. We had roughly 36 attendees, which was way beyond our expectations. You can see some of the attendees here:

There was nice representation from my institution (VU University Amsterdam), including talks by my collaborators Peter van den Besselaar and Julie Birkholtz. But we had attendees from Israel, the UK, the US and all over Europe. People were generally excited about the event and the discussions went well (although the room was really warm). I think we all had a good time at the restaurant, the Alt-Coblenz – highly recommended, by the way, and an appropriate name. Thanks to the WebSci organizing team for putting this together.

We had a nice mix of social scientists and computer scientists (~16 & 20 respectively). Importantly, we had representation from the bibliometrics community, social studies of science, and computer science.

Importantly, for an emerging community, there was a real honesty about the research. Good results were shown but importantly almost every author discussed where the gaps were in their own research.

Two discussions come to the fore for me. One was on how we evaluate altmetrics. Mike Thelwall, who gave the keynote (great job by the way), suggests using correlations to the journal impact factor to help demonstrate that there is something scientifically valid that you’re measuring. What you want is not perfect correlation but correlation with a gap, and that gap is what your new alternative metric is then measuring. There was also the notion from Peter van den Besselaar that we should look more closely at how our metrics match what scientists do in practice (i.e. qualitative studies). For example, do our metrics correlate with promotions or hiring?

The second discussion was around where to go next with altmetrics. In particular, there was a discussion on how to position altmetrics in the research field, and really it seemed to position itself within and across the fields of science studies (i.e. scientometrics, webometrics, virtual ethnography). Importantly, it was felt that we needed a good common corpus of information in order to do comparative studies of metrics. Altmetrics has the problem of data acquisition. While some people are interested in that, others want to focus on metric generation and evaluation. A corpus of traces of science online was felt to be a good way to interconnect both data acquisition and metric generation and allow for such comparative studies. But how to build the corpus… Suggestions welcome.
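As an illustration of the correlation idea from the first discussion, here is a minimal sketch that computes a Spearman rank correlation between a new metric and an established one using only the standard library. The choice of Spearman (rather than some other correlation measure) and all the numbers are assumptions for illustration; the keynote is not quoted as prescribing a specific statistic.

```python
from math import sqrt
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    # Made-up values: one established and one alternative metric per paper.
    established_metric = [1.2, 3.4, 2.1, 8.0, 5.5]
    alternative_metric = [3, 10, 12, 40, 9]
    print(round(spearman(established_metric, alternative_metric), 3))
```

A value close to but clearly below 1 would be the "correlation with a gap" situation: the new metric tracks the established one, yet leaves room for it to be measuring something additional.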

The attendees wanted to have an altmetrics12 so I’m pretty sure we will do that. Additionally, we will have some exciting news soon about a journal special issue on altmetrics.

Some more links:

Abstracts of all talks

Community Notes

Also, could someone leave a link to the Twitter archive in the comments? That would be great.

I’ve posted a couple of times on this blog about events organized at the VU University Amsterdam to encourage interdisciplinary collaboration. One of the major issues to come out of these prior events was that data sharing is a critical mechanism for enabling interdisciplinary research. However, it’s often difficult for scientists to know:

  1. Who has what data? and;
  2. whether that data is interesting to them?

This second point is important. Because different disciplines use different vocabularies, it is often times hard to understand whether a data set is truly useful or interesting in the context of new domains. What is data for one domain may or may not be data in another domain.

To help bridge this gap, Iina Hellsten (Organizational Science), Leonie Houtman (Business Research) and I (Computer Science) organized a Network Institute workshop this past Wednesday (March 23, 2011) titled What is Data?

The goal of the workshop was to bring people together from these different domains to discuss the data they use in their everyday practice and to describe what makes data useful to them.

Our goal wasn’t to come up with a philosophical answer to the question but instead to build a map of what researchers from these disciplines consider to be useful data for them. More important, however, was bringing these various researchers together to talk to one another.

I was very impressed with the turnout. Around 25 people showed up from social science, business/management research and computer science. Critically, the attendees were fully engaged and together produced a fantastic result.

The attendees

The Process

To build a map of data, we used a variant of a classic knowledge acquisition technique called card sorting. The attendees were divided up into groups (shown above), making sure that the groups had a mix of researchers from each discipline. Within each group, every researcher was asked to give examples of the data they worked with on a daily basis and explain to the others a bit about what they did with that data. This was a chance for people to get to know each other and have discussions in smaller groups. At the end of this, each group had a pile of index cards with examples of data sets.

Writing down example data sets

The groups were then asked to group these examples together and then give those collections labels. This was probably the most difficult part of the process and led to lots of interesting discussions:

Discussion about grouping

Here’s an example result from one of the groups (the green post-it notes are the collection labels):

Sorted cards

The next step was that everyone in the room got to walk around and label the example data sets from all groups with attributes that they thought were important to them. For example, a social networking data set is interesting to me if I can access it programmatically. Each discipline got their own color. Pink = computer science, Orange = social science, yellow = management science.

This resulted in very colorful tables:

After labelling

Once this process was complete, we merged the groupings from the various tables by data set and category (i.e. collection label), leading to a map of data sets.
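The merge step just described can be sketched as a small data-structure exercise. This is an illustrative assumption about how the tables could be represented, not how we did it at the workshop (that was done by hand with cards and post-its): each table is a dictionary from category label to data sets, each data set carries a set of attribute notes, and labels are assumed to have already been reconciled across groups.

```python
from collections import defaultdict


def merge_tables(tables):
    """Merge per-group card-sort tables into one map.

    Each table maps category label -> data set name -> set of attribute
    notes. Data sets appearing in several tables have their notes unioned.
    """
    merged = defaultdict(lambda: defaultdict(set))
    for table in tables:
        for category, datasets in table.items():
            for dataset, attributes in datasets.items():
                merged[category][dataset].update(attributes)
    # Convert to plain dicts with sorted lists for stable, readable output.
    return {
        category: {name: sorted(notes) for name, notes in datasets.items()}
        for category, datasets in merged.items()
    }


if __name__ == "__main__":
    # Hypothetical tables from two groups; the notes are invented examples.
    table_a = {"Social data": {"email": {"cs: programmatic access"}}}
    table_b = {
        "Social data": {"email": {"social: representative"}},
        "Textual data": {"interview transcripts": {"social: credible"}},
    }
    print(merge_tables([table_a, table_b]))
```

In practice the hard part is the reconciliation this sketch assumes away: deciding when two groups' labels mean the same category.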

The Results

A Map of Data

Above is the map created by the group. You can find a (more or less faithful) transcription of the map here. Here are some highlights.

There were 10 categories of data:

  1. Elicited data (e.g. surveys)
  2. Data based on measurement (e.g. logfiles)
  3. Data with a particular format (e.g. xml)
  4. Structured-only data (e.g. databases)
  5. Machine data (e.g. results of a simulation)
  6. Textual data (e.g. interview transcripts)
  7. Social data (e.g. email)
  8. Indexed data (e.g. Web of Science)
  9. Data useful for both quantitative and qualitative analysis (e.g. newspapers)
  10. Data about the researchers themselves (e.g. how did they do an analysis)

After transcribing the data, I would say that computer scientists are interested in having strong structure in the data, whereas social scientists and business scientists are deeply concerned with having high quality data that is representative, credible, and was collected with care. Across all disciplines temporality (or having things on a timeline) seemed to be a critical attribute of useful data.

What’s next?

At the end of the workshop, we discussed where to go from here. The plan is to have a follow-up workshop where each discipline can present their own datasets using these categorizations. To help focus the workshop, we are looking for two interdisciplinary teams within the VU that are willing to try data sharing and present the results of that trial at the workshop. If you have a data set you would like to share, please post it to the Network Institute LinkedIn group. Once you have a team, let me, Leonie, or Iina know.



