
From November 1–3, 2012, I attended the PLOS Article Level Metrics Workshop in San Francisco.

PLOS is a major open-access online publisher and the publisher of the leading megajournal PLOS ONE. A megajournal is one that accepts any scientifically sound manuscript: there is no decision on novelty, just a decision on whether the work was done in a scientifically sound way. The consequence is that much more science gets published, with a corresponding need for even better filters and search systems for science.
As an online publisher, PLOS tracks what are termed article-level metrics. These metrics go beyond traditional scientific citations and include things like page views, PDF downloads, mentions on Twitter, etc. Article-level metrics are, to my mind, altmetrics aggregated at the article level.
PLOS provides a comprehensive API to obtain these metrics and wants to encourage their broader adoption and usage. Thus, they organized this workshop. There was a variety of people attending, from publishers (including open-access ones and the traditional big ones) and funders to librarians and technologists. I was a bit disappointed not to see more social scientists there, but I think the push here has been primarily from the representative communities. The goal was to outline key challenges for altmetrics and then corresponding concrete actions that could take place in the next 6 months to help address these challenges. It was an unconference, so no presentations and lots of discussion. I found it to be quite intense, as we often broke up into small groups where one had to be fully engaged. The organizers are putting together a report that digests the work that was done. I’m excited to see the results.
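As an aside on what working with such an API looks like: here is a minimal sketch of aggregating per-source counts from an ALM-style JSON record. The payload shape, DOI, and numbers below are invented for illustration; consult the actual PLOS ALM API documentation for the real schema.

```python
# Sketch: collapsing per-source altmetric entries into {source: total} counts.
# The record structure here is illustrative only, not the exact ALM schema.

def aggregate_metrics(record):
    """Return a {source_name: total_count} dict for one article record."""
    return {src["name"]: src["metrics"]["total"] for src in record["sources"]}

# A made-up record for one article (DOI and numbers are invented).
record = {
    "doi": "10.1371/journal.pone.0000000",
    "sources": [
        {"name": "twitter", "metrics": {"total": 42}},
        {"name": "mendeley", "metrics": {"total": 17}},
        {"name": "pmc", "metrics": {"total": 230}},
    ],
}

totals = aggregate_metrics(record)
print(totals)
```

In practice you would fetch such a record over HTTP with the article’s DOI and an API key, but the aggregation step is this simple.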

Me actively contributing 🙂 Thanks Ian Mulvany!


  • Launch of the PLOS Altmetrics Collection. This was really exciting for me, as I was one of the organizers of getting this collection produced. Our editorial is here. This collection provides a nice home for future articles on altmetrics.
  • I was impressed by the availability of APIs. There are now several aggregators and good sources of altmetrics after just a short time: ImpactStory, the PLOS ALM API, Mendeley, and Microsoft Academic Search.
  • rOpenSci is a cool project that provides R APIs to many of these altmetric and other sources for analyzing data.
  • There’s quite a bit of interest in services built on these metrics. For example, Plum Analytics has a test being done at the University of Pittsburgh. I also talked to other people who were interested in using these alternative impact measures, and I heard that a number of companies are now providing this sort of analytics service.
  • I talked a lot to Mark Hahnel about the Data2Semantics LinkItUp service. He is super excited about it and loved the demo. I’m really excited about this collaboration.
  • Microsoft Academic Search is getting better; they are really turning it into a production product with better and more comprehensive data. I’m expecting a really solid service in the next couple of months.
  • I learned from Ian Mulvany of eLife that graph theory is mathematically “the same as” statistical mechanics in physics.
  • Context, context, context – there was a ton of discussion about the importance of context to the numbers one gets from altmetrics, for example, being able to quickly compare to some baseline or knowing the population to which the number applies.

    Whiteboard thoughts on context! Thanks Ian Mulvany

  • Related to context was the need for simple semantics. There was a notion that, for example, we need to know whether a retweet on Twitter was positive or negative, and what kind of person retweeted the paper (i.e., a scientist, a member of the public, a journalist, etc.). This is because, unlike citations, the population altmetrics draws on is not as clearly defined: it exists in a communication medium that doesn’t just contain scholarly communication.
  • I had a nice discussion with Elizabeth Iorns, the founder of a company doing cool stuff around building markets for performing and replicating experiments.
  • Independent of the conference, I met up with some people I know from the natural language processing community, and one of the things they were excited about is computational semantics using statistical approaches. This seems to be very hot in that community and something we in the knowledge representation & reasoning community should pay attention to.


Associated with the workshop was a hackathon held at the PLOS offices. I worked in a group that built a quick demo: a bookmarklet that would highlight papers in PubMed search results based on their online impact according to ImpactStory. So you would get color-coded results based on altmetric scores. This only took a day’s worth of work and really showed me how far these APIs have come in allowing applications to be built. It was a fun environment, and I was really impressed with the other work that came out.
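The core of the demo can be sketched in a few lines: map an article’s aggregate score to a highlight color. The thresholds, scores, and DOIs here are invented for illustration; the actual bookmarklet pulled live scores from ImpactStory and applied the colors in the page.

```python
# Sketch of the bookmarklet's core idea: bucket an aggregate altmetric
# score into a highlight color. Thresholds are invented, not ImpactStory's.

def highlight_color(score):
    """Map an aggregate altmetric score to a CSS color name."""
    if score >= 100:
        return "gold"        # high online impact
    if score >= 10:
        return "lightgreen"  # moderate impact
    if score > 0:
        return "lightblue"   # some attention
    return "white"           # no recorded attention

# Invented DOIs and scores, standing in for a page of search results.
for doi, score in [("10.1371/a", 250), ("10.1371/b", 12), ("10.1371/c", 0)]:
    print(doi, highlight_color(score))
```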

Random thoughts on San Francisco

  • Four Barrel coffee serves really really nice coffee – but get there early before the influx of ultra cool locals
  • The guys at Goody Cafe are really nice and also serve good coffee
  • If you’re in the touristy Fisherman’s Wharf area, walk to Fort Mason for fantastic views of the Golden Gate Bridge. The hostel there also looks cool.

Here’s an interesting TED talk by cognitive psychologist Paul Bloom about the origins of pleasure. What’s cool to me is he uses the same anecdotes (Han van Meegeren, Joshua Bell) that I’ve used previously to illustrate the need for provenance. I often make a technical case for provenance for automated systems. He makes a compelling case that provenance is fundamental for people. Check out the video below… and let me know what you think.

Thanks to Shiyong Lu for the pointer.

It’s been about two weeks since we had the altmetrics11 workshop at Web Science 2011, but I was swamped with the ISWC conference deadline, so I just got around to posting about this now.

The aim of the workshop was to gather together the group of people working on next-generation measures of science based on the Web. Importantly, as organizers, Jason, Dario and I wanted to encourage the growth of the scientific side of altmetrics.

The workshop turned out to be way better than I expected. We had roughly 36 attendees, way beyond our expectations. You can see some of the attendees here:

There was nice representation from my institution (VU University Amsterdam), including talks by my collaborators Peter van den Besselaar and Julie Birkholtz. But we had attendees from Israel, the UK, the US and all over Europe. People were generally excited about the event and the discussions went well (although the room was really warm). I think we all had a good time at the restaurant, the Alt-Coblenz (highly recommended, by the way, and an appropriate name). Thanks to the WebSci organizing team for putting this together.

We had a nice mix of social scientists and computer scientists (~16 and 20, respectively), with representation from the bibliometrics community, social studies of science, and computer science.

Importantly for an emerging community, there was a real honesty about the research. Good results were shown, but almost every author also discussed where the gaps were in their own research.

Two discussions came to the fore for me. One was on how we evaluate altmetrics. Mike Thelwall, who gave the keynote (great job, by the way), suggests using correlations to the journal impact factor to help demonstrate that there is something scientifically valid in what you’re measuring. What you want is not perfect correlation but correlation with a gap, and that gap is what your new alternative metric is then measuring. There was also the notion from Peter van den Besselaar that we should look more closely at how our metrics match what scientists do in practice (i.e., qualitative studies) – for example, whether our metrics correlate with promotions or hiring.

The second discussion was around where to go next with altmetrics. In particular, there was a discussion on how to position altmetrics as a research field; it seemed to position itself within and across the fields of science studies (i.e., scientometrics, webometrics, virtual ethnography). Importantly, it was felt that we needed a good common corpus of information in order to do comparative studies of metrics. Altmetrics has a data acquisition problem: while some people are interested in that, others want to focus on metric generation and evaluation. A corpus of traces of science online was felt to be a good way to interconnect data acquisition and metric generation and allow for such comparative studies. But how to build the corpus… Suggestions welcome.
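To make the “correlation with a gap” idea concrete, here is a small sketch with invented numbers: compute the Pearson correlation between citation counts and an altmetric. A strong but imperfect r is the pattern you’d hope for, since the residual is what the new metric may be capturing.

```python
# Sketch of Thelwall's evaluation idea: correlate a candidate altmetric
# with an established measure (here, citation counts). Data are invented.
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

citations = [5, 12, 3, 40, 22, 8, 15, 30]
tweets    = [10, 25, 2, 60, 50, 5, 20, 45]  # tracks citations, with a "gap"

r = pearson(citations, tweets)
print(f"r = {r:.2f}")
```

A real study would of course use a large article sample and control for field and article age; this just shows the shape of the calculation.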

The attendees wanted to have an altmetrics12 so I’m pretty sure we will do that. Additionally, we will have some exciting news soon about a journal special issue on altmetrics.

Some more links:

Abstracts of all talks

Community Notes

Also, could someone leave a link to the twitter archive in the comments? That would be great.

I’ve posted a couple of times on this blog about events organized at the VU University Amsterdam to encourage interdisciplinary collaboration. One of the major issues to come out of these prior events was that data sharing is a critical mechanism for enabling interdisciplinary research. However, it’s often difficult for scientists to know:

  1. Who has what data? and
  2. Whether that data is interesting to them?

This second point is important. Because different disciplines use different vocabularies, it is often hard to understand whether a data set is truly useful or interesting in the context of a new domain. What is data for one domain may or may not be data in another.

To help bridge this gap, Iina Hellsten (Organizational Science), Leonie Houtman (Business Research) and myself (Computer Science) organized a Network Institute workshop this past Wednesday (March 23, 2011) titled What is Data?

The goal of the workshop was to bring people together from these different domains to discuss the data they use in their everyday practice and to describe what makes data useful to them.

Our goal wasn’t to come up with a philosophical answer to the question but instead to build a map of what researchers from these disciplines consider to be useful data. More important, however, was to bring these various researchers together to talk to one another.

I was very impressed with the turnout. Around 25 people showed up from social science, business/management research and computer science. Critically, the attendees were fully engaged and together produced a fantastic result.

The attendees

The Process

To build a map of data, we used a variant of a classic knowledge acquisition technique called card sorting. The attendees were divided into groups (shown above), making sure that each group had a mix of researchers from each discipline. Within each group, every researcher was asked to give examples of the data they worked with on a daily basis and to explain to the others a bit about what they did with that data. This was a chance for people to get to know each other and have discussions in smaller groups. By the end of this, each group had a pile of index cards with examples of data sets.

Writing down example data sets

The groups were then asked to group these examples together and then give those collections labels. This was probably the most difficult part of the process and led to lots of interesting discussions:

Discussion about grouping

Here’s an example result from one of the groups (the green post-it notes are the collection labels):

Sorted cards

The next step was that everyone in the room got to walk around and label the example data sets from all groups with attributes they thought were important. For example, a social networking data set is interesting to me if I can access it programmatically. Each discipline got its own color: pink = computer science, orange = social science, yellow = management science.

This resulted in very colorful tables:

After labelling

Once this process was complete, we merged the various tables’ groupings together by data set and category (i.e., collection label), leading to a map of data sets:

The Results

A Map of Data

Above is the map created by the group. You can find a (more or less faithful) transcription of the map here. Here’s some highlights.

There were 10 categories of data:

  1. Elicited data (e.g. surveys)
  2. Data based on measurement (e.g. logfiles)
  3. Data with a particular format (e.g. XML)
  4. Structured-only data (e.g. databases)
  5. Machine data (e.g. results of a simulation)
  6. Textual data (e.g. interview transcripts)
  7. Social data (e.g. email)
  8. Indexed data (e.g. Web of Science)
  9. Data useful for both quantitative and qualitative analysis (e.g. newspapers)
  10. Data about the researchers themselves (e.g. how did they do an analysis)
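As a side note for the programmatically inclined: the merge step we did by hand, unioning each group’s labeled piles into one map, can be sketched like this (the labels and examples are a small invented subset of what appeared on the tables):

```python
# Sketch of the merge step: combine each group's {label: [examples]} piles
# into one map, unioning examples that share a category label.
from collections import defaultdict

def merge_groups(groups):
    """Union per-group card piles into a single {label: sorted examples} map."""
    merged = defaultdict(set)
    for piles in groups:
        for label, examples in piles.items():
            merged[label].update(examples)
    return {label: sorted(items) for label, items in merged.items()}

group_a = {"Elicited data": ["surveys"], "Textual data": ["interview transcripts"]}
group_b = {"Elicited data": ["interviews"], "Machine data": ["simulation results"]}

print(merge_groups([group_a, group_b]))
```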

After transcribing the data, I would say that computer scientists are interested in having strong structure in the data, whereas social scientists and business scientists are deeply concerned with having high-quality data that is representative, credible, and collected with care. Across all disciplines, temporality (having things on a timeline) seemed to be a critical attribute of useful data.

What’s next?

At the end of the workshop, we discussed where to go from here. The plan is to have a follow-up workshop where each discipline can present their own datasets using these categorizations. To help focus the workshop, we are looking for two interdisciplinary teams within the VU that are willing to try data sharing and present the results of that trial at the workshop. If you have a data set you would like to share, please post it to the Network Institute LinkedIn group. Once you have a team, let me, Leonie, or Iina know.



