You should go read Jason Priem's excellent commentary in Nature, "Scholarship: Beyond the Paper", but I wanted to call out a bit that I've talked about with a number of people and that I think is important. We should be looking at how we build the best teams of scientists, not just searching for the single best individual:

Tenure and hiring committees will adapt, too, with growing urgency. Ultimately, science evaluation will become something that is done scientifically, exchanging arbitrary, biased, personal opinions for meaningful distillations of entire communities’ assessments. We can start to imagine the academic department as a sports team, full of complementary positions (theorists, methodologists, educators, public communicators, grant writers and so on). Coming years will see evaluators playing an academic version of Moneyball (the statistical approach to US baseball): instead of trying to field teams of identical superstars, we will leverage nuanced impact data to build teams of specialists who add up to more than the sum of their parts.

Science is a big team sport, especially given today's need for interdisciplinary and large-scale experiments. We need to encourage the building of teams in the sciences.


Beyond the PDF - drawn notes day 1

Wow! The last three days have been crazy, hectic, awesome, and inspiring. We just finished putting on the Future of Research Communication and e-Scholarship (FORCE11) Beyond the PDF 2 conference here in Amsterdam. (I was chair of the organizing committee and in charge of local arrangements.) The idea behind Beyond the PDF was to bring together a diverse set of people (scholars, technologists, policy experts, librarians, start-ups, publishers, …) all interested in making scholarly and research communication better. In that regard, I think we achieved our goal. We had 210 attendees from across the spectrum. Below are two charts: one of the types of organizations the attendees came from and one of their domains.


The program of the conference was varied. We covered new tools, business models, the context of the approach, research evaluation, visions for the future, and how to move forward. I won't go over the entire conference here; we'll have complete video online soon (thanks Elsevier). I just wanted to call out some personal highlights.


We had two great keynotes, one from Kathleen Fitzpatrick of the Modern Language Association and the other from Carol Tenopir (Chancellor's Professor at the School of Information Sciences at the University of Tennessee, Knoxville). Kathleen discussed how essential it is for the humanities to embrace new forms of scholarly communication, which allow for faster dissemination of their work. Carol discussed the reading practices of academics. She's done in-depth tracking of how scientists read. Some interesting tidbits: successful scientists read more, and so far social media use has not decreased the amount of reading that scientists do. The keynotes were a real sign of how much more present the humanities were at this conference than at Beyond the PDF 1.


Kathleen Fitzpatrick (@kfitz), Director of Scholarly Communication, Modern Language Association

The tools are there

Jason Priem compares online journals to horses

Just two years ago at the first Beyond the PDF, there were mainly initial ideas and drafts for next-generation research communication tools. At this year's conference, there was a huge number of tools ready to be used: Figshare, PDFX, Authorea, Mendeley, IsaTools, StemBook, Commons in a Box, IPython, ImpactStory, and on…

Furthermore, there are different ways of publishing, from PeerJ to even just posting to a blog. Probably the most interesting idea of the conference was the use of GitHub to essentially publish.

This made me think it's time to revisit my own scientific workflow and figure out how to update it to make better use of these tools in practice.

People made connections

At the end of the conference, I asked whether people had made a new connection. Almost every hand went up. It was great to see publishers, technologists, and librarians all talking together. The Twitter back channel at the conference was great: we saw a lot of conversations continuing on #btpdf2, as well as people commenting while watching the live stream. Check out the great Storify of the conference's social media stream done by Graham Steel.

Creative Commons License
Beyond the PDF 2 photographs by Maurice Vanderfeesten is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported license.

Making it happen

We gave a challenge to the community: "What would you do with 1k today that would change scholarly communication for the better?" The challenge was well received, and we got a bunch of different ideas, from sponsoring viewing parties to encouraging the adoption of DOIs in the developing world and by small publishers.

The Challenge of Evaluation

We had a great discussion around the role of evaluation. I think the format Carole Goble used for the evaluation session, where attendees role-played the key players in the evaluation of research and researchers, really highlighted that we have a first-mover problem: none of the players feels that they should go first. It was unclear how to push past that challenge.

Various Roles in Science Evaluation


Personally, I had a great time. FORCE11 is a unique community, and I think it brings together the people who need to talk in order to change the way we communicate scholarship. These were my quick thoughts on the event; there's a lot more to come. We will have the video of the event up soon, along with drawn notes provided by Jongens van de Tekeningen, and we will award a series of 1k grants to support ongoing work. Finally, I hope to see many more blog posts documenting the different views of attendees.


We had many great sponsors who helped make this a great event. Things like live streaming, student scholarships, a professional set-up, demos, and dinner ensure that an event like this works.

One of the ideas in the altmetrics manifesto was that altmetrics allow a diversity of metrics. With colleagues in the VU University Amsterdam's Network Institute, we've been investigating the use of online data (in this case, Google Scholar) to help create new metrics that measure the independence of researchers. For this, we need fresh data to establish whether an emerging scholar is becoming independent from their supervisor. We just had the results of one of our approaches accepted into the Web Science 2013 conference. The abstract is below, and here's a link to the preprint.

Identifying Research Talent Using Web-Centric Databases 

Anca Dumitrache, Paul Groth, and  Peter van den Besselaar

Metrics play a key part in the assessment of scholars. These metrics are primarily computed using data collected in offline procedures. In this work, we compare the usage of a publication database based on a Web crawl and a traditional publication database for computing scholarly metrics. We focus on metrics that determine the independence of researchers from their supervisor, which are used to assess the growth of young researchers. We describe two types of graphs that can be constructed from online data: the co-author network of the young researcher, and the combined topic network of the young researcher and their supervisor, together with a series of network properties that describe these graphs. Finally, we show that, for the purpose of discovering emerging talent, dynamic online resources for publications provide better coverage than more traditional datasets.

This is fairly preliminary work; it mainly establishes that we want to use the freshest possible data. We are expanding the work to a large-scale study of independence as well as to different sources of data. But to me, it shows how the freshness of web data allows us to begin looking at and measuring research in new ways.
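To make the idea of an independence signal concrete, here is a minimal sketch in Python. It computes one crude proxy, the fraction of a researcher's papers written without their supervisor, from a list of author lists. This is an illustration only: the names are made up, and the paper's actual metrics are richer network properties over the co-author and topic graphs described in the abstract.

```python
def independence_ratio(papers, researcher, supervisor):
    """Fraction of `researcher`'s papers that do not include `supervisor`.

    A crude proxy for the co-author-network independence signal; `papers`
    is a list of author-name lists, e.g. as crawled from the web.
    """
    own = [authors for authors in papers if researcher in authors]
    if not own:
        return 0.0
    solo = [authors for authors in own if supervisor not in authors]
    return len(solo) / len(own)

# Toy publication list (made-up names, not real data).
papers = [
    ["Student A", "Advisor B"],
    ["Student A", "Advisor B", "Colleague C"],
    ["Student A", "Colleague D"],
    ["Student A", "Colleague D", "Colleague E"],
]
print(independence_ratio(papers, "Student A", "Advisor B"))  # 0.5
```

The point of using fresh web data is exactly here: the more current and complete the publication list, the earlier a rising ratio becomes visible.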

From November 1–3, 2012, I attended the PLOS Article Level Metrics Workshop in San Francisco.

PLOS is a major open-access online publisher and publishes the leading megajournal PLOS ONE. A megajournal is one that accepts any scientifically sound manuscript: there is no decision on novelty, just a decision on whether the work was done in a scientifically sound way. The consequence is that much more science gets published, with a corresponding need for even better filters and search systems for science.
As an online publisher, PLOS tracks what are termed article-level metrics: metrics that go beyond traditional scientific citations to include things like page views, PDF downloads, mentions on Twitter, etc. Article-level metrics are, to my mind, altmetrics aggregated at the article level.
PLOS provides a comprehensive API to obtain these metrics and wants to encourage their broader adoption and usage, so they organized this workshop. There was a variety of people attending, from publishers (including open-access ones and the traditional big ones) and funders to librarians and technologists. I was a bit disappointed not to see more social scientists there, but I think the push so far has come primarily from the communities represented. The goal was to outline key challenges for altmetrics and then corresponding concrete actions that could take place in the next six months to help address them. It was an unconference, so no presentations and lots of discussion. I found it quite intense, as we often broke up into small groups where one had to be fully engaged. The organizers are putting together a report that digests the work that was done; I'm excited to see the results.
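As a sketch of what working with an article-level-metrics API looks like, the function below aggregates per-source counts from an ALM-style JSON record. Note that the record shape and the DOI here are simplified, hypothetical stand-ins, not the exact PLOS response format.

```python
def summarize_alm(record):
    """Return per-source counts and their total from an ALM-style record.

    `record` mimics, in simplified form, the JSON an article-level-metrics
    API might return: one entry per source (views, downloads, tweets, ...).
    """
    by_source = {src["name"]: src["count"] for src in record["sources"]}
    return by_source, sum(by_source.values())

record = {
    "doi": "10.1371/journal.pone.0000000",  # made-up DOI for illustration
    "sources": [
        {"name": "pageviews", "count": 1200},
        {"name": "pdf_downloads", "count": 300},
        {"name": "twitter", "count": 45},
    ],
}
by_source, total = summarize_alm(record)
print(total)  # 1545
```

Even this trivial aggregation shows why a stable API matters: once the per-source counts are machine-readable, building dashboards and filters on top is straightforward.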

Me actively contributing 🙂 Thanks Ian Mulvany!


  • Launch of the PLOS Altmetrics Collection. This was really exciting for me, as I was one of the organizers of getting this collection produced. Our editorial is here. The collection provides a nice home for future articles on altmetrics.
  • I was impressed by the availability of APIs. In just a short time, several aggregators and good sources of altmetrics have appeared: ImpactStory, the PLOS ALM APIs, Mendeley, Microsoft Academic Search.
  • rOpenSci is a cool project that provides R APIs to many of these altmetric and other sources for analyzing data.
  • There's quite a bit of interest in services around these metrics. For example, Plum Analytics has a pilot running at the University of Pittsburgh. I also talked to other people who were interested in using these alternative impact measures and heard that a number of companies now provide this sort of analytics service.
  • I talked a lot to Mark Hahnel about the Data2Semantics LinkItUp service. He is super excited about it and loved the demo. I'm really excited about this collaboration.
  • Microsoft Academic Search is getting better; they are really turning it into a production product with better and more comprehensive data. I'm expecting a really solid service in the next couple of months.
  • I learned from Ian Mulvany of eLife that Graph theory is mathematically “the same as” statistical mechanics in physics.
  • Context, context, context – there was a ton of discussion about the importance of context for the numbers one gets from altmetrics: for example, being able to quickly compare to some baseline, or knowing the population to which the number applies.

    Whiteboard thoughts on context! Thanks Ian Mulvany

  • Related to context was the need for simple semantics: for example, we need to know whether a retweet of a paper was positive or negative, and what kind of person retweeted it (i.e. a scientist, a member of the public, a journalist, etc.). This is because, unlike citations, the population altmetrics draws on is not clearly defined: it exists in communication media that don't contain just scholarly communication.
  • I had a nice discussion with Elizabeth Iorns, the founder of . They're doing cool stuff around building markets for performing and replicating experiments.
  • Independent of the conference, I met up with some people I know from the natural-language-processing community. One of the things they were excited about is computational semantics using statistical approaches. This seems to be very hot in that community and something we in the knowledge representation & reasoning community should pay attention to.
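The "context" point above can be made concrete: a raw count means little until you know where it falls within a reference population. Here is a minimal percentile baseline, with made-up numbers standing in for, say, tweet counts of comparable papers:

```python
from bisect import bisect_left

def percentile_rank(value, population):
    """Percentage of `population` strictly below `value`."""
    ranked = sorted(population)
    return 100.0 * bisect_left(ranked, value) / len(ranked)

# Hypothetical tweet counts for comparable papers (same field, same year).
baseline = [0, 0, 1, 2, 2, 3, 5, 8, 13, 40]
print(percentile_rank(5, baseline))  # 60.0
```

So "5 tweets" by itself is meaningless, but "more tweets than 60% of comparable papers" is a statement a tenure committee could actually use; that was the gist of the context discussion.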


Associated with the workshop was a hackathon held at the PLOS offices. I worked in a group that built a quick demo called . This was a bookmarklet that highlights papers in PubMed search results based on their online impact according to ImpactStory, giving you color-coded results based on altmetric scores. It took only a day's worth of work and really showed me how far these APIs have come in enabling applications to be built. It was a fun environment, and I was really impressed with the other work that came out.
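The demo's core logic was roughly of this shape. To be clear, this is a reconstruction in Python for illustration, not the actual bookmarklet code, and both the thresholds and the PubMed IDs are invented:

```python
def highlight_color(score):
    """Map an altmetric score to a highlight color.

    Thresholds are invented for illustration, not the demo's actual cut-offs.
    """
    if score >= 100:
        return "red"     # high online impact
    if score >= 10:
        return "orange"  # moderate impact
    if score > 0:
        return "yellow"  # some attention
    return "none"

# Hypothetical scores keyed by PubMed ID.
scores = {"23456789": 250, "23456790": 12, "23456791": 0}
colors = {pmid: highlight_color(s) for pmid, s in scores.items()}
print(colors)
```

In the real bookmarklet, the scores came from an API call per paper and the colors were applied to the result list in the page; the mapping step itself is this simple.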

Random thoughts on San Francisco

  • Four Barrel Coffee serves really, really nice coffee – but get there early, before the influx of ultra-cool locals.
  • The guys at Goody Cafe are really nice and also serve good coffee.
  • If you're in the touristy Fisherman's Wharf area, walk to Fort Mason for fantastic views of the Golden Gate Bridge. The hostel there also looks cool.

Update: A version of this post appeared in SURF magazine (on the back page) in their trendwatching column.

Technology at its best lets us do what we want to do without being held back by time consuming or complex processes. We see this in great consumer technology: your phone giving you directions to the nearest cafe, your calendar reminding you of a friend’s birthday, or a website telling you what films are on. Good technology removes friction.

While attending the SURF Research Day, I was reminded that this idea of removing friction through technology shouldn't be limited to consumer or business environments but should also be applied in academic research settings. The day showcased a variety of developments in information technology that help researchers do better research. Because SURF is a Dutch organization, there was a particular focus on developments here in the Netherlands.

The day began with a fantastic keynote from Cameron Neylon outlining how networks qualitatively change how research can be communicated. A key point was that to create the best networks, we need to make research communication as frictionless as possible. You can find his longer argument here. After Cameron's talk, Jos Engelen, the chairman of the NWO (the Dutch NSF), gave some remarks. For me, the key take-away was that in every one of the Dutch government's nine Priority Sectors, technology has a central role in smoothing both the research process and its transition to practice.

After the opening session, there were four parallel sessions on text analysis, dealing with data, profiling research, and technology for research education. I managed to attend parts of three of them. In the profiling session, the recently released SURF report on tracking the impact of scholarly publications in the 21st century sparked my interest. Finding new, faster, and broader ways of measuring impact (i.e. altmetrics) is a way of reducing friction in science communication. The ESCAPE project showed how enriched publications make it easy to collate and browse related content around traditional articles; the project won SURF's enriched publication of the year award. Again, the key was simplifying the research process. Beyond these presentations, there were talks ranging from making it easier to do novel chemistry to helping religious scholars understand groups through online forums. In each case, the technology was successful because it eliminated friction in the research process.

The SURF research day presented not just technology but how, when it’s done right, technology can make research just a bit smoother.

This past Tuesday, I had the opportunity to give a webinar for Elsevier Labs giving an overview of altmetrics. It was a fun opportunity to talk to people who have a great chance to influence the next generation of academic measurement. The slides are embedded below.

At the VU, we are also working with Elsevier Labs on the Data2Semantics project, where we are trying to enrich data with additional machine-understandable metadata. How does this relate to metrics? I believe that metrics (access, usage, etc.) can be a key piece of additional semantics for datasets. I'm keen to see how metrics can make our data more useful, findable, and understandable.

