Last week (Oct 7 – 9) the altmetrics community made its way to Amsterdam for 2:AM (the second altmetrics conference) and altmetrics15 (the 4th altmetrics workshop). The conference is aimed more at practitioners, while the workshop has a bit more of a research focus. I enjoyed the events from both a content perspective (I’m biased as a co-organizer) and a logistics perspective (I could bike from home). This was the fifth anniversary of the altmetrics manifesto, so it was a great opportunity to reflect on the status of the community. Plus the conference organizers brought cake!

This was the first time that all of the authors were in the same room together and we got a chance to share some of our thoughts. The video is here if you want to hear us pontificate:

From my perspective, I think you can summarize the past five years in two bullet points:

  • It’s amazing what the community has done: multiple startups built on altmetrics, big companies offering altmetric products, many articles and other research objects carrying altmetric scores, and a small but vibrant research community.
  • It would be great to focus more on using altmetrics to improve the research process rather than just on their potential use in research evaluation.

Beyond the reflection on the community itself, I took three themes from the conference:

More & different data please

An interesting aspect is that most studies and implementations rely on social media data (Twitter, Mendeley, Facebook, blogs, etc.). As an aside, it’s worth noting you can do amazing things with this data in a very short amount of time…

However, there is increasing interest in having data from other sources or having more contextualized data.

There were several good examples. One speaker gave a good talk about getting at the data behind who tweets about scientific articles; I’m excited to see what better population data can tell us. Others are starting to provide data on how articles are being used in public policy documents. Finally, moving beyond articles, Peter van Besselaar is looking at data derived from grant review processes to study, for example, gender bias.

It’s also good to see developments such as the DOI Event Tracker that makes the aggregation of altmetrics data easier. This is hopefully just the start and we will see a continued expansion of the variety of data available for studies.

The role of theory

There was quite a bit of discussion about the appropriateness of altmetrics for different tasks, ranging from the development of global evaluation measures to their role in understanding the science system. There was a long discussion of the quality of altmetrics data, in particular the transparency of how aggregators integrate and provide data.

A number of presenters discussed the need for theory in trying to interpret altmetrics signals. Cameron Neylon gave an excellent talk about his view of the need for a different theoretical view. There was also a breakout session at the workshop discussing the role of theory, and I look forward to the Etherpad becoming something more well defined. Peter van Besselaar and I also tried to argue for a question-driven approach when using altmetrics.

Finally, I enjoyed the work of Stefanie Haustein, Timothy Bowman, and Rodrigo Costas on interpreting the meaning of altmetrics. This is definitely a must-read.

Going beyond research evaluation

I had a number of good conversations with people about the desire to do something that moves beyond the focus on research evaluation. In all honesty, being able to tell stories with a variety of metrics is probably why altmetrics has gained traction.

However, I think a world in which understanding the signals produced by the research system can be used to improve research is the exciting bit. There were some hints of this. In particular, I was compelled by the work of Kristi Holmes on using measures to improve translational medicine at Northwestern.


Overall, it’s great to see all the activity around altmetrics. There are a bunch of good summaries of the event. Check out the altmetrics conference blog and Julie Birkholz’s summary.

Last week (Jan 29 & 30), I was at the NSF & Sloan Foundation workshop: Supporting Scientific Discovery through Norms and Practices for Software and Data Citation and Attribution. The workshop was held in the context of the NSF’s Dear Colleague Letter on the subject. It brought together a range of backgrounds and organizations, from Mozilla to the NIH and NASA. I got to catch up with several friends and was able to meet some new folks as well. Check out the workshop’s GitHub page with a list of 22 use cases submitted to the workshop.

I was pleased to see the impact of the work of FORCE11 in helping drive this space. In particular, the Joint Declaration of Data Citation Principles and Research Resource Identifiers (RRIDs) seem to be helping the community focus on citing other forms of scholarly output, and both were brought up several times in the meeting.

I think there were two main points from the conference:

  1. We have the infrastructure.
  2. Sustainability is the open question.


It was clear that we have much of the infrastructure in place to enable the citation and referencing of outputs such as software and data.

In terms of software, piggybacking off existing infrastructures seems to be the most likely approach. The versioning/release mindset built into software development means that hosting infrastructure such as GitHub or Google Code provides a strong start. These can then be integrated with existing scholarly attribution systems. My colleague Sweitze Roffel presented Elsevier’s work on Original Software Publications. This approach leverages the existing journal-based ecosystem to provide the permanence and context associated with things in the scientific record. Another approach is to use the data hosting/citation infrastructure to give code a DOI, e.g. by using Zenodo. Both approaches work with GitHub.

The biggest thing will be promoting the actual use of proper citations. James Howison of the University of Texas at Austin presented interesting deep-dive results on how people refer to software in the scientific literature (slide set below) (GitHub). It shows that people want to do this but often don’t know how. His study was focused in scope; I’d like to see the same study done in an automatic fashion on the whole of the literature. I know he’s working with others on training machine learning models for finding software mentions, so that would be quite cool. Maybe it would be possible to back-fill the software citation graph this way?
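As a toy illustration of what pattern-based software-mention extraction might look like, here is a minimal sketch. The patterns below are my own invention for illustration, not Howison’s actual method or the features used in any real mention-detection model:

```javascript
// Toy heuristic for finding software mentions in article text.
// The phrase patterns are assumptions for illustration only; a real
// system would learn such cues rather than hard-code them.
const MENTION_PATTERNS = [
  // "using <Name> (version 1.2)" style mentions
  /\busing\s+([A-Za-z][\w.+-]*)\s*\(version\s+[\w.]+\)/g,
  // "implemented in <Name>" style mentions
  /\bimplemented in\s+([A-Za-z][\w.+-]*)/g,
  // "the <Name> package/software/toolkit" style mentions
  /\bthe\s+([A-Za-z][\w.+-]*)\s+(?:package|software|toolkit)/g,
];

function findSoftwareMentions(text) {
  const mentions = new Set();
  for (const pattern of MENTION_PATTERNS) {
    for (const match of text.matchAll(pattern)) {
      mentions.add(match[1]); // the captured software name
    }
  }
  return [...mentions];
}

// Example:
// findSoftwareMentions(
//   "All analyses were run using R (version 3.1) with the ggplot2 package."
// ); // → ["R", "ggplot2"]
```

Even this crude approach surfaces many informal mentions, which hints at why back-filling a software citation graph from full text seems feasible once proper models exist.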

In terms of data citation, we are much farther along because many of the existing data repositories support the minting of data citations. Many of the questions asked were about cases with changing data or mash-ups of data. These are important edge cases to look at. I think progress will be made here by leveraging the landing pages for data to provide additional metadata. Indeed, Joan Starr from the California Digital Library is going to bring this back to the DataCite working group to talk about how to enable this. I was also impressed with the PLOS-led Making Data Count project and Martin Fenner’s continued development of the Lagotto altmetrics platform. In particular, there was discussion about getting a supplementary guideline for software and data downloads included in COUNTER. This would be a great step toward getting data and software use properly counted.


Sustainability is one of the key questions that has been going around in the larger discussion: how do we fund the software and data resources necessary for the community? I think the distinction that arose was the need to differentiate between:

  • software as an infrastructure; and
  • software as an experiment/method.

This seems rather obvious, but the tendency is for the latter to become the former, and this causes issues, in particular for sustainability.

Issues include:

  1. It’s difficult to identify which software will become key to the community and thus where to provide the investment.
  2. Scientific infrastructure software tends to be funded on project to project basis or sometimes as a sideline of a lab.
  3. Software that begins as an experiment is often not engineered correctly.
  4. As Luis Ibanez from Google pointed out, we often lose the original developers over time, and there’s a need to involve new contributors.

The Software Sustainability Institute in the UK has begun to tackle some of these problems. But there is still a lack of clear avenues for aggregating the necessary funding. One popular model is the creation of a non-profit foundation to support a piece of software, but this leads to “foundation fatigue.” Other approaches shift the responsibility to university libraries, but libraries may not have the required organizational capabilities. Katherine Skinner’s recent talk at FORCE 2015 covered some of the same ground here.

One of the interesting ideas that came up at the workshop was the use of other parts of the university institution to help tap into different funding streams (e.g. the IPR office; the university development office). An example of this is Internet2, which is sponsored directly by universities. However, as pointed out by Dan Katz, to support this sort of sustainability there is a need for insight into the deeper impact of this sort of software on the scientific community.


You can see a summary of the outcomes here. In particular, take a look at the critical asks. These concrete requests were formulated by the workshop attendees to address some of the identified issues. I’ll be interested to see the report that comes out of the workshop and how that can help move us forward.

This past week I was at Academic Publishing in Europe 9 (APE 2014) for two days. I was invited to talk about altmetrics. This was kind of an update of the double act that I did with Mike Taylor from Elsevier Labs last year at another publishing conference, UKSG. You can find the slides of my talk from APE below along with a video of my presentation. Overall, the talk was well received:

I think for publishers the biggest thing is to recognize that this is something they play a role in, as well as to emphasize that altmetrics broaden the measurement space. It’s also interesting that authors want support in telling the story of their research – and need help.

Given that it was a publishing conference, it’s always interesting to see the themes getting talked about. Here are some  highlights from my perspective.

The Netherlands going gold

Open Access was, as usual, a discussion point. The Dutch State Secretary of Science, Sander Dekker, was there, giving a full-throated endorsement of gold open access. I thought the discussion by Michael Jubb on monitoring the progress of the UK’s Open Access push after the Finch Report was interesting. I think seeing how the UK manages and measures this transition will be critical to understanding the ramifications of open access. However, I have a feeling that they may not be looking enough at the impact on faculty, and in particular at how money is distributed for gold open access pricing.

Big Data – It’s the variety!

There was a session on big data. Technically, I thought I wouldn’t get a lot out of this session because, with my computer science hat on, I’ve heard quite a few technical talks on the subject. However, this session really confirmed for me that we’re facing a problem not of data processing or storage but of data variety.

This was confirmed by the fantastic talk by Jason Swedlow on the Open Microscopy project. The project looks at how to manage and deal with massive amounts of image data and the interoperability of those images. (You can find one of the images they published here – 281 gigapixels!) If you’re thinking about data integration or interoperability, you should check out this project and his talk. I also liked the notion of images as a measurement technique. He noted that their software deals with data size and processing, but the difficulties were around the variety and general dirtiness of all that data.

The issue of data variety was also emphasized by Simon Hodson from CODATA, whose talk gave an overview of a number of e-science projects where data variety was the central issue.

Data / Other Stuff Citation

Data citation was another theme of the conference. As a community member, it was good to see it mentioned frequently, in particular the work on data citation principles that’s being facilitated by the community. Also mentioned was the Resource Identification Initiative – another FORCE11 community group – through which researchers can identify specific resources (e.g. model organisms, software) in their publications in a machine-readable way. This has already been endorsed by a number of journals (~25) and publishers. This ability to “cite” seems central to how all these other scientific products are beginning to get woven into the scholarly literature.

A good example of this was Hans Pfeiffenberger’s talk on the Earth System Science Data journal, where they have created a journal specifically for data coming from large-scale earth measurements. An interesting issue that came up was the need for bidirectional citation: publishing the data and the associated commentary at the same time, each including references to the other using permanent identifiers, even across different publishers.

Digital Preservation

There was also some talk about preservation of content born online. Two things stood out for me here:

  1. Peter Burnhill‘s talk on projects to detect what content is being preserved. I was shocked to hear that only 20% of online serials are stored in long-term archives.
  2. This report seems pretty comprehensive on this front. Note to self: it will be good input for thinking about preserving linked data in the PRELIDA project.

Science from the coffee shop

The conference had a session (dotcoms-to-watch) on startups in publishing. What caught my attention was that we are really moving toward the idea that Ian Foster has been talking about, namely, science as a service. With services like Scrawl and Science Exchange, we’re starting to be able to run even lab-based experiments completely from our laptops. I think this is going to be huge. I already see this in computer science, where I and more of my colleagues turn to the Amazon cloud to boot up our test environments. Pretty soon you’ll be able to do your science just by calling an API.

Random Notes

My Slides & Talk


About a week ago, I launched ideacite. The actual idea for ideacite is on figshare. This is a side project that I did for a couple of reasons.

First, it’s always fun to build stuff, so when I woke up in the morning and had this idea, I wanted to see how hard it was to just get it up and running (especially given that there was a pun in the name). It was interesting to see how fast one can get to an MVP. I think it took me something like 8 hours total to do the whole thing (a morning or so plus a long Friday night), and that was including thinking about which programming language to write it in 🙂 .

A bit about how it was built. I registered the domain name with a registrar that now also provides a platform-as-a-service option for node.js. It was pretty easy to just run an instance and point the domain there. I’m actually pretty impressed with the service, and it’s affordable. On the advice of Rinke Hoekstra, I ended up building the site entirely client side, relying directly on the figshare API, which worked like a charm. The figshare API is nice enough not to require OAuth for just using its search functionality, and jQuery simplifies doing all the DOM manipulation. Like everyone, I used Twitter Bootstrap to do the web design.
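For the curious, the client-side search boils down to something like the sketch below. The endpoint and field names are assumptions based on figshare’s public v2 API, not necessarily the exact calls the site uses:

```javascript
// Build a request for figshare's public article search.
// Assumptions: the v2 endpoint and the `search_for` field, per figshare's
// public API documentation; no OAuth is needed for public search.
function buildFigshareSearch(term) {
  return {
    url: "https://api.figshare.com/v2/articles/search",
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ search_for: term, page_size: 10 }),
    },
  };
}

// Usage in the browser (or node 18+ with a global fetch):
//   const { url, options } = buildFigshareSearch("idea");
//   const results = await (await fetch(url, options)).json();
//   results.forEach(r => console.log(r.title, r.doi));
```

Keeping the request-building separate from the fetch call makes it easy to wire the results straight into the DOM with jQuery, which is essentially all a site like this needs.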

The second reason was that I’m interested in enabling attributable remixing in scholarship. We can obviously cite papers (i.e. research results), data citation is becoming more common, and soon we’ll be able to cite code, so why not ideas? With this combination and some better tools, we can begin to experiment with faster ways of generating research results that build upon all parts of the scholarly lifecycle while preserving a record of contribution. Or at least that’s the idea.

Anyway, there’s lots more to do with ideacite (e.g. subscribing to a Twitter account that broadcasts new ideas), but I’d love to hear your thoughts on where to go next.

Update: A full video of my talk on altmetrics has been posted. 

Altmetrics has seen increasing interest as an alternative to traditional measures of academic performance. This past week I gave a talk in Amsterdam for Open Access Week about how altmetrics can be used by academics and their organizations to highlight their broader set of contributions. These can be used to tell a richer and fuller story about how what we do has impact. The talk had a nice turnout of librarians, faculty, and administrators (friendly faces below).

Audience for Altmetrics talk Open Access Week 2013

In relation to the talk, I was interviewed by the Dutch national newspaper, de Volkskrant, about the same theme (Twitter neemt wetenschap steeds meer de maat).


You can find the slides of the talk below. I’m told there will be video as well. A big thanks to the altmetrics community. The recent PLOS ALM workshop was a great resource for material. A big thanks goes to Cameron Neylon for allowing me to reuse some of his slides. Overall, I hope that I helped some more people understand how these new forms of metrics can help in showing the impact of what they do.


I was invited to do a webinar for Elsevier journal editors on altmetrics this past Tuesday. You can find the complete recording here. 260 people attended live. The best part was probably the Q&A session starting 33 minutes in. Broadly, I would characterize the questions as: “I really would like to use these, but can you give me some assurances that they are ok?” Anyway, have a listen for yourself. Hannah Foreman did a great job of directing (she also brought me a cake for my birthday!). I also thought it was great to see Mike Taylor doing a demo of ImpactStory. I felt this was an important webinar to do, as it reaches a traditional journal editor audience that may not yet have fully gotten on board. A final note: thanks to Steve Pettifer for letting me use him as an example.




For the past couple of days (April 8 – 10, 2013), I attended the UKSG conference. UKSG is an organization for academic publishers and librarians. The conference itself has over 700 attendees and is focused on these two groups. I hadn’t heard of it until I was invited by Mike Taylor from Elsevier Labs to give a session with him on altmetrics.

The session was designed both to introduce altmetrics and to give a state-of-the-art update to publishers and librarians. You can see what I had to say in the clip above, but my main point was that altmetrics is at a stage where it can be advantageously used by scholars, projects, and institutions not to rank but instead to tell a story about their research. This is particularly important as many scientific artifacts beyond the article (e.g. data, posters, blog posts, videos) are becoming increasingly trackable and can help scholars tell their story.

The conference itself was really a bit weird for me, as it was a completely different crowd than I would normally connect with… I had to be one of the few “actual” academics there, which led to my first-day tweet:

It was fun to randomly go up to the ACM and IEEE stands and introduce myself not as a librarian or another publisher but as an actual member of their organizations. Overall, though, people were quite receptive to my comments and were keen to get my views on what publishers and librarians could be doing to help me as a researcher. I do have to say that it was a fairly well-funded operation (there is money in academia somewhere)…. I came away with a lot of free t-shirts and USB sticks, and I have never been to a conference that had bumper cars for the evening entertainment:

UKSG bumper cars

In addition to (hopefully) contributing to the conference, I learned some things myself. Here are some bullet points in no particular order:

  • Outrageous talk by @textfiles – the Archive Team is super important
  • I talked a lot to Geoffrey Bilder from CrossRef. Topics included but not limited to:
    • why and when indirection is important for permanence in url space
    • the need for a claims (i.e. nanopublications) database referencing ORCID
    • the need for consistent url policies on sites and a “living will” for sites of importance
    • when will scientists get back to being scientists and stop being marketers (is this statement true, false, in-between, or is it even a bad thing)
    • the coolness of
  • It’s clear that librarians are the publishers’ customers; academics come second. I think this particular indirection badly impacts the market.
  • Academic content output is situated in a network – why do we de-link it all the time?
  • The open access puppy

  • It was interesting to see the business of academic publishing going down. I witnessed lots of pretty intense-looking dealings in the cafe.
  • Bournemouth looks like it could have some nice surfing conditions.

Overall, UKSG was a good experience to see, from the inside, this completely other part of the academic complex.
