Monthly Archives: January 2014

This past week I was at Academic Publishing in Europe 9 (APE 2014) for two days. I was invited to talk about altmetrics. This was kind of an update of the double act that I did with Mike Taylor from Elsevier Labs last year at another publishing conference, UKSG. You can find the slides of my talk from APE below along with a video of my presentation. Overall, the talk was well received:

I think the biggest thing for publishers is to recognize that altmetrics are something they play a role in, and to emphasize that altmetrics broaden the measurement space. It’s also interesting that authors want support for telling the story of their research, and need help doing so.

Given that it was a publishing conference, it was interesting to see which themes got talked about. Here are some highlights from my perspective.

The Netherlands going gold

Open Access was, as usual, a discussion point. The Dutch State Secretary of Science, Sander Dekker, was there, giving a full-throated endorsement of gold open access. I thought the discussion by Michael Jubb on monitoring the progress of the UK’s Open Access push after the Finch Report was interesting. I think seeing how the UK manages and measures this transition will be critical to understanding the ramifications of open access. However, I have a feeling that they may not be looking enough at the impact on faculty, and in particular at how money is distributed for gold open access pricing.

Big Data – It’s the variety!

There was a session on big data. I thought I wouldn’t get a lot out of this session because, with my computer science hat on, I’ve heard quite a few technical talks on the subject. However, this session really confirmed to me that we’re facing a problem not with data processing or storage but with data variety.

This was confirmed by the fantastic talk by Jason Swedlow on the Open Microscopy project. The project looks at how to manage and deal with massive amounts of image data and the interoperability of those images. (You can find one of the images that they published here – 281 gigapixels!) If you’re thinking about data integration or interoperability, you should check out this project and his talk. I also like the notion of images as a measurement technique. He noted that their software deals with data size and processing, but the difficulties were around the variety and just general dirtiness of all that data.

Data variety was also emphasized by Simon Hodson from CoDATA, who gave an overview of a number of e-science projects where it was the central issue.

Data / Other Stuff Citation

Data citation was another theme of the conference. As a community member, it was good to see this mentioned frequently, in particular the work on data citation principles that’s being facilitated by the community. Also mentioned was the resource identification initiative, another FORCE11 community group, where researchers can identify specific resources (e.g. model organisms, software) in their publications in a machine-readable way. This has already been endorsed by a number of journals (~25) and publishers. This ability to “cite” seems to be central to how all these other scientific products are beginning to get woven into the scholarly literature. (See also

A good example of this was Hans Pfeiffenberger’s talk on the Earth System Science Data journal, where they have created a journal specifically for data coming from large-scale earth measurements. An interesting issue that came up was the need for bidirectional citation: that is, publishing the data and the associated commentary at the same time, with different publishers, each including a reference to the other using permanent identifiers.

Digital Preservation

There was also some talk about preservation of content born online. Two things stood out for me here:

  1. Peter Burnhill’s talk on and both projects to detect what content is being preserved. I was shocked to hear that only 20% of online serials are stored in long-term archives.
  2. This report seems pretty comprehensive on this front. Note to self – it will be good input for thinking about preserving linked data in the Prelinda project.

Science from the coffee shop

The conference had a session (dotcoms-to-watch) on startups in publishing. What caught my attention was that we are really moving toward the idea that Ian Foster has been talking about, namely, science as a service. With services like scrawl and science exchange, we’re starting to be able to run even lab-based experiments completely from a laptop. I think this is going to be huge. I already see this in computer science, where more of my colleagues and I turn to the Amazon cloud to boot up our test environments. Pretty soon you’ll be able to do your science just by calling an API.

Random Notes

My Slides & Talk


About a week ago, I launched ideacite. The actual idea for ideacite is on figshare. This is a side project that I did for a couple of reasons.

First, it’s always fun to build stuff, so when I woke up in the morning and had this idea, I wanted to see how hard it was to just get it up and running (especially given that there was a pun in the name). It was interesting to see how fast one can get to an MVP. I think it took me about 8 hours total to do the whole thing (a morning or so + a long Friday night), and that was with thinking about which programming language to write it in 🙂 .

A bit about how it was built. I registered the domain name with, which now also provides a platform as a service option for node.js. It was pretty easy to just run an instance and point the domain there. I’m actually pretty impressed with the service and it’s affordable. On the advice of Rinke Hoekstra, I ended up building the site entirely client side and relying directly on the figshare API, which worked like a charm. The figshare API is nice enough not to require OAuth for just using its search functionality, and jQuery simplifies all the DOM manipulation. Like everyone, I used Twitter Bootstrap to do the web design.
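
To give a feel for what “entirely client side” means here, below is a minimal sketch in TypeScript rather than the actual ideacite code. The endpoint address, the `search_for` parameter name, and the `IdeaHit` response shape are all placeholders I made up for illustration; the real figshare API may look quite different. It also builds the markup with plain string functions instead of jQuery.

```typescript
// Hypothetical shape of one search hit; the real API response may differ.
interface IdeaHit {
  title: string;
  url: string;
}

// Placeholder endpoint, NOT the real figshare API address.
const SEARCH_ENDPOINT = "https://api.example.org/search";

// Build the query URL for a search term (the parameter name is an assumption).
function buildSearchUrl(term: string): string {
  return `${SEARCH_ENDPOINT}?search_for=${encodeURIComponent(term)}`;
}

// Escape text before placing it into HTML.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Turn hits into list markup that the page can drop into a container element.
function renderHits(hits: IdeaHit[]): string {
  const items = hits.map(
    (hit) => `<li><a href="${escapeHtml(hit.url)}">${escapeHtml(hit.title)}</a></li>`
  );
  return `<ul>${items.join("")}</ul>`;
}

// Fetch the hits and insert the markup. The JSON loader is passed in so the
// caller decides how requests are made (for example, a thin wrapper around
// the browser's fetch); no server-side code is involved.
async function showIdeas(
  term: string,
  container: { innerHTML: string },
  loadJson: (url: string) => Promise<unknown>
): Promise<void> {
  const hits = (await loadJson(buildSearchUrl(term))) as IdeaHit[];
  container.innerHTML = renderHits(hits);
}
```

In a page, `showIdeas` would be called with a DOM element as the container and a small function that performs the HTTP request and parses the JSON. Since search needs no authentication, everything can run in the browser.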

The second reason was that I’m interested in enabling attributable remixing in scholarship. We can obviously cite papers (i.e. research results), data citation is becoming more common, and soon we will be able to cite code, so why not ideas? With this combination and some better tools, we can begin to experiment with faster ways of generating research results that build upon all the parts of the scholarly lifecycle while preserving a record of contribution. Or at least that’s the idea.

Anyway, there’s lots more to do with ideacite (e.g. subscribing to a Twitter account which broadcasts new ideas) but I’d love to hear your thoughts on where to go next.
