I’m pleased to announce that at the beginning of 2015 I’ll be joining Elsevier Labs as Disruptive Technology Director [1].

In the past several years, there has been an explosion of creativity with respect to both research communication and research infrastructure. Whether it’s new ways to think about the impact of research (e.g. altmetrics), the outsourcing of experiments (e.g. Science Exchange) or the impact of massive datasets on the creation of large scale models (e.g. Big Mechanism), this is an exciting space to be in.

I’ve been lucky to be part of teams [2] that have been addressing the issues of research infrastructure and communication through novel computer science. At Elsevier Labs, I’ll continue to focus on this area in an environment with amazing data, resources, people and potential for impact. This ability to focus is one of the reasons I’ve decided to make the jump from academia [3]. In my new position, I’ll probably be out and about even more, talking and writing about this area [4].

Finally, a word on open science. My view on open science is strongly shaped by Cameron Neylon’s articulation of the need to reduce friction in the science system [5]. The removal of barriers is central to being able to do science better. I think there is a strong role for commercial organizations to facilitate this reduction in friction in an open environment [6]. Indeed, the original role of publishers did just that. From my discussions with the Labs team and others at Elsevier, the organization is absolutely receptive to this view and is moving in this direction [7]. My hope is that I can help Elsevier use its many strengths to support a better, more open, and frictionless science ecosystem.

  1. See Horace Dediu’s Disruption FAQ
  2. e.g. Open PHACTS, Data2Semantics, SMS, Wings, PASOA 
  3. Lada Adamic does a much better job of summing up the reasons for leaving academia and discussing the trade-offs. Many of her points ring true to me. 
  4. I really like writing trip reports 
  5. See also Please Keep it Simple 
  6. In a completely different context, see this discussion of how DigitalOcean works with open source.
  7. e.g. Mendeley, Research Data Services @ Elsevier 

This is my 100th blog post here at Think Links. I started blogging October 23, 2008 with a post about the name of the blog. That’s about 5 years of blogging, averaging about 20 posts a year. So not a huge amount, but consistent. This blog is what I would consider an academic blog, or at least a work-related one. As a forum of scholarly communication, I’ve found blogging to be very beneficial. Here are 10 things that I like personally about the medium (yes, a listicle!):

  1. It provides a home for material that is useful but wouldn’t belong in a more formal setting. For example, comments on work practice, teaching, or neat randomly related stuff.
  2. It’s quick. If I have something to note, I can just put it out there.
  3. The public nature forces me to make my own notes better. In particular, I’ve been doing trip reports, which have been really helpful in synthesizing my notes on various events. Even though most are not read, the fact that they are public makes my writing more coherent.
  4. Embedding multimedia. It provides a way to aggregate a lot of different content into one place. Lately, I’ve been using the embed tweet feature to capture some of that conversation in context.
  5. Memories of the 5 paragraph essay. I had a very good history teacher in high school who drilled into us how to write 5 paragraph essays quickly. I find posts fairly easy to write because of this training. (I know there’s criticism of this style but I think the form helps to write.)
  6. It lets me put another take on research papers that we’ve done in a more personal voice.
  7. A single searchable history. Reverse chronological order is a helpful way to review what’s gone on. Furthermore, because it’s on the web you get all that fancy search stuff.
  8. Analytics are fun to look at. Altmetrics, anyone?
  9. It’s part of the future of academic discourse…
  10. Links.

There’s more I’d like to do with this blog. Publishing directly from code. Personal videos. Interactive visualizations. Whether I do those things or not, having this space on the web in this format has been great for my own thinking and I hope for others as well. If you’re reading this, thanks and I hope you keep following.

The university where I work asks us to register all our publications for the year in a central database [1].  Doing this obviously made me think of doing an ego search on my academic papers. Plus, it’s the beginning of the year, which always seems like a good time to look at these things.

The handy tool Publish-or-Perish calculates all sorts of citation metrics based on a search of Google Scholar. The tool lets you pick the set of publications to consider. (For example, I left out all the publications from another Paul Groth who’s a professor of architecture at Berkeley.) I did a cursory run through to remove publications that weren’t mine, but I didn’t spend much time on it, so all the standard disclaimers apply: there may be duplicates, the set includes technical reports, etc. For transparency, you can find the set of publications considered in the Excel file here. Also, it’s worth noting that the Google Scholar corpus has its own problems; in particular, it makes you look better. With all that in mind, let’s get to the fun stuff.

My stats as of Jan. 4, 2011 are:

  • Papers: 93
  • Citations: 1318
  • Years: 12
  • Cites/year: 109.83
  • Cites/paper: 14.17/4.0/0
  • Cites/author: 416.35
  • Papers/author: 43.27
  • Authors/paper: 3.04/3.0/2
  • h-index: 21
  • g-index: 34
  • hc-index: 16
  • hI-index: 5.58
  • hI-norm: 11
  • AWCR: 224.17
  • AW-index: 14.97
  • AWCRpA: 70.96
  • e-index: 24.98
  • hm-index: 9.07

You can find the definitions for these metrics here.
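To give a feel for what a couple of these numbers measure, here is a minimal Python sketch of the h-index and g-index using their commonly cited definitions. This is not how Publish-or-Perish computes them internally (I haven't looked at its code), and the function names and the sample citation counts are made up for illustration. The only real figures used are the paper, citation and year totals from the list above.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the top g papers together have at least g*g citations.

    Assumption: g is capped at the number of papers, which is one common
    convention; other variants allow g to exceed it.
    """
    ranked = sorted(citations, reverse=True)
    total = 0
    g = 0
    for rank, cites in enumerate(ranked, start=1):
        total += cites
        if total >= rank * rank:
            g = rank
    return g


# Hypothetical citation counts for six papers (illustrative only).
sample = [10, 8, 5, 4, 3, 0]
sample_h = h_index(sample)
sample_g = g_index(sample)

# The simple ratios from the list above, using the totals reported in this post.
papers, total_citations, years = 93, 1318, 12
cites_per_year = round(total_citations / years, 2)
cites_per_paper = round(total_citations / papers, 2)

print(sample_h, sample_g, cites_per_year, cites_per_paper)
```

The g-index gives more weight to a few highly cited papers, which is why it comes out higher than the h-index for the same set of publications.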

What does it all mean? I don’t know 🙂 I think it’s not half bad.

For comparison, here’s a list of the h-indexes for top computer scientists computed using Google Scholar. All have an h-index of 40 or greater. A quick scan through that list shows that there’s a pretty strong correlation between being a top computer scientist and a high h-index. Thus, I conclude that I should continue concentrating on being a good computer scientist and the statistics will follow.

[1] I don’t know why my university doesn’t support importing publication information from BibTeX or RIS. Everything has to be added by hand, which takes a bit.

    One of the nice things about using cloud services is that sometimes you get a feature that you didn’t expect. Below is a nice set of stats about how well Think Links did in 2010. I was actually quite happy with 12 posts, one post a month. I will be trying to increase the rate of posts this year. If you’ve been reading this blog, thanks, and have a great 2011! The stats are below:

    Here’s a high-level summary of this blog’s overall health:

    Healthy blog!

    The Blog-Health-o-Meter™ reads Fresher than ever.

    Crunchy numbers

    A Boeing 747-400 passenger jet can hold 416 passengers. This blog was viewed about 4,500 times in 2010. That’s about 11 full 747s.


    In 2010, there were 12 new posts, growing the total archive of this blog to 46 posts. There were 12 pictures uploaded, taking up a total of 5 MB. That’s about a picture per month.

    The busiest day of the year was October 13th with 176 views. The most popular post that day was Data DJ realized….well at least version 0.1.

    Where did they come from?

    The top referring sites in 2010 were,,,, and

    Some visitors came searching, mostly for provenance open gov, think links, ready made food, 4store, and thinklinks.

    Attractions in 2010

    These are the posts and pages that got the most views in 2010.


    Data DJ realized….well at least version 0.1 October 2010


    4store Amazon Machine Image and Billion Triple Challenge Data Set October 2009


    Linking Slideshare Data June 2010


    A First EU Proposal April 2010


    Two Themes from WWW 2010 May 2010

    Think Links is both a blog and a project. As with all good projects, hopefully, this one will change and evolve over time. However, to start out I thought it would be good to put down the initial objectives and rough process of the project. A manifesto, so to speak. As stated in the tagline for the blog, this project is about provenance: the origins of stuff; where it came from, how it was produced, what its component parts are. In this age of globalization, it’s increasingly difficult to figure this out and it’s increasingly important. Provenance helps us determine the quality of the stuff we use, whether it is the vegetables we eat or the web pages we read. And to be honest, Made In China doesn’t help.

    In this context, the broad aims of Think Links are:

    1. to help me understand the role that provenance plays in a global society, in particular, its role in how people judge the stuff they use;
    2. to promote people’s awareness of the provenance of their stuff;
    3. to document and devise new ways of communicating provenance.

    To sum up, the aim is to think about the links back to the sources of our stuff.

      The process I plan to take is to be as open and as serendipitous as I can be, and to have fun with it. I imagine that posts will range from interviews with people, to reviews of recent academic literature, to links to cool and effective design. I’m of the opinion that understanding the origins of things is a fundamental idea and thus, this project can range widely while still coalescing around the central aims above. Hopefully, this project can do its small part to promote the production of quality by making people aware of provenance.

      Finally, please feel free to comment, send me suggestions, or provide a guest post.

      I thought I’d begin this project with a simple post about the provenance of this blog’s name. When I was a kid, I used to play a game called Think Links!. The goal of the game, as far as I can remember, was to figure out the connections between the various cards presented. I always thought it was a cool name and it has lots of overloaded meaning (which I find fun). It also sums up the point of this blog, to think about and highlight the links between the products, services, and people, that tell us about the provenance (and hence the quality) of the stuff we use.


      think links game
