A couple of weeks ago, I was at the European Data Forum in Athens talking about the Open PHACTS project. You can find a video of my talk with slides here. Slides are embedded below.

One of my guilty pleasures is listening to Mac-oriented tech podcasts. One that I listen to is the Accidental Tech Podcast, which features Marco Arment (of Tumblr and Instapaper fame), John Siracusa (of Ars Technica and long Mac reviews fame) and Casey Liss (…). All three are programmers working on everything from .Net consultancy to iOS apps. As somebody who has spent my career in computer science research/higher education, I find it interesting to hear what people in the software industry find useful from their education. So I sent the guys the following question:

I actually had a question related to the whole software methodology discussion. I’m a CS professor and I’m always curious what particular things that we teach turn out to be useful in the end. You had asked each other what one thing you would take from software methodology. My question is what are the one/two things from your CS education that you find the most useful when coding?

On their recent show (#56, The Woodpecker), they answered the question (starting at 35:30). You can listen to their thoughtful answers, but I’ll try to summarize them. I heard three main points:

  1. Learning from the ground up. They talked about the importance of learning the entire stack from designing a chip on up. In particular, knowing operating systems, memory management (pointers!) and assembly language helps them make smarter decisions while programming. It’s not that you use these “low-level/behind the scenes” things often in practice but understanding them helps one make better choices at higher levels of abstraction.
  2. Dealing with diversity. They pointed out how they learned to use multiple different pieces of technology during their degrees. Marco singled out what I would call a programming languages course. This is a course where you learn and program a little bit in all types of languages and learn about the concepts that underlie them (e.g. functional vs. imperative, pass-by-reference vs. pass-by-value, etc.). This means that learning a new language in the real world, whether it’s Objective-C or Perl, is that much easier. In general, getting practice in picking up a new technology and applying it immediately to a problem was seen as helpful.
  3. Core concepts and principles. They noted that having learned core CS topics like data structures and algorithms, along with general CS principles, was useful. It’s not that they are used every day, but “knowing what to look up on Wikipedia” is useful. They also noted that in business there is less/no time to learn these core principles. Furthermore, it’s hard to learn them if you’re not forced to do so.
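The pass-by-value vs. pass-by-reference distinction from that kind of programming languages course can be sketched in a few lines. This JavaScript example is mine, not from the podcast; it shows the practical difference between passing a primitive and passing an object:

```javascript
// A primitive argument arrives as a copy of the value.
function bumpNumber(n) {
  n += 1; // mutates only the local copy
}

// An object argument arrives as a reference to the same object.
function bumpCounter(c) {
  c.count += 1; // visible to the caller
}

let x = 1;
bumpNumber(x);
// x is still 1: the function changed its own copy

const counter = { count: 1 };
bumpCounter(counter);
// counter.count is now 2: both names refer to one object
```

Strictly speaking, JavaScript passes object *references* by value, but the observable effect is the classic distinction the course teaches.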

From my perspective, it’s nice to hear a response that fits with what I (and I think most CS professors) would say. We should be teaching core concepts and principles and letting students learn the whole stack of computing. The one thing I think I’ll probably take away from this for our own curriculum is maybe not to worry so much about consistency in programming languages across courses. Indeed, that may be a feature not a bug.

Anyway, if you’re interested in this sort of thing, check out the podcast.

This is my 100th blog post here at Think Links. I started blogging October 23, 2008 with a post about the name of the blog. That’s about 5 years of blogging, averaging about 20 posts a year. So not a huge amount, but consistent. This blog is what I would consider an academic blog, or at least a work-related one. As a forum of scholarly communication, I’ve found blogging to be very beneficial. Here are 10 things that I like personally about the medium (yes, a listicle!):

  1. It provides a home for material that is useful but wouldn’t belong in a more formal setting. For example, comments on work practice, teaching, or neat randomly related stuff.
  2. It’s quick. If I have something to note, I can just put it out there.
  3. The public nature forces me to make my own notes better. In particular, I’ve been doing trip reports, which have been really helpful in synthesizing my notes on various events. Even though most are not widely read, the fact that they are public makes my writing more coherent.
  4. Embedding multimedia. It provides a way to aggregate a lot of different content into one place. Lately, I’ve been using the embed tweet feature to capture some of that conversation in context.
  5. Memories of the 5-paragraph essay. I had a very good history teacher in high school who drilled into us how to write 5-paragraph essays quickly. I find posts fairly easy to write because of this training. (I know there’s criticism of this style, but I think the form helps the writing.)
  6. It lets me put another take on research papers that we’ve done, in a more personal voice.
  7. A single searchable history. Reverse chronological order is a helpful way to review what’s gone on. Furthermore, because it’s on the web you get all that fancy search stuff.
  8. Analytics are fun to look at – altmetrics anyone?
  9. It’s part of the future of academic discourse…
  10. Links.

There’s more I’d like to do with this blog. Publishing directly from code. Personal videos. Interactive visualizations. Whether I do those things or not, having this space on the web in this format has been great for my own thinking and I hope for others as well. If you’re reading this, thanks and I hope you keep following.

This past week I was at Academic Publishing in Europe 9 (APE 2014) for two days. I was invited to talk about altmetrics. This was kind of an update of the double act that I did with Mike Taylor from Elsevier Labs last year at another publishing conference, UKSG. You can find the slides of my talk from APE below along with a video of my presentation. Overall, the talk was well received:

I think for publishers the biggest thing is to recognize this as something they play a role in, as well as emphasizing that altmetrics broaden the measurement space. It’s also interesting that authors want support for telling people about their research – and need help doing it.

Given that it was a publishing conference, it’s always interesting to see the themes getting talked about. Here are some highlights from my perspective.

The Netherlands going gold

Open Access was as usual a discussion point. The Dutch State Secretary of Science, Sander Dekker, was there, giving a full-throated endorsement of gold open access. I thought the discussion by Michael Jubb on monitoring progress of the UK’s Open Access push after the Finch Report was interesting. I think seeing how the UK manages and measures this transition will be critical to understanding the ramifications of open access. However, I have a feeling that they may not be looking at the impact on faculty enough, in particular how money is distributed for gold open access pricing.

Big Data – It’s the variety!

There was a session on big data. Technically, I thought I wouldn’t get a lot out of this session because, with my computer science hat on, I’ve heard quite a few technical talks on the subject. However, this session really confirmed to me that we’re facing a problem not with data processing or storage, but with data variety.

This was confirmed by the fantastic talk by Jason Swedlow on the Open Microscopy project. The project looks at how to manage and deal with massive amounts of image data and the interoperability of those images. (You can find one of the images that they published here – 281 gigapixels!) If you’re thinking about data integration or interoperability, you should check out this project and his talk. I also like the notion of images as a measurement technique. He noted that their software deals with data size and processing, but the difficulties were around the variety and just general dirtiness of all that data.

This emphasis on data variety was also underscored by Simon Hodson from CODATA, whose talk gave an overview of a number of e-science projects where data variety was the central issue.

Data / Other Stuff Citation

Data citation was another theme of the conference. As a community member, it was good to see force11.org mentioned frequently, in particular the work on data citation principles that’s being facilitated by the community. Also mentioned was the Resource Identification Initiative, another FORCE11 community group, through which researchers can identify specific resources (e.g. model organisms, software) in their publications in a machine-readable way. This has already been endorsed by a number of journals (~25) and publishers. This ability to “cite” seems to be central to how all these other scientific products are beginning to get woven into the scholarly literature. (See also ideacite.org.)

A good example of this was Hans Pfeiffenberger’s talk on the Earth System Science Data journal, which was created specifically for data coming from large-scale earth measurements. An interesting issue that came up was the need for bidirectional citation: publishing the data and the associated commentary at the same time, each including references to the other using permanent identifiers, even across different publishers.

Digital Preservation

There was also some talk about preservation of content born online. Two things stood out for me here:

  1. Peter Burnhill‘s talk on thekeepers.org and hiberlink.org, both projects to detect what content is being preserved. I was shocked to hear that only 20% of online serials are stored in long-term archives.
  2. This report seems pretty comprehensive on this front. Note to self: it will be good input for thinking about preserving linked data in the Prelinda project.

Science from the coffee shop

The conference had a session (dotcoms-to-watch) on startups in publishing. What caught my attention was that we are really moving toward the idea that Ian Foster has been talking about, namely, science as a service. With services like scrawl and Science Exchange, we’re starting to be able to run even lab-based experiments completely from a laptop. I think this is going to be huge. I already see this in computer science, where more and more of my colleagues and I turn to the Amazon cloud to boot up our test environments. Pretty soon you’ll be able to do your science just by calling an API.

Random Notes

My Slides & Talk


About a week ago, I launched ideacite.org. The actual idea for ideacite is on figshare. This is a side-project that I did for a couple of reasons.

First, it’s always fun to build stuff, so when I woke up in the morning and had this idea, I wanted to see how hard it was to just get it up and running (especially given that there was a pun in the name). It was interesting to see how fast one can get to an MVP. I think it took me something like 8 hours total to do the whole thing (a morning or so plus a long Friday night), and that was including thinking about which programming language to write it in :-) .

A bit about how it was built. I registered the domain name with gandi.net, which now also provides a platform-as-a-service option for node.js. It was pretty easy to just spin up an instance and point the domain there. I’m actually pretty impressed with the service, and it’s affordable. On the advice of Rinke Hoekstra, I ended up building the site entirely client side, relying directly on the figshare API, which worked like a charm. The figshare API is nice enough not to require OAuth for just using its search functionality, and jQuery simplifies doing all the DOM manipulation. Like everyone, I used Twitter Bootstrap to do the web design.
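To give a flavor of that client-side approach, here is a minimal sketch of the kind of call such a page makes. Note the endpoint path, response shape, and the `#results` element are my illustrative assumptions, not the actual ideacite code, and figshare’s public API has changed versions over time:

```javascript
// Build a figshare-style search URL (no OAuth needed for public search;
// the exact endpoint path is illustrative, not guaranteed current).
function searchUrl(query) {
  return "https://api.figshare.com/v1/articles/search?search_for=" +
         encodeURIComponent(query);
}

// Fetch results and render them with jQuery-style DOM manipulation.
// The '#results' list and the items/title fields are assumed markup/shape.
function showResults(query) {
  fetch(searchUrl(query))
    .then((resp) => resp.json())
    .then((data) => {
      (data.items || []).forEach((item) => {
        $("#results").append($("<li>").text(item.title));
      });
    });
}
```

The appeal of this pattern is exactly what the post describes: with the API doing the heavy lifting, the “site” is just static files, which is why a PaaS instance and a weekend were enough.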

The second reason was that I’m interested in enabling attributable remixing in scholarship. We can obviously cite papers (i.e. research results), data citation is becoming more common, and soon we’ll be able to cite code, so why not ideas? With this combination and some better tools, we can begin to experiment with faster ways of generating research results that build upon all the parts of the scholarly lifecycle while preserving a record of contribution. Or at least that’s the idea.

Anyway, there’s lots more to do with ideacite (e.g. subscribing to a Twitter account that broadcasts new ideas), but I’d love to hear your thoughts on where to go next.

The WordPress.com stats helper monkeys prepared a 2013 annual report for this blog.

Here’s an excerpt:

The concert hall at the Sydney Opera House holds 2,700 people. This blog was viewed about 8,500 times in 2013. If it were a concert at Sydney Opera House, it would take about 3 sold-out performances for that many people to see it.

Click here to see the complete report.

