Monthly Archives: June 2010

One of the things I’ve been wondering for a while now is how easy it is to develop end-user applications that take advantage of provenance. Is the software infrastructure there? Do we have appropriate interface components? Are things fast enough? To test this out, we held a hackathon at the International Provenance and Annotation Workshop (IPAW 2010).

The hackathon had three core objectives:

  1. Come up with a series of end-user application ideas
  2. Develop cool apps
  3. Understand where we are at in terms of enabling app development

Another thing I was hoping to do was to get people from different groups to collaborate. So how did it turn out?

We had 18 participants who divided up into the following teams:

  • Team Electric Bill
    • Paulo Pinheiro da Silva (UTEP)
    • Timothy Lebo (RPI)
    • Eric Stephan (UTEP)
    • Leonard Salayandia (RPI)
  • Team GEXP
    • Vitor Silva (Universidade Federal do Rio de Janeiro)
    • Eduardo Ogasawara (Universidade Federal do Rio de Janeiro)
  • Team Social Provenance
    • Aida Gandara (UTEP)
    • Alvaro Graves (RPI)
    • Evan Patton (UTEP)
  • Team MID
    • Iman Naja (University of Southampton)
    • Markus Kunde (DLR)
    • David Koop (University of Utah)
  • Team TheCollaborators
    • Jun Zhao (Oxford)
    • Alek Slominski (Indiana University)
    • Paolo Missier (University of Manchester)
  • Team Crowd Wisdom
    • James Michaelis (RPI)
    • Lynda Niemeyer (AMSEC, LLC)
  • Team Science
    • Elaine Angelino (Harvard)

From these teams, we had a variety of great ideas:

  • Team Electric Bill – Understanding energy consumption in the home
  • Team GEXP – Create associations between abstract experiments and workflow trials
  • Team Social Provenance – Track the provenance of tweets on Twitter
  • Team MID – Add geographic details to provenance
  • Team TheCollaborators – A research paper that embeds the provenance of an artifact
  • Team Crowd Wisdom – Use provenance to filter information from crowd-sourced websites
  • Team Science – Find the impact of a change in a script on the other parts of the script

Obviously, to implement these ideas completely would take quite a while, but amazingly these teams got quite far. For example, Team Social Provenance was able to recreate Twitter conversations for a number of hashtag topics, including the World Cup and IPAW, in the Proof Markup Language. Here’s a screenshot:

Here’s another screen shot from Team MID, showing how you can navigate through an Open Provenance Model graph with geo annotations:

Geo Provenance Mashup from Team MID

Those are just two examples; the other teams got quite far as well, given that we wrapped up at 4pm.

So where are we at? We had a brief conversation at the end of the hackathon (and I received a number of emails afterwards) about whether we are at a place where we can hack provenance end-user apps. The broad conclusions were as follows:

  • The maturity of tools is not there yet, especially for semantic web apps. The libraries aren’t reliable and lack documentation.
  • Time was spent generating provenance not necessarily using it.
  • It would be good to have guidelines for how to enhance applications with provenance. What’s the boundary between provenance and application data?
  • It would be nice to have a common representation of provenance to work on. (Go W3C incubator!)
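As an aside on what “generating provenance” looks like in practice, here is a minimal, hypothetical sketch (all names invented; this is not any team’s actual code) of recording Open Provenance Model-style nodes and edges, and then walking them backwards to answer a lineage question:

```python
# Hypothetical OPM-style provenance sketch: artifacts and processes are
# nodes; "used" and "wasGeneratedBy" are directed edges between them.

def new_graph():
    return {"nodes": {}, "edges": []}

def add_node(graph, node_id, kind):
    graph["nodes"][node_id] = {"kind": kind}  # kind: "artifact" or "process"

def add_edge(graph, relation, source, target):
    graph["edges"].append((relation, source, target))

g = new_graph()
add_node(g, "raw.csv", "artifact")
add_node(g, "clean", "process")
add_node(g, "clean.csv", "artifact")
add_edge(g, "used", "clean", "raw.csv")              # the process read the input
add_edge(g, "wasGeneratedBy", "clean.csv", "clean")  # the output came from the process

def derived_from(graph, artifact):
    """Walk one step back from an artifact: the generating process,
    then the artifacts that process used."""
    lineage = []
    for relation, source, target in graph["edges"]:
        if relation == "wasGeneratedBy" and source == artifact:
            lineage.append(target)
            for r2, s2, t2 in graph["edges"]:
                if r2 == "used" and s2 == target:
                    lineage.append(t2)
    return lineage

print(derived_from(g, "clean.csv"))  # ['clean', 'raw.csv']
```

Even a toy like this shows where the effort goes: the bookkeeping of recording edges is easy; deciding what counts as an artifact versus application data is the hard part the guidelines bullet above asks for.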

You can find more thoughts about this from Tim Lebo, here. As for the hackathon itself, the participants were really enthusiastic, and several said that they would continue building on the ideas they developed there.

Hackathon Winners

Jim Myers (NCSA), Luc Moreau (IPAW PC co-chair) and I judged the apps and picked what we thought were the top three. Our judging criteria were: whether the app was aimed at the end user, whether it worked, whether provenance was required, and the coolness factor. We will announce the winners tomorrow at the closing session of IPAW. The winners will receive some great prizes sponsored by the Large Knowledge Collider Project (LarKC). LarKC sponsored this hackathon because provenance is becoming a crucial part of semantic web applications; the hackathon let LarKC see how to ensure that its platform can support hackers in building great provenance-enabled semantic web apps.


I was impressed with all the participants and the apps that were produced. We are a fairly new research community, so to see what could be built in so little time is great. We are getting there, and I can imagine that very soon we will have the infrastructure necessary to build user-facing provenance apps fast.

I love SlideShare. It’s great after an event or conference to be able to go back through the slides to jog my memory, or to get a glimpse of a talk that I didn’t get to attend. Plus, I like the fact that it makes it easy to embed slides into a blog post like this one. Moreover, slides provide a concrete artifact of an event, as well as the outcome of many processes.

Thinking about that, I thought it would be good to easily mash up slide metadata with other Linked Data. Luckily, SlideShare provides a nice API, which I’ve wrapped to output RDF. That means we can retrieve, and point to with URLs, both slides and their associated metadata. You can find the service at:


Using the service, you can take a slideshow like this one:

and get RDF like this out:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:sioc="http://rdfs.org/sioc/ns#">
  <rdf:Description rdf:about="">
    <rdf:type rdf:resource=""/>
  </rdf:Description>
  <rdf:Description rdf:about="">
    <dc:subject>presentation zen</dc:subject>
    <sioc:has_creator rdf:resource=""/>
    <dcterms:created rdf:datatype="">2009-03-01T18:23:07-06:00</dcterms:created>
    <sioc:content>An overview of key take-aways from author Garr Reynold's book Presentation Zen.</sioc:content>
    <dcterms:title>Presentation Zen</dcterms:title>
    <sioc:related_to rdf:resource=""/>
    <sioc:related_to rdf:resource=""/>
    <sioc:related_to rdf:resource=""/>
    <sioc:related_to rdf:resource=""/>
    <sioc:related_to rdf:resource=""/>
    <sioc:related_to rdf:resource=""/>
    <sioc:related_to rdf:resource=""/>
    <sioc:related_to rdf:resource=""/>
    <sioc:related_to rdf:resource=""/>
    <sioc:related_to rdf:resource=""/>
    <rdf:type rdf:resource=""/>
  </rdf:Description>
</rdf:RDF>
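For readers who want to consume output like this, here is a quick sketch using only Python’s standard library. The namespace URIs are the standard RDF, Dublin Core, and SIOC ones; the embedded sample document and its slideshow URI are made-up stand-ins for what the service returns:

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for RDF, Dublin Core, DC Terms, and SIOC.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "sioc": "http://rdfs.org/sioc/ns#",
}

# A small, well-formed stand-in for the service output shown above.
rdf_xml = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:sioc="http://rdfs.org/sioc/ns#">
  <rdf:Description rdf:about="http://example.org/slideshow/1">
    <dc:subject>presentation zen</dc:subject>
    <dcterms:title>Presentation Zen</dcterms:title>
  </rdf:Description>
</rdf:RDF>"""

root = ET.fromstring(rdf_xml)
for desc in root.findall("rdf:Description", NS):
    title = desc.findtext("dcterms:title", namespaces=NS)
    subject = desc.findtext("dc:subject", namespaces=NS)
    print(title, "-", subject)  # Presentation Zen - presentation zen
```

A full RDF toolkit (e.g. an RDF parser with SPARQL support) would be the better choice for real mashups; the point here is just that the output is plain XML you can start working with immediately.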

You do this by appending the path of the slideshow on SlideShare to the end of the service URL, e.g.
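As a sketch of what a client call could look like (the base URL below is a placeholder, not the real service address):

```python
from urllib.parse import urljoin

# Hypothetical base URL -- substitute the real service address from the post.
SERVICE_BASE = "http://example.org/slideshare2rdf/"

def slideshow_rdf_url(slideshare_path):
    """Build the service URL by appending the SlideShare path,
    e.g. '/garr/sample-slideshow' -> SERVICE_BASE + 'garr/sample-slideshow'."""
    return urljoin(SERVICE_BASE, slideshare_path.lstrip("/"))

url = slideshow_rdf_url("/garr/sample-slideshow")
print(url)  # http://example.org/slideshare2rdf/garr/sample-slideshow

# Fetching the RDF would then just be an HTTP GET on that URL, e.g.:
# rdf_bytes = urllib.request.urlopen(url).read()
```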

A Note on Usage

SlideShare limits queries to their API to 1000 a day. Update: SlideShare informed me that they have removed their API limits, which is definitely cool. I will still leave in the support for using your own API keys. We recommend that if you want to use our service you get your own SlideShare API key. More details on how to use your SlideShare API key with our service are on the service’s page.

I actually think this is an interesting pattern for linked data services: provide an open API, put usage limits on it, and support user-supplied API keys. It allows us to improve the RDF and adjust to the underlying API, while still abiding by the terms of service of the underlying data source.
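A minimal sketch of that pattern (all names and limits here are illustrative, not the service’s actual implementation): the wrapper uses the caller’s own upstream key when one is supplied, and otherwise counts the call against a shared daily budget.

```python
# Hypothetical key-selection logic for a wrapper service.
DAILY_LIMIT = 1000
shared_calls_today = 0  # would be reset once a day in a real service

def choose_key(user_key=None, shared_key="SHARED-KEY"):
    """Return the upstream API key to use for this request."""
    global shared_calls_today
    if user_key:
        # Caller brings their own key: their quota, not ours.
        return user_key
    if shared_calls_today >= DAILY_LIMIT:
        raise RuntimeError("shared key exhausted for today; supply your own key")
    shared_calls_today += 1  # count this call against the shared budget
    return shared_key

print(choose_key())          # SHARED-KEY
print(choose_key("MY-KEY"))  # MY-KEY
```

The design choice is that the wrapper never has to violate the upstream terms of service: heavy users are pushed onto their own keys, while casual users get a frictionless default.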

What Next?

First, this is a new service (and I don’t usually run services) so we’ll see how it holds up. Let us know if you discover problems. We have some ideas for what to do next, mainly, linking out to other data sources. But we’d love your ideas. How should we extend the service? Should we extend it? Let us know in the comments.

We had our second Dutch Semantic Web Meetup for 2010 yesterday. This was a smaller and more impromptu event than our last meetup. Many of our colleagues were enjoying the sun in Crete at ESWC, but Jan Aasman from Franz, Inc. was in town and we thought it was a good chance to get everyone together to talk semantic web. We had 20 attendees, a mix of new and familiar faces.

Jan gave an interesting keynote discussing the internals of AllegroGraph (note: SSD drives really improve performance) as well as its features. I think what resonated most with the audience was the various demos and use cases Jan gave. He gave examples ranging from pharma to integrating information about the environmental impact of the lumber trade in Canada. One of the demos that Marco Roos (a biologist from Leiden Medical Center) and a number of others found compelling showed the integration of LinkedClinicalTrials data with a number of other Linking Open Data sets (Diseasome, DrugBank). Jan showed how one could navigate between clinical trials via common diseases, symptoms, etc.

The two takeaways were that semantics really allows for integrated data analytics, and that we’re getting close to parity between triple stores and classic relational databases. Triple stores that can handle a trillion triples are coming this year…

After the talk, we headed to the VU University’s campus cafe and sat outside in the sun (yes, Amsterdam was emulating Crete). From talking to the various attendees, I think some important connections were made. With a smaller event like this, it’s easier to get into in-depth conversations.

Given the short notice for this event, I was really impressed with the turnout. The Dutch semantic web community is clearly strong and we hope to continue organizing these events on a regular basis.

Finally, thanks to Christophe Guéret for organizing the event.
