Tag Archives: #ipaw2010

This year I had the opportunity to be program co-chair and help organize the 4th International Provenance and Annotation Workshop (IPAW). The event went great, better than I had imagined. First, I was fortunate to be organizing it with James Frew from the Bren School of Environmental Science and Management at the University of California, Santa Barbara. He not only helped coordinate the program but, along with his team, took care of all the local organization. It’s hard to beat sunny Santa Barbara as a location, but they also made sure that everything ran smoothly: great wifi, shuttles to and from the location, tasty outdoor lunches looking over the ocean, an open air poster session with wine and cheese, and a BBQ on the beach for a workshop dinner:

The IPAW workshop dinner. Photo from Andreas Schrieber.

So big kudos to Frew and his team. Beyond being well run, the workshop covered a lot in its two main days. It had 47 attendees, and you can find the Twitter log here.

Research Highlights

I think the program really highlighted where we are in provenance research today and the directions forward. I won’t go through every paper but will just try to pick out 3 interesting trends.

1) Using Provenance to Address the Messiness of the Web

The Web provides us a fantastic source of knowledge. But the problem is that this knowledge is unclean and unintegrated. Even efforts such as Linked Data, while giving us better data, are still messy and still under-integrated. Both researchers and firms have been trying to produce clean, integrated knowledge, but then they are faced with what Timothy Lebo in his paper termed the Integrator’s Dilemma. An integrator may produce a clean, well-structured data set, but in the process the resulting data set loses authority and its connection to domain expertise. To rectify this problem, provenance can be used to identify the authoritative source and connect back to domain expertise. Indeed, Jim McCusker and colleagues argued that provenance is the 3rd step to Data Sanity.

However, then we run into Tim’s 1st law of producing provenance:

For any provenance record, there exists a provenance consumer that will need it, but will not like how it is provided.

Tim suggests a service-based solution that provides provenance at the granularity each consumer needs. While I don’t know if that is the right solution, it’s clear that providing provenance at the right level of granularity is one foundation for building confidence in integrated web data sources.

Another example of using provenance to address messiness is the use of network analysis to understand provenance captured from crowdsourced applications.

2) Provenance and Credit are Intertwined

Science has always been a driver of research in provenance and we saw a number of good pieces of work addressing domains ranging from climate analysis to archeology. However, as our keynote speaker Phil Bourne pointed out in his talk, scientists are not using provenance technologies in their work. He argued that this is for two reasons: 1) they are not given credit for all parts of the scientific process and 2) provenance infrastructure is still not easy enough to use. Phil argued that more artifacts of the research lifecycle need to be given credit in order to facilitate sharing and thus increase the pace of innovation, particularly in the life sciences. Thus, for scientists to capture their information in a sharable fashion, they need to be given credit for doing so. (Yes, he connected altmetrics to provenance – very cool from my point of view). To do this, he argued, we need better support for provenance throughout the research lifecycle. However, while tools exist, they are far from being usable and integrated enough into everyday science practice. This is a real challenge to the provenance community. We need to do better at getting our approaches into scientists’ hands.

3) The Problem of Post-hoc

Much work in the provenance literature has asked how one can capture provenance effectively in computational systems. But many times this is just not possible. The user may not have thought about installing a system to capture provenance in the first place, or may not have chosen to write down their rationale for taking some action. This is an area that I’m actively researching, so it was great to see others starting to address the problem. Tom De Neiss attacked the problem of reconstructing provenance for a collection of newspaper articles using semantic similarity. An even more far-out idea presented at the workshop was to try to reconstruct the provenance of decisions made by a human using simulation. Both works highlight the need for dealing with incomplete or even non-existent provenance.
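To give a feel for what similarity-based reconstruction can look like, here is a minimal sketch. It is not the method from the paper: it swaps in plain bag-of-words cosine similarity as a crude stand-in for semantic similarity, and the function names, the `(doc_id, timestamp, text)` input shape and the `threshold` value are all my own assumptions for illustration. Each document is linked to the most similar earlier document, which is read as a hypothesized derivation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def reconstruct_derivations(docs, threshold=0.5):
    """Guess derivation links among documents.

    docs: list of (doc_id, timestamp, text) tuples (hypothetical input shape).
    Returns a list of (later_id, earlier_id, score): each document is linked
    to its most similar earlier document, if the score reaches the threshold.
    """
    ordered = sorted(docs, key=lambda d: d[1])
    vectors = {doc_id: Counter(tokenize(text)) for doc_id, _, text in ordered}
    links = []
    for i, (doc_id, _, _) in enumerate(ordered):
        best = None
        for earlier_id, _, _ in ordered[:i]:
            score = cosine(vectors[doc_id], vectors[earlier_id])
            if score >= threshold and (best is None or score > best[1]):
                best = (earlier_id, score)
        if best is not None:
            links.append((doc_id, best[0], best[1]))
    return links
```

The resulting links are only hypotheses. Similar text does not prove that one article was derived from another, which is exactly why reconstructed provenance has to be treated as incomplete and uncertain.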

These were just some of the themes that I saw. Overall, the presentations were good and the audience was engaged. We had lots of hall time and I heard many intense discussions so I’m hoping that the event spurred more research. I know personally we will try to pursue a collaboration to build a provenance corpus to study this reconstruction problem.

A Provenance Week

IPAW has a tradition of being hosted as an independent event, which allows us not only to have the two-day workshop but also to organize collocated events. This IPAW was the same. The Data Observation Network for Earth organized a meeting on provenance and scientific workflow collocated with IPAW. Additionally, the W3C Provenance Working Group both gave a tutorial before the workshop and held their two-day face-to-face meeting afterwards. Here’s me presenting the core of the provenance data model to the 28 tutorial participants.

The Provenance Data Model. It’s easy! Photo prov:wasAttributedTo Andreas Schrieber


IPAW 2012 was a lot of effort but it was worth it – fun discussion, beautiful weather and research insight. Again, the community voted to have another IPAW in 2014. The community is continuing to play to its strengths in workflows, databases and science applications while exploring novel areas. In the CFP for IPAW, we wrote that “2012 will be a watershed year for provenance/annotation research.” For me, IPAW confirmed that statement.

One of the things I’ve been wondering for a while now is how easy it is to develop end-user applications that take advantage of provenance. Is the software infrastructure there, do we have appropriate interface components, are things fast enough? To test this out, we held a hackathon at the International Provenance and Annotation Workshop (IPAW 2010).

The hackathon had three core objectives:

  1. Come up with a series of end user application ideas
  2. Develop cool apps
  3. And understand where we are at in terms of enabling app development

Another thing I was hoping to do was to get people from different groups to collaborate. So how did it turn out?

We had 18 participants who divided up into the following teams:

  • Team Electric Bill
    • Paulo Pinheiro da Silva (UTEP)
    • Timothy Lebo (RPI)
    • Eric Stephan (UTEP)
    • Leonard Salayandia (RPI)
  • Team GEXP
    • Vitor Silva (Universidade Federal do Rio de Janeiro)
    • Eduardo Ogasawara (Universidade Federal do Rio de Janeiro)
  • Team SocialProvenance
    • Aida Gandara (UTEP)
    • Alvaro Graves (RPI)
    • Evan Patton (UTEP)
  • Team MID
    • Iman Naja (University of Southampton)
    • Markus Kunde (DLR)
    • David Koop (University of Utah)
  • Team TheCollaborators
    • Jun Zhao (Oxford)
    • Alek Slominski (Indiana University)
    • Paolo Missier (University of Manchester)
  • Team CrowdWisdom
    • James Michaelis (RPI)
    • Lynda Niemeyer (AMSEC, LLC)
  • Team Science
    • Elaine Angelino (Harvard)

From these teams, we had a variety of great ideas:

  • Team Electric Bill – Understanding energy consumption in the home
  • Team GEXP – Create association between abstract experiments and workflow trials
  • Team SocialProvenance – Track the provenance of tweets on Twitter
  • Team MID – Add geographic details to provenance
  • Team TheCollaborators – Research paper that embeds the provenance of an artifact
  • Team CrowdWisdom – Use provenance to filter the information from crowdsourced websites
  • Team Science – Find the impact of a change in a script on the other parts of the script
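As an illustration of the last idea, here is a minimal sketch of change-impact analysis over a script. This is not Team Science's implementation: it only looks at top-level, single-target assignments in a Python script, ignores statement order, and the function names `variable_dependencies` and `impacted_by` are hypothetical.

```python
import ast


def variable_dependencies(source):
    """Map each assigned variable to the names read on its right-hand side.

    Only top-level assignments with a single plain-name target are handled,
    e.g. `x = f(a, b)`. Function names on the right-hand side are recorded
    as reads too, which is harmless for this sketch.
    """
    deps = {}
    for node in ast.parse(source).body:
        if (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)
        ):
            target = node.targets[0].id
            reads = {n.id for n in ast.walk(node.value) if isinstance(n, ast.Name)}
            deps.setdefault(target, set()).update(reads)
    return deps


def impacted_by(deps, changed):
    """Return every variable that depends, directly or transitively, on `changed`."""
    impacted = set()
    frontier = [changed]
    while frontier:
        current = frontier.pop()
        for var, reads in deps.items():
            if current in reads and var not in impacted:
                impacted.add(var)
                frontier.append(var)
    return impacted
```

For a script where `clean` is computed from `raw` and `mean` from `clean`, `impacted_by(deps, "raw")` would include both `clean` and `mean`. The dependency map is essentially a small provenance graph of the script.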

Obviously, implementing these ideas completely would take quite a while, but amazingly these teams got quite far. For example, Team SocialProvenance was able to recreate Twitter conversations for a number of hashtag topics, including the World Cup and IPAW, in Proof Markup Language. Here’s a screenshot:

Here’s another screenshot from Team MID, showing how you can navigate through an Open Provenance Model graph with geo annotations:


Geo Provenance Mashup from Team MID

Those are just two examples; the other teams got quite far as well, given that we ended at 4pm.
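To make the Twitter example a bit more concrete, here is a rough sketch of how a conversation can be rebuilt from reply links. It is not the team's code and it does not emit Proof Markup Language. The tweet fields (`id`, `author`, `text`, `in_reply_to`) and the function name are assumptions chosen for illustration, and the output is a plain nested dictionary.

```python
from collections import defaultdict


def build_conversations(tweets):
    """Rebuild conversation threads from reply links.

    tweets: list of dicts with the (assumed) keys 'id', 'author', 'text'
    and 'in_reply_to', where 'in_reply_to' is a tweet id or None.
    Returns a list of nested threads of the form {'id': ..., 'replies': [...]}.
    """
    by_id = {t["id"]: t for t in tweets}
    replies = defaultdict(list)
    roots = []
    for t in tweets:
        parent = t.get("in_reply_to")
        if parent in by_id:
            replies[parent].append(t["id"])
        else:
            # No known parent in the collected set: treat as a conversation start.
            roots.append(t["id"])

    def thread(tweet_id):
        return {"id": tweet_id, "replies": [thread(r) for r in replies[tweet_id]]}

    return [thread(r) for r in roots]
```

A tweet that replies to something outside the collected set becomes its own root, which mirrors the incomplete provenance you get when you can only see part of a conversation.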

So where are we? We had a brief conversation at the end of the hackathon (I also received a number of emails) about whether we are at a place where we can hack provenance end-user apps. The broad conclusions were as follows:

  • The tools are not mature yet, especially for semantic web apps. The libraries aren’t reliable and lack documentation.
  • Time was spent generating provenance, not necessarily using it.
  • It would be good to have guidelines for how to enhance applications with provenance. What’s the boundary between provenance and application data?
  • It would be nice to have a common representation of provenance to work on. (Go W3C incubator!)

You can find some more thoughts about this from Tim Lebo, here. As for the hackathon itself, the participants were really enthusiastic and several said that they would continue building on the ideas they developed during the day.

Hackathon Winners

Jim Myers (NCSA), Luc Moreau (IPAW PC co-chair) and I judged the apps and came up with what we thought were the top three. Our judging criteria were: whether the app was aimed at the end user, whether it worked, whether provenance was required, and coolness factor. We will announce the winners tomorrow at the closing session of IPAW. The winners will receive some great prizes sponsored by the Large Knowledge Collider Project (LarKC). LarKC sponsored this hackathon because provenance is becoming a crucial part of semantic web applications. The hackathon let LarKC see how they can ensure that their platform can support hackers in building great provenance-enabled semantic web apps.


I was impressed with all the participants and the apps that were produced. We are a fairly new research community, so to see what could be built in so little time is great. We are getting there and I can imagine that very soon we will have the infrastructure necessary to build user-facing provenance apps fast.
