Archive


It’s been about a week since I got back from Australia, where I attended the International Semantic Web Conference (ISWC 2013). This is the premier forum for the latest research on using semantics on the Web. Overall, it was a great conference – both well run and with a good buzz. (Note: I’m probably a bit biased – I was chair of this year’s In-Use track.)

ISWC is a fairly hard conference to get into and the quality is strong.

More importantly, almost all the talks I went to were worth thinking about. You can find the proceedings of the conference online either as a complete zip here or published by Springer. You can find more stats on the conference here.

As an aside, before digging into the meat of the conference – Sydney was great. Really a fantastic city – very cosmopolitan and with great coffee. I suggest Single Origin Roasters.  Also, Australia has wombats – wombats are like the chillest animal ever.

Wombat

From my perspective, there were three main themes to take away from the conference:

  1. Impressive applications of semantic web technologies
  2. Core ontologies as the framework for connecting complex integration and retrieval tasks
  3. Starting to come to grips with messiness

Applications

We are really seeing how semantic technologies can power great applications. All three keynotes highlighted the use of semantic tech. I think Ramanathan Guha’s keynote made the case best in his discussion of the growth of schema.org.

Beyond the slide above, he brought representatives from Yandex, Yahoo, and Microsoft on stage to join Google in describing how they are using schema.org. Drupal and WordPress will have schema.org in their cores in 2014. Schema.org is being used to drive everything from veteran-friendly job search, to rich pins on Pinterest, to letting OpenTable reservations be easily put into your calendar. So schema.org is clearly a success.
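
To make this concrete, here’s a minimal sketch (my own, not from the keynote) of what schema.org markup can look like: a job posting described as JSON-LD and printed as a script block that could be dropped into a web page. The organization and details are invented; the property names come from the schema.org JobPosting type.

    # A hypothetical job posting described with schema.org terms as JSON-LD.
    # The organization, location, and date are made up for illustration.
    import json

    job = {
        "@context": "http://schema.org",
        "@type": "JobPosting",
        "title": "Logistics Coordinator",
        "hiringOrganization": {"@type": "Organization", "name": "Example Corp"},
        "jobLocation": {
            "@type": "Place",
            "address": {"@type": "PostalAddress", "addressLocality": "Sydney"},
        },
        "datePosted": "2013-11-01",
    }

    # Print the JSON-LD script block a publisher would embed in their page
    # so that search engines (and pins, and calendars) can pick it up.
    print('<script type="application/ld+json">')
    print(json.dumps(job, indent=2))
    print('</script>')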

Peter Mika presented a paper on how Yahoo is using ontologies to drive entity recommendations in search. For example, you search for Brad Pitt and they show you related entities like Angelina Jolie or Fight Club. The nice thing about the paper is that it showed how the deployment in production (in Yahoo! Web Search in the US) increased click-through rates.

Roi Blanco, Berkant Barla Cambazoglu, Peter Mika, Nicolas Torzec: Entity Recommendations in Web Search. International Semantic Web Conference (2) 2013: 33-48

I think it was probably Yves Raimond’s conference – he showed some amazing things being done at the BBC using semantic web technology. He also gave an excellent keynote at the COLD workshop, highlighting some challenges where we need to improve to ease the use of these technologies in production. I recommend you check out the slides above. Of all the applications, the one that stood out for me was their work on mining the BBC World Service archive to enrich content as it is being created. This work won the Semantic Web Challenge.

In the biomedical domain, there were two papers showing how semantics can be embedded in tools that regular users use. One showed how the development of ICD-11 (ICD is the most widely used clinical classification, developed by the WHO) is supported using semantic tech. The other I liked was the use of Excel templates (developed using RightField) that transparently capture data according to a domain model for systems biology.

Also in the biomedical domain, IBM presented an approach at the Semantic Web Challenge for using semantic web technologies to help coordinate health and social care.

Finally, there was a neat application presented by Jane Hunter applying these technologies to art preservation: The Twentieth Century in Paint.

I did a review of all the in-use papers leading up to the conference, but suffice it to say that there were numerous impressive applications. Also, I think it says something about the health of the community when you see slides like this:

Core Ontologies + Other Methods

A number of interesting papers centered on the idea of combining well-known ontologies with record linkage or other machine learning methods to populate knowledge bases.

A paper that I liked a lot (and that also won the best student paper award), Knowledge Graph Identification (by Jay Pujara, Hui Miao, Lise Getoor and William Cohen), sums it up nicely:

Our approach, knowledge graph identification (KGI) combines the tasks of entity resolution, collective classification and link prediction mediated by rules based on ontological information.

Interesting papers under this theme were:

From my perspective, it was also nice to see the use of the W3C Provenance Model (PROV) as one of these core ontologies in many different papers and two of the keynotes. People are using it as a substructure to do a number of different applications – I intend to write a whole post on this – but until then here’s proof by twitter:

Coming to grips with messiness

It’s pretty evident that when dealing with the web things are messy. There were a couple of papers that documented this empirically either in terms of the availability of endpoints or just looking at the heterogeneity of the markup available from web pages.

In some sense, the papers mentioned in the prior theme also try to deal with this messiness. Here are another couple of papers looking at how to deal with, or even make use of, this messiness.

One thing that seemed a lot more present at this year’s conference than last year’s was the term entity. This is obviously popular because of things like the Google Knowledge Graph – but in some sense it maybe gives a better description of what we are aiming to get out of the data we have: machine-readable descriptions of real-world concepts/things.

Misc.

There are some things that are of interest that don’t fit neatly into the themes above. So I’ll just try a bulleted list.

  • We won the Best Demo Paper Award for git2prov.org
  • Our paper on using NoSQL stores for RDF went over very well. Congrats to Marcin for giving a good presentation.
  • The format of mixing talks from different tracks by topic and having only 20 minutes per talk was great.
  • VUA had a great showing – 3 main track papers, a bunch of workshop papers, a couple of different posters, 4 workshop organizers giving talks at the workshop summary session, 2 organizing committee members, alumni all over the place, plus a bunch of stuff I probably forgot to mention.
  • The colocation with Web Directions South was great – it added a nice extra energy to the conference.
  • Best reviewer awards went to Oscar Corcho, Tania Tudorache, and Aidan Hogan.
  • Peter Fox seemed to give a keynote just for me – concept maps and PROV, followed by abductive reasoning.
  • Did I mention that the coffee in Sydney (and Newcastle) is really good and lots of places serve proper breakfast!

Since I moved to Europe I’ve been attending ESWC (the Extended/European Semantic Web Conference), and I always get something out of the event. There are plenty of familiar faces but also quite a few new people, and it’s a great environment for having chats. In addition, the quality of the content is always quite good. This year the event was held in Montpellier and was for the most part well organized: the main conference wifi worked!

The stats:

  • 300 participants
  • 42 accepted papers from 162 submissions
  • 26% acceptance rate
  • 11 workshops + 7 tutorials

So what was I doing there:

The VU Semantic Web group also had a strong showing:

  • Albert Meroño-Peñuela won the best PhD symposium paper for his work on digital humanities and the semantic web.
  • The USEWOD workshop’s (led by Laura Hollink) datasets were used by a number of main track papers for evaluation.
  • Stefan Schlobach and Laura Hollink were on the organizing committee. And we organized a couple of workshops & tutorials.
  • Posters/Demos:
    • Albert Meroño-Peñuela, Rinke Hoekstra, Andrea Scharnhorst, Christophe Guéret and Ashkan Ashkpour. Longitudinal Queries over Linked Census Data.
    • Niels Ockeloen, Victor de Boer and Lora Aroyo. LDtogo: A Data Querying and Mapping Framework for Linked Data Applications.
  • Several workshop papers.

I’ll try to pull out what I thought were the highlights of the event.

What is a semantic web application?

Can you escape Frank?

The keynotes from Enrico Motta and David Karger focused on trying to define what a semantic web application is. This starts out as the question of whether a Semantic Web application needs to use the Semantic Web set of standards (e.g. RDF, OWL, etc.). From my perspective, the answer is no. These standards are great infrastructure for building these applications, but they are not strictly necessary (see the Google Knowledge Graph). So then what is a semantic web application?

From what I could gather, Motta would define it as an application that is scalable, uses the web, and embraces model-theoretic semantics. For me that’s rather limiting: there are many other semantics that may be appropriate; we can ground meaning in something other than model theory. I think a good example of this is the work on Pragmatic Semantics that my colleague Stefan Schlobach presented at the Artificial Intelligence meets the Semantic Web workshop. Or we can reach back into AI and revisit the discussion from Brooks’ classic paper Elephants Don’t Play Chess. I felt that Karger’s definition (in what was a great keynote) was getting somewhere. He defined a semantic web application essentially as:

An application whose schema is expected to change.

This seems to me to capture the semantic portion of the definition, in the sense that the semantics need to be understood on the fly. However, I think we need to roll the web back into this definition… Overall, I thought this discussion was worth having and helps the field define what it is that we are aiming at. To be continued…

Homebrew databases

Homebrew databases

As I said, I thought Karger’s keynote was great. He gave a talk within a talk, on the subject of homebrew databases from this paper in CHI 2011:

Amy Voida, Ellie Harmon, and Ban Al-Ani. 2011. Homebrew databases: complexities of everyday information management in nonprofit organizations. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’11). ACM, New York, NY, USA, 915-924. DOI=10.1145/1978942.1979078 http://doi.acm.org/10.1145/1978942.1979078

They define a homebrew database as “an assemblage of information management resources that people have pieced together to satisfice their information management needs.” This is just what we see all the time: the combination of Excel, Word, email, databases, and, don’t forget, plain paper brought together to attack information management problems. A number of our use cases from the pharma industry as well as science reflect essentially this practice. It’s great to see a good definition of this problem grounded in ethnographic studies.

The Concerns of Linking

There were a couple of good papers on generating linkage across datasets (the central point of linked data). In Open PHACTS, we’ve been dealing with the notion of essentially context-dependent linkages, and I think this notion is becoming more prevalent in the community. We had a lot of positive responses on this in the poster session when presenting Open PHACTS. Probably my favorite paper was on linking the Smithsonian American Art Museum to the Linked Data cloud. They use PROV to drive their link generation, essentially proposing links to humans who then verify the connections. See:

I also liked the following paper on which hardware environment you should use when doing link discovery. Result: use GPUs, they’re fast!

Additionally, I think the following paper is cool because they use network statistics not just to measure but to do something, namely create links:

APIs

APIs were a growing theme of the event, with things like the Linked Data Platform working group and the successful SALAD workshop (fantastic acronym). Although I was surprised people in the workshop hadn’t heard of the Linked Data API. We had a lot of good feedback on the Open PHACTS API. It’s just the case that there is more developer expertise for using web service APIs than for semweb tech. I’ve actually seen a lot of demand for semweb skills, and while we are doing our best to train people there is still this gap. It’s good then that we are thinking about how these two technologies can play together nicely.
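
To give a feel for that gap, here’s a rough sketch of the two styles a developer faces. The endpoint URLs are placeholders (not the actual Open PHACTS API) and the query is only illustrative.

    # Style 1: a plain web service API -- one GET, JSON back, nothing new to learn.
    import requests

    resp = requests.get(
        "https://api.example.org/compounds",        # hypothetical REST endpoint
        params={"name": "aspirin", "format": "json"},
    )
    compounds = resp.json()

    # Style 2: the same kind of question against a SPARQL endpoint -- now the
    # developer needs RDF terms, a query language, and the SPARQL results format.
    query = """
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?compound WHERE { ?compound rdfs:label "aspirin"@en . } LIMIT 10
    """
    resp = requests.get(
        "https://sparql.example.org/sparql",        # hypothetical SPARQL endpoint
        params={"query": query},
        headers={"Accept": "application/sparql-results+json"},
    )
    bindings = resp.json()["results"]["bindings"]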

Random Notes

Beyond the PDF - drawn notes day 1

Wow! The last three days have been crazy, hectic, awesome and inspiring. We just finished putting on The Future of Research Communication and e-Scholarship (FORCE11)’s Beyond the PDF 2 conference here in Amsterdam. (I was chair of the organizing committee and in charge of local arrangements.) The idea behind Beyond the PDF was to bring together a diverse set of people (scholars, technologists, policy experts, librarians, start-ups, publishers, …) all interested in making scholarly and research communication better. In that respect, I think we achieved our goal. We had 210 attendees from across the spectrum. Below are two charts: one of the types of organizations the attendees came from and one of the domains they work in.

The program of the conference was varied. We covered new tools, business models, the context of the approach, research evaluation, visions for the future, and how to move forward. I won’t go over the entire conference here. We’ll have complete video online soon (thanks Elsevier). I just wanted to call out some personal highlights.

Keynotes

We had two great keynotes: one from Kathleen Fitzpatrick of the Modern Language Association and the other from Carol Tenopir (Chancellor’s Professor at the School of Information Sciences at the University of Tennessee, Knoxville). Kathleen discussed how it is essential for the humanities to embrace new forms of scholarly communication, as it allows for faster dissemination of their work. Carol discussed the reading practices of academics. She’s done in-depth tracking of how scientists read. Some interesting tidbits: successful scientists read more, and so far social media use has not decreased the amount of reading that scientists do. The keynotes were really a sign of how much more present the humanities were at this conference than at Beyond the PDF 1.

Kathleen Fitzpatrick (@kfitz), Director of Scholarly Communication, Modern Language Association


The tools are there

Jason Priem compares online journals to horses

Just two years ago at the first Beyond the PDF, there were mainly initial ideas and drafts of next-generation research communication tools. At this year’s conference, there was a huge number of tools that are ready to be used: Figshare, PDFX, Authorea, Mendeley, IsaTools, StemBook, Commons in a Box, IPython, ImpactStory, and on…

Furthermore, there are different ways of publishing, from PeerJ to Hypothes.is to even just posting to a blog. Probably the most interesting idea of the conference was the use of GitHub to essentially publish.

This made me think that it’s time to revisit my own scientific workflow and figure out how to update it to make better use of these tools in practice.

People made connections

At the end of the conference, I asked if people had made a new connection. Almost every hand went up. It was great to see publishers, technologists, and librarians all talking together. The Twitter back channel at the conference was great: we saw a lot of conversations that kept going on #btpdf2, and also people commenting while watching the live stream. Check out a great Storify of the social media stream of the conference done by Graham Steel.


Creative Commons License
Beyond the PDF 2 photographs by Maurice Vanderfeesten is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported license.
Based on a work at http://sdrv.ms/YI4Z4k.

Making it happen

We gave a challenge to the community: “What would you do with 1k today that would change scholarly communication for the better?” The challenge was well received, and we had a bunch of different ideas, from sponsoring viewing parties to encouraging the adoption of DOIs in the developing world and by small publishers.

The Challenge of Evaluation

We had a great discussion around the role of evaluation. The format Carole Goble used for the evaluation session – role playing the key players in the evaluation of research and researchers – really highlighted the fact that we have a first-mover problem: none of the players feels that they should go first. It was unclear how to push past that challenge.

Various Roles in Science Evaluation

Summary

Personally, I had a great time. FORCE11 is a unique community and I think it brings together the people who need to talk in order to change the way we communicate scholarship. These are just my quick thoughts on the event; there’s a lot more to come. We will have video of the event up soon, and the drawn notes provided by Jongens van de Tekeningen will be posted. We will also award a series of 1k grants to support ongoing work. Finally, I hope to see many more blog posts documenting the different views of attendees.

Thanks

We had many great sponsors who helped make this a great event. Things like live streaming, student scholarships, a professional set-up, demos, and dinner ensure that an event like this works.

This year I had the opportunity to be program co-chair and help organize the 4th International Provenance and Annotation Workshop (IPAW). The event went great – really, better than I imagined. First, I was fortunate to be organizing it with James Frew from the Bren School of Environmental Science and Management at the University of California, Santa Barbara. He not only helped coordinate the program but, along with his team, took care of all the local organization. It’s hard to beat sunny Santa Barbara as a location, but they also made sure that everything ran smoothly: great wifi, shuttles to and from the venue, tasty outdoor lunches looking over the ocean, an open-air poster session with wine and cheese, and a BBQ on the beach for the workshop dinner:

The IPAW workshop dinner. Photo from Andreas Schreiber.

So big kudos to Frew and his team. Beyond being well run, we also covered a lot in the two days of the main workshop. The workshop had 47 attendees and you can find the Twitter log here.

Research Highlights

I think the program really highlighted where we are at in provenance research today and the directions forward. I won’t go through every paper but will just try to pick out three interesting trends.

1) Using Provenance to Address the Messiness of the Web

The Web provides us a fantastic source of knowledge. But the problem is that this knowledge is completely unclean and unintegrated. Even efforts such as Linked Data, while giving us better data, are still messy and under-integrated. Both researchers and firms have been trying to make clean, integrated knowledge, but then they are faced with what Timothy Lebo in his paper termed the Integrator’s Dilemma. An integrator may produce a clean, well-structured data set, but in the process the resulting data set loses authority and its connection to domain expertise. To rectify this problem, provenance can be used to identify the authoritative source and connect back to domain expertise. Indeed, Jim McCusker and colleagues argued that provenance is the third step to Data Sanity.

However, then we run into Tim’s 1st law of producing provenance:

For any provenance record, there exists a provenance consumer that will need it, but will not like how it is provided.

Tim suggests a service-based solution for providing provenance at the correct granularity. While I don’t know if that is the right solution, it’s clear that providing provenance at the right level of granularity is one foundation for building confidence in integrated web data sources.

Another example of trying different ways of using provenance to address messiness is the use of network analysis to understand provenance captured from crowdsourced applications.

2) Provenance and Credit are Intertwined

Science has always been a driver of research in provenance, and we saw a number of good pieces of work addressing domains ranging from climate analysis to archeology. However, as our keynote speaker Phil Bourne pointed out in his talk, scientists are not using provenance technologies in their work. He argued that this is for two reasons: 1) they are not given credit for all parts of the scientific process, and 2) provenance infrastructure is still not easy enough to use. Phil argued that it is fundamental that more artifacts of the research lifecycle be given credit in order to facilitate sharing and thus increase the pace of innovation, particularly in the life sciences. Thus, for scientists to capture their information in a sharable fashion, they need to be given credit for doing so. (Yes, he connected altmetrics to provenance – very cool from my point of view.) To do this, he argued, we need better support for provenance throughout the research lifecycle. However, while tools exist, they are far from being usable and integrated enough into everyday science practice. This is a real challenge to the provenance community: we need to do better at getting our approaches into scientists’ hands.

3) The Problem of Post-hoc

Much work in the provenance literature has asked how one captures provenance effectively in computational systems. But many times this is just not possible: the user may not have thought to install a system to capture provenance in the first place, or may not have chosen to write down their rationale for taking some action. This is an area I’m actively researching, so it was great to see others starting to address the problem. Tom De Nies attacked the problem of reconstructing provenance for a collection of newspaper articles using semantic similarity. An even farther-out idea presented at the workshop was to try to reconstruct the provenance of decisions made by a human using simulation. Both works highlight the need for dealing with incomplete or even non-existent provenance.

These were just some of the themes that I saw. Overall, the presentations were good and the audience was engaged. We had lots of hall time and I heard many intense discussions so I’m hoping that the event spurred more research. I know personally we will try to pursue a collaboration to build a provenance corpus to study this reconstruction problem.

A Provenance Week

IPAW has a tradition of being hosted as an independent event, which allows us to not only have the two-day workshop but also organize collocated events. This IPAW was no different. The Data Observation Network for Earth organized a meeting on provenance and scientific workflows collocated with IPAW. Additionally, the W3C Provenance Working Group both gave a tutorial before the workshop and held their two-day face-to-face meeting afterwards. Here’s me presenting the core of the provenance data model to the 28 tutorial participants.

The Provenance Data Model. It’s easy! Photo prov:wasAttributedTo Andreas Schreiber
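
For those who weren’t there, here’s a minimal sketch of that core model using rdflib: an entity, the activity that generated it, and the agent it is attributed to. The resources are invented for illustration.

    # The three core PROV classes and a few core relations, built with rdflib.
    from rdflib import Graph, Namespace, RDF

    PROV = Namespace("http://www.w3.org/ns/prov#")
    EX = Namespace("http://example.org/")

    g = Graph()
    g.bind("prov", PROV)
    g.bind("ex", EX)

    g.add((EX.chart, RDF.type, PROV.Entity))        # a thing we care about
    g.add((EX.plotting, RDF.type, PROV.Activity))   # what produced it
    g.add((EX.alice, RDF.type, PROV.Agent))         # who is responsible

    g.add((EX.chart, PROV.wasGeneratedBy, EX.plotting))
    g.add((EX.plotting, PROV.used, EX.dataset))
    g.add((EX.chart, PROV.wasAttributedTo, EX.alice))

    print(g.serialize(format="turtle"))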

Conclusion

IPAW 2012 was a lot of effort but it was worth it – fun discussions, beautiful weather, and research insight. Once again, the community voted to have another IPAW in 2014. The community is continuing to play to its strengths in workflows, databases, and science applications while exploring novel areas. In the CFP for IPAW, we wrote that “2012 will be a watershed year for provenance/annotation research.” For me, IPAW confirmed that statement.

One of the things I’ve been wondering for a while now is how easy it is to develop end-user applications that take advantage of provenance. Is the software infrastructure there? Do we have appropriate interface components? Are things fast enough? To test this out, we held a hackathon at the International Provenance and Annotation Workshop (IPAW 2010).

The hackathon had three core objectives:

  1. Come up with a series of end user application ideas
  2. Develop cool apps
  3. And understand where we are at in terms of enabling app development

Another thing I was hoping to do was to get people from different groups to collaborate together. So how did it turn out?

We had 18 participants who divided up into the following teams:

  • Team Electric Bill
    • Paulo Pinheiro da Silva (UTEP)
    • Timothy Lebo (RPI)
    • Eric Stephan (UTEP)
    • Leonard Salayandia (RPI)
  • Team GEXP
    • Vitor Silva (Universidade Federal do Rio de Janeiro)
    • Eduardo Ogasawara (Universidade Federal do Rio de Janeiro)
  • Team Social Provenance
    • Aida Gandara (UTEP)
    • Alvaro Graves (RPI)
    • Evan Patton (UTEP)
  • Team MID
    • Iman Naja (University of Southampton)
    • Markus Kunde (DLR)
    • David Koop (University of Utah)
  • Team TheCollaborators
    • Jun Zhao (Oxford)
    • Alek Slominski (Indiana University)
    • Paolo Missier (University of Manchester)
  • Team Crowd Wisdom
    • James Michaelis (RPI)
    • Lynda Niemeyer (AMSEC, LLC)
  • Team Science
    • Elaine Angelino (Harvard)

From these teams, we had a variety of great ideas:

  • Team Electric Bill – Understanding energy consumption in the home
  • Team GEXP – Create association between abstract experiments and workflow trials
  • Team SocialProvenance –  Track the provenance of tweets on twitter
  • Team MID – Add geographic details to provenance
  • Team TheCollaborators – A research paper that embeds the provenance of an artifact
  • Team CrowdWisdom – Use provenance to filter the information from crowd sourced websites
  • Team Science – Find the impact of a change in a script on the other parts of the script

Obviously, implementing these ideas completely would take quite a while, but amazingly these teams got quite far. For example, Team SocialProvenance was able to recreate Twitter conversations for a number of hashtag topics, including the World Cup and IPAW, in the Proof Markup Language. Here’s a screenshot:

Here’s another screenshot from Team MID, showing how you can navigate through an Open Provenance Model graph with geo annotations:

Geo Provenance Mashup from Team MID

Those are just two examples; the other teams got quite far as well, given that we ended at 4pm.

So where are we at? We had a brief conversation at the end of the hackathon (and I also received a number of emails) about whether we were at a place where we could hack provenance end-user apps. The broad conclusions were as follows:

  • The maturity of tools is not there yet especially for semantic web apps. The libraries aren’t reliable and lack documentation.
  • Time was spent generating provenance, not necessarily using it.
  • It would be good to have guidelines for how to enhance applications with provenance. What’s the boundary between provenance and application data?
  • It would be nice to have a common representation of provenance to work on. (Go W3C incubator!)

You can find some more thoughts about this from Tim Lebo here. As for the hackathon itself, the participants were really enthusiastic, and several said that they would continue building on the ideas they developed there.

Hackathon Winners

Jim Myers (NCSA), Luc Moreau (IPAW PC co-chair), and I judged the apps and came up with what we thought were the top three. Our judging criteria were: whether the app was aimed at the end user, whether it worked, whether provenance was required, and the coolness factor. We will announce the winners tomorrow at the closing session of IPAW. The winners will receive some great prizes sponsored by the Large Knowledge Collider Project (LarKC). LarKC sponsored this hackathon because provenance is becoming a crucial part of semantic web applications; the hackathon let LarKC see how they can ensure that their platform supports hackers in building great provenance-enabled semantic web apps.

Conclusion

I was impressed with all the participants and the apps that were produced. We are a fairly new research community, so to see what could be built in so little time is great. We are getting there, and I can imagine that very soon we will have the infrastructure necessary to build user-facing provenance apps fast.

Last week I had the pleasure of attending both the Web Science and World Wide Web Conferences, which were co-located in Raleigh, NC. (Aside: I still find it odd that going home now means flying to Europe and not away from it.) It was my first time at both events; they both had excellent, thought-provoking programs and I hope I get to go again next year. In addition, at our face-to-face meeting I got to meet in meatspace several people I had heard tons of times on the W3C’s Provenance Incubator Group telecons. As with all good conferences, you walk away with tons of things to think about, new contacts to follow up on, and new projects to do. Here, I want to focus on what for me were the two themes of the week in Raleigh: the rise of structured data and understanding your data.

1. The Rise of Structured Data

Maybe it’s because I hung out with Linked Data people, or because Facebook launched its Open Graph Protocol the week before, but there was tons of talk about the usefulness of structured data. There was an obvious focus on open government data, which has taken off this past year with data.gov and data.gov.uk, but the talk was also coming from big commercial players. In two of the panels I attended, people from Facebook, Bing, Google, and Yahoo all talked about how they were taking advantage of structured data to provide better services. Essentially, if people provide a bit of extra structure to the data on their websites, it’s easier for these services to take advantage of that data to provide useful things like different ways to browse or find products. It’s not that everything will be structured data, but some of it goes a long way toward easing the building of applications.
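
As a small illustration of how little structure is needed, here’s a sketch that pulls Open Graph Protocol properties out of a page’s meta tags using only the Python standard library. The HTML snippet is invented.

    # Extract <meta property="og:..." content="..."> pairs from a page.
    from html.parser import HTMLParser

    class OGPExtractor(HTMLParser):
        """Collects Open Graph Protocol properties from meta tags."""
        def __init__(self):
            super().__init__()
            self.properties = {}

        def handle_starttag(self, tag, attrs):
            if tag != "meta":
                return
            attrs = dict(attrs)
            prop = attrs.get("property", "")
            if prop.startswith("og:"):
                self.properties[prop] = attrs.get("content", "")

    page = """
    <html><head>
      <meta property="og:title" content="The Rise of Structured Data" />
      <meta property="og:type" content="article" />
      <meta property="og:url" content="http://example.org/raleigh-report" />
    </head><body>...</body></html>
    """

    extractor = OGPExtractor()
    extractor.feed(page)
    print(extractor.properties)
    # {'og:title': 'The Rise of Structured Data', 'og:type': 'article', ...}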

2. Understanding Your Data

There was an absolutely excellent keynote by Danah Boyd, who made a number of points about privacy and big data, in particular big data about people. Her thoughts about the crucial nature of understanding your data resonated with me, both as someone who is trying to work with massive data sets and as someone who is trying to work with social scientists to understand science processes. Here’s a clip from the keynote talking about the importance of understanding the meaning behind the data:

She summarized her points as follows:

1) Bigger Data are Not Always Better Data
2) Not All Data are Created Equal
3) What and Why are Different Questions
4) Be Careful of Your Interpretations
5) Just Because It is Accessible Doesn’t Mean Using It is Ethical

I think these points apply even if the massive data we’re using isn’t about people. We need to think about the provenance of our data, understand how rich it is or isn’t, and think carefully about what kinds of questions we can really answer.

Indeed, it wasn’t just in this keynote that understanding data came up as a crucial factor. When people talked about government data, it was clear that it was necessary to explain how the data was gathered and what could really be ascertained from it. For example, when the US government released the data about the stimulus package, it got harangued because it was a first draft and there were errors in it. In talks about linked data, it was evident that you needed to understand how people were using certain predicates (e.g. sameAs) in order to make use of the data. In the talk I gave, one question I got was whether the dataset I used was good enough to make conclusions about particular scientists (it’s not).
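
Here’s a toy illustration (mine, not from any of the talks) of why that matters. If a dataset uses sameAs loosely, then naively “smushing” together everything said about linked resources silently conflates things that are merely related.

    # Two resources that are related but not identical, joined by an
    # over-eager sameAs link; a naive merge then mixes their descriptions.
    from rdflib import Graph, Namespace, Literal
    from rdflib.namespace import OWL, FOAF

    EX = Namespace("http://example.org/")
    g = Graph()

    g.add((EX.cityPage, FOAF.name, Literal("Raleigh (city)")))
    g.add((EX.countyPage, FOAF.name, Literal("Wake County")))
    g.add((EX.cityPage, OWL.sameAs, EX.countyPage))   # related, not the same thing

    # Naive smushing: copy every statement about one resource onto the other.
    for s, p, o in list(g):
        if s == EX.countyPage:
            g.add((EX.cityPage, p, o))

    # The "city" now carries both names -- the merge conflated two entities.
    print(list(g.objects(EX.cityPage, FOAF.name)))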

So understanding your data is crucial, but it’s incredibly hard to do. Maybe that’s what computer scientists need to do: build tools that let people really understand their data.

Finally, in case you’re interested, I’ve embedded the slides from the talk I gave at Web Science. I think it’s a great example of social science meeting computer science (but I might be biased).

I’ve been on vacation during the week of Thanksgiving, but before I took off to Amsterdam I gave a talk on provenance for multi-institutional applications for the ISI Intelligent Systems Division AI Seminar series. It’s about an hour long with questions and answers. If you have time, let me know what you think. The slides are on the talk’s page as well, so you can zip through those if you don’t have time to listen to the whole thing.

I recommend checking out the AI Seminar page. We have some really great speakers come and talk to us at ISD and most of the talks are streamed and archived.
