I seem to be a regular attendee of the Extended Semantic Web Conference series (2013 trip report). This year ESWC was back in Crete, which means that you can get photos like the one below taken to make your colleagues jealous:
As I write this, the conference is still going on but I had to leave early to early to head to Iceland where I will briefly gate crash the natural language processing crowd at LREC 2014. Let’s begin with the stats of ESWC:
- 204 submissions
- 25% acceptance rate
- ~ 4.5 reviews per submission
The number of submissions was up from last year. I don’t have the numbers on attendance but it seemed in-line with last year as well. So, what was I doing at the conference?
This year ESWC introduced a semantic web evaluation track. We participated in two of these new evaluation tracks. I showed off our linkitup tool for the Semantic Web Publishing Challenge. [paper]. The tool lets you enrich research data uploaded to Figshare with links to external sites. Valentina Maccatrozzo presented her contribution to the Linked Open Data Recommender Systems challenge. She’s exploring using richer semantics to do recommendation, which, from the comments on her poster, was seen as a novel approach by the attendees. Overall, I think all our work went over well. However, it would be good to see more of the VU Semweb group content in the main track. The Netherlands only had 14 paper submissions. It was also nice to see PROV mentioned in several places. Finally, conferencse are great places to do face-2-face work. I had nice chats with quite a few people, in particular, with Tobias Kuhn on the development of the nanopublications spec and with Avi Bernstein on our collaboration leveraging his group’s Signal & Collect framework.
So what were the big themes of this year’s conference. I pulled out three:
- Easing development with Linked Data
- Entities everywhere
- Methodological maturity
As a community, we’ve built interesting infrastructure for machine readable data sharing, querying, vocabulary publication and the like. Now that we have all this data, the community is turning towards making it easier to develop applications with it. This is not necessarily a new problem and people have tackled it before (e.g. ActiveRDF). But the availability of data seems to be renewing attention to this problem. This was reflected by Stefan Staab’s Keynote on Programming the Semantic Web. I think the central issue he identified was how to program against flexible data models that are the hallmark of semantic web data. Stefan argued strongly for static typing and programmer support but, as an audience member noted, there is a general trend in development circles towards document style databases with weaker type systems. It will be interesting to see how this plays out.
Aside: A thought I had was whether we could easily publish the type systems that developers create when programming back out onto the web and merge them with existing vocabularies….
This notion of easing development was also present in the SALAD workshop (a workshop on APIs). This is dear to my heart. I’ve seen in my own projects how APIs really help developers make use of semantic data when building applications. There was quite a lot of discussion around the role of SPARQL with respect to APIs as well as whether to supply data dumps or an API and what type of API that should be. I think it’s fair enough to say that Web APIs are winning, see the paper RESTful or RESTless – Current State of Today’s Top Web APIs, and we need to devise systems that deal with that while still leveraging all our semantic goodness. That being said it’s nice to see mature tooling appearing for Linked Data/Semantic Web data (e.g. RedLink tools, Marin Dimitrov’s talk on selling semweb solutions commercially).
There were a bunch of papers on entity resolution, disambiguation, etc. Indeed, Linked Data provides a really fresh arena to do this kind of work as both the data and schemas are structured and yet at the same time messy. I had quite a few nice discussions with Pedro Szekely on the topic and am keen to work on getting some of our ideas on linking into the Karma system he is developing with others. From my perspective, two papers caught my eye. One on using coreference to actually improve sparql query performance. Often times we think of all these equality links as a performance penalty, it’s interesting to think about whether they can actually help us improve performance on different tasks. The other paper was “A Probabilistic Approach for Integrating Heterogeneous Knowledge Sources“, which uses Markov Logic Networks to align web information extraction data (e.g. NELL) to DBpedia. This is interesting as it allows us to enrich clean background knowledge with data gathered from the web. It’s also neat in that it’s another example of the combination of statistical inference and (soft) rules.
This emphasis on entities is in contrast with the thought-provoking keynote by Oxford philosopher Luciano Floridi, who discussed various notions of complexity and argued that we need to think not in terms of entities but in fact interactions. This was motivated by the following statistic – that by 2020 7.5 billion people vs. 50 billion devices and all of these things will be interconnected and talking.
Indeed, while entities especially in messy data is far from being a solved problem, we are starting to see dynamics emerging as clear area of interest. This is reflected by the best student paper Hybrid Acquisition of Temporal Scopes for RDF Data.
The final theme I wanted to touch on was methodological maturity. The semantic web project is 15 years old (young in scientific terms) and the community has now become focused on having rigorous evaluation criteria. I think every paper I saw at ESWC had a strong evaluation section (or at least a strongly defensible one). This is a good thing! However, this focus pushes people towards safety in their methodology, for instance the plethora of papers that use LUBM, which can lead towards safety in research. We had an excellent discussion about this trend in the EMPIRICAL workshop – check out a brief write up here. Indeed, it makes one wonder if
- these simpler methodologies (my system is faster than yours on benchmark x) exacerbate a tendency to do engineering and not answer scientific questions; and
- whether the amalgamation of ideas that characterizes semantic web research is toned down leading to less exciting research.
One answer to this trend is to encourage a more wide spread acceptance and knowledge of different scientific methodologies (e.g. ethnography), which would allow us to explore other areas.
Finally, I would recommend Abraham Bernstein & Natasha Noy – “Is This Really Science? The Semantic Webber’s Guide to Evaluating Research Contributions“, which I found out about at the EMPIRICAL workshop.
Here are some other pointers that didn’t fit into my themes.
- The best research paper went to the work on Trusty URIs by Tobias Kuhn . The approach let’s you verify that the contents of an RDF document are indeed what the URL was meant to refer to. Some neat hacks with spaces and hashes. This is important for immutable nanopublications.
- WaterFowl Triple Store – run sparql queries + RDFS reasoning on a super compressed triple store. These Succinct Data Structures are cool.
- conText service for lightweight information extraction from text to RDF.
- Tracking the evolution of research communities over time.
- TuvaLabs is a cool company offering curated data sets to help teach digital literacy.
- I’m a sucker for buying books – I got this one.