Tag Archives: semantic web

Last week, I hung out in Bethlehem, Pennsylvania for the 14th International Semantic Web Conference. Bethlehem is famous for the Lehigh University Benchmark (LUBM) and Bethlehem Steel. This is the major conference focused on the intersection of semantics and web technologies. In addition to being technically super cool, it was a great chance for me to meet many friends and make some new ones.

Let’s begin with some stats:

  • ~450 attendees
  • The conference continues to be selective:
    • Research track: 22% acceptance rate
    • Empirical studies track: 29% acceptance rate
    • In-use track: 40% acceptance rate
    • Datasets and Ontologies: 22% acceptance rate
  • There were 265 submissions across all tracks, which is surprisingly the same number as last year.
  • More stats and info in Stefan’s slides (e.g. move to Portugal if you want to get your papers in the conference.)
  • Fancy visualizations courtesy of the STKO group

Before getting into what I thought were the major themes of the conference, a brief note. Reviewing is at the heart of any academic conference. While we can always try and improve review quality, it’s worth calling out good reviewing. The best reviewers were Maribel Acosta (research) and Markus Krötzsch (applied). As data sets and ontologies track co-chair, I can attest to how important good reviewers are.  For this new track we relied heavily on reviewers being flexible and looking at these sorts of contributions differently. So thanks to them!

For me there were three themes of ISWC:

  1. The Spectrum of Entity Resolution
  2. The Spectrum of Linked Data Querying
  3. Buy more RAM

The Spectrum of Entity Resolution

Maybe it’s because I attended the NLP & DBpedia workshop or the conversation I had about string similarity with Michelle Cheatham, but one theme that I saw was the continued amalgamation of natural language processing (NLP) style entity resolution with database entity resolution (i.e. record linkage). This movement stems from the fact that an increasing amount of linked data is a combination of data extracted from semi-structured sources as well as from NLP. But in addition to that, NLP sources rely on some of these semi-structured data sources to do NLP.

Probably, the best example of that idea is the work that Andrew McCallum presented in his keynote on “epistemological knowledge bases”.

Briefly, the idea is to reason with all the information coming from both basic low level NLP (e.g. basic NER, or even surface forms) as well as the knowledge base jointly (plus, anything else) to generate a knowledge base. One method to do this is universal schemas. For a good intro, check out Sebastian Riedel’s slides.

From McCallum, I like the following papers, which give a good justification for and results of doing collective/joint inference.

(Self promotion aside: check out Sara Magliacane’s work on Probabilistic Soft Logics for another way of doing joint inference.)

Following on from this notion of reasoning jointly, Hulpus, Prangnawarat and Hayes showed how to use the graph-based structure of linked data to perform joint entity and word sense disambiguation from text. Likewise, Prokofyev et al. use the properties of a knowledge graph to perform better co-reference resolution. Essentially, they use this background knowledge to split the clusters of co-referent entities produced by Stanford CoreNLP. On the same idea, but for more structured data, the TableEL system uses a joint model with soft constraints to perform entity linking for web tables, improving performance by up to 75% on web tables. (code & data)
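As a rough illustration of the cluster-splitting idea (and not Prokofyev et al.’s actual method), the sketch below takes one co-reference cluster whose mentions may be linked to a knowledge-graph type and splits it wherever the linked mentions disagree on type. The function name `split_cluster`, the `(surface form, type)` tuples and the example types are all assumptions made up for illustration.

```python
from collections import defaultdict
from typing import Optional

# (surface form, linked knowledge-graph type or None if unlinked) -- hypothetical shape
Mention = tuple[str, Optional[str]]

def split_cluster(cluster: list[Mention]) -> list[list[str]]:
    """Split one co-reference cluster by the KG type of its linked mentions.

    Unlinked mentions carry no type evidence, so this sketch simply keeps
    them with the first typed group instead of creating singletons.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    unlinked: list[str] = []
    for surface, kg_type in cluster:
        if kg_type is None:
            unlinked.append(surface)
        else:
            groups[kg_type].append(surface)
    if not groups:
        # No background knowledge available: leave the cluster untouched.
        return [unlinked] if unlinked else []
    result = list(groups.values())  # dicts preserve insertion order
    result[0] = result[0] + unlinked
    return result

cluster = [("Paris", "City"), ("the French capital", None), ("Paris Hilton", "Person")]
print(split_cluster(cluster))
```

The simplification to note is that a real system would decide where unlinked mentions go (and whether to merge clusters) using much richer evidence than a single type label.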

One approach to entity linking that I liked was from Raphael Troncy’s crew, titled “Reveal Entities From Texts With a Hybrid Approach” (paper, slides). (Shouldn’t it be “Revealing..”?). They showed that by using essentially the provenance of the data sources they are able to build an adaptive entity linking pipeline. Thus, one doesn’t necessarily have to do as much domain tuning to use these pipelines.

While not specifically about entity resolution, a paper worth pointing out is Type-Constrained Representation Learning in Knowledge Graphs from Denis Krompaß, Stephan Baier and Volker Tresp. They show how background knowledge about entity types can help improve link prediction tasks for generating knowledge graphs. Again, use the kitchen sink and you’ll perform better.

There were a couple of good resources presented for entity resolution tasks. Bryl, Bizer and Paulheim produced a dataset of surface forms for DBpedia entities. They were able to boost performance up to 20% for extracting accurate surface forms for entities through filtering. Another tool, LANCE, looks great for systematically generating benchmark and test sets for instance matching (i.e. entity linking). Also, Michel Dumontier presented work that had a benchmark for entity linking from the life sciences domain.

Finally, as we get better at entity resolution, I think people will turn towards fusion (getting the best possible representation for a real world entity). Examples include:

The Spectrum of Linked Data Querying

So Linked Data Fragments from Ruben Verborgh was the huge breakout of the conference. Oscar Corcho’s excellent COLD keynote was a riff off thinking about the spectrum (from data dumps through to full SPARQL queries) that was introduced by Ruben. Another example was the work of Maribel Acosta and Maria-Esther Vidal on “Networks of Linked Data Eddies: An Adaptive Web Query Processing Engine for RDF Data”. They developed an adaptive client-side SPARQL query engine for linked data fragments. This allows the server side to support a much simpler API by having a more intelligent client side. (An aside: kids, this is how a technical talk should be done. Precise, clean, technical, understandable. Can’t wait to have the video lecture for reference.)
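To make the “simple server, smart client” trade-off a bit more concrete, here is a minimal Python sketch. The toy `TripleServer` can only answer single triple patterns (roughly the kind of restricted interface Linked Data Fragments argues for), and the client evaluates a two-pattern query by doing the join itself. The class, the function names and the example data are assumptions for illustration; this is neither the real Linked Data Fragments API nor the eddy-based engine from the paper.

```python
from typing import Iterator, Optional

Triple = tuple[str, str, str]

class TripleServer:
    """Toy server: it can only match ONE triple pattern at a time."""

    def __init__(self, triples: list[Triple]) -> None:
        self.triples = triples

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        # None acts as a wildcard, i.e. a variable in the pattern.
        for t in self.triples:
            if all(want is None or want == got for want, got in zip((s, p, o), t)):
                yield t

def authors_born_in(server: TripleServer, city: str) -> list[str]:
    """Client-side join for: ?x bornIn <city> . ?book author ?x ."""
    names = []
    # First pattern: everyone born in the city.
    for person, _, _ in server.match(p="bornIn", o=city):
        # Second pattern, bound with the result of the first one.
        if any(server.match(p="author", o=person)):
            names.append(person)
    return names

server = TripleServer([
    ("book1", "author", "alice"),
    ("alice", "bornIn", "Ghent"),
    ("bob", "bornIn", "Ghent"),
    ("book2", "author", "carol"),
    ("carol", "bornIn", "Madrid"),
])
print(authors_born_in(server, "Ghent"))
```

The server never sees the full query, only cheap pattern lookups; all the planning (which pattern to ask first, how to bind the second) lives in the client, which is where an adaptive engine would make its decisions.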

Even the most centralized solution, the LODLaundromat, which is a clean crawl of the entire web of data, supports Linked Data Fragments. In some sense, by asking the server to do less you can handle more linked data, and thus do more powerful analysis. This is exemplified by the best paper, LODLab by Laurens Rietveld, Wouter Beek, and Stefan Schlobach, which allowed for the reproduction of 3 existing analyses of the web of data at scale.

I think Olaf Hartig, in his paper on LDQL, framed the problem best as (N, Q) (slides). First define the “crawl” of the web you want to query (N) and then define the query (Q). When we think about what and where our crawls are, we can think about what execution strategies and types of queries we can best support. Or put another way:

More Main Memory = better Triple Stores

Designing scalable graph / triple stores has always been a challenge. We’ve been trapped by the limits of RAM. But computer architecture is changing, and we now have systems that have a lot of main memory either in one machine or across multiple machines. This is a boon to triple stores and graph processing in general. See for example Leskovec team’s work from SIGMOD:

We saw that theme at ISWC as well:

Moral of the story: Buy RAM


This year’s conference explored the many spectra of the combination of the web and semantics. I liked the mix of methods used by papers and the range of practical (the industry session was packed) to theoretical results. I also think the community is no longer hemmed in by the standards but is using them as a solid starting point. This was pointed out by Ian Horrocks in his keynote:

Additionally, this flexibility was exemplified by the best applied paper, “Building and Using a Knowledge Graph to Combat Human Trafficking” by Pedro Szekely et al. They used the parts of the semantic web stack that helped (like ontologies and JSON-LD) but used Elasticsearch for storage to create a vital and important solution to a real challenging problem.

Overall, this was an excellent conference. Next year’s conference is in Kobe. I hope you submit some great papers and I’ll see you there!

Random Thoughts

I seem to be a regular attendee of the Extended Semantic Web Conference series (2013 trip report). This year ESWC was back in Crete, which means that you can get photos like the one below taken to make your colleagues jealous:

[Photo: 2014-05-26 18.11.15]


As I write this, the conference is still going on but I had to leave early to head to Iceland, where I will briefly gate crash the natural language processing crowd at LREC 2014. Let’s begin with the stats of ESWC:

  • 204 submissions
  • 25% acceptance rate
  • ~ 4.5 reviews per submission

The number of submissions was up from last year. I don’t have the numbers on attendance but it seemed in-line with last year as well. So, what was I doing at the conference?

This year ESWC introduced a semantic web evaluation track. We participated in two of these new evaluation tracks. I showed off our linkitup tool for the Semantic Web Publishing Challenge [paper]. The tool lets you enrich research data uploaded to Figshare with links to external sites. Valentina Maccatrozzo presented her contribution to the Linked Open Data Recommender Systems challenge. She’s exploring using richer semantics to do recommendation, which, from the comments on her poster, was seen as a novel approach by the attendees. Overall, I think all our work went over well. However, it would be good to see more of the VU Semweb group content in the main track. The Netherlands only had 14 paper submissions. It was also nice to see PROV mentioned in several places. Finally, conferences are great places to do face-2-face work. I had nice chats with quite a few people, in particular, with Tobias Kuhn on the development of the nanopublications spec and with Avi Bernstein on our collaboration leveraging his group’s Signal & Collect framework.

So what were the big themes of this year’s conference? I pulled out three:

  1. Easing development with Linked Data
  2. Entities everywhere
  3. Methodological maturity

Easing development

As a community, we’ve built interesting infrastructure for machine readable data sharing, querying, vocabulary publication and the like. Now that we have all this data,  the community is turning towards making it easier to develop applications with it. This is not necessarily a new problem and people have tackled it before (e.g. ActiveRDF). But the availability of data seems to be renewing attention to this problem. This was reflected by Stefan Staab’s Keynote on Programming the Semantic Web. I think the central issue he identified was how to program against flexible data models that are the hallmark of semantic web data. Stefan argued strongly for static typing and programmer support but, as an audience member noted, there is a general trend in development circles towards document style databases with weaker type systems. It will be interesting to see how this plays out.

Aside: A thought I had was whether we could easily publish the type systems that developers create when programming back out onto the web and merge them with existing vocabularies….

This notion of easing development was also present in the SALAD workshop (a workshop on APIs). This is dear to my heart. I’ve seen in my own projects how APIs really help developers make use of semantic data when building applications. There was quite a lot of discussion around the role of SPARQL with respect to APIs as well as whether to supply data dumps or an API and what type of API that should be. I think it’s fair enough to say that Web APIs are winning (see the paper RESTful or RESTless – Current State of Today’s Top Web APIs), and we need to devise systems that deal with that while still leveraging all our semantic goodness. That being said, it’s nice to see mature tooling appearing for Linked Data/Semantic Web data (e.g. RedLink tools, Marin Dimitrov’s talk on selling semweb solutions commercially).

Entities everywhere

There were a bunch of papers on entity resolution, disambiguation, etc. Indeed, Linked Data provides a really fresh arena to do this kind of work as both the data and schemas are structured and yet at the same time messy. I had quite a few nice discussions with Pedro Szekely on the topic and am keen to work on getting some of our ideas on linking into the Karma system he is developing with others. From my perspective, two papers caught my eye. One was on using coreference to actually improve SPARQL query performance. Oftentimes we think of all these equality links as a performance penalty; it’s interesting to think about whether they can actually help us improve performance on different tasks. The other paper was “A Probabilistic Approach for Integrating Heterogeneous Knowledge Sources”, which uses Markov Logic Networks to align web information extraction data (e.g. NELL) to DBpedia. This is interesting as it allows us to enrich clean background knowledge with data gathered from the web. It’s also neat in that it’s another example of the combination of statistical inference and (soft) rules.

This emphasis on entities is in contrast with the thought-provoking keynote by Oxford philosopher Luciano Floridi, who discussed various notions of complexity and argued that we need to think not in terms of entities but in terms of interactions. This was motivated by the following statistic: by 2020 there will be 7.5 billion people vs. 50 billion devices, and all of these things will be interconnected and talking.

Indeed, while dealing with entities, especially in messy data, is far from being a solved problem, we are starting to see dynamics emerging as a clear area of interest. This is reflected by the best student paper, Hybrid Acquisition of Temporal Scopes for RDF Data.

Methodological maturity

The final theme I wanted to touch on was methodological maturity. The semantic web project is 15 years old (young in scientific terms) and the community has now become focused on having rigorous evaluation criteria. I think every paper I saw at ESWC had a strong evaluation section (or at least a strongly defensible one). This is a good thing! However, this focus pushes people towards safety in their methodology, for instance the plethora of papers that use LUBM, which can lead towards safety in research. We had an excellent discussion about this trend in the EMPIRICAL workshop – check out a brief write up here. Indeed, it makes one wonder if

  1. these simpler methodologies (my system is faster than yours on benchmark x) exacerbate a tendency to do engineering and not answer scientific questions; and
  2. the amalgamation of ideas that characterizes semantic web research is toned down, leading to less exciting research.

One answer to this trend is to encourage a more widespread acceptance and knowledge of different scientific methodologies (e.g. ethnography), which would allow us to explore other areas.

Finally,  I would recommend Abraham Bernstein & Natasha Noy – “Is This Really Science? The Semantic Webber’s Guide to Evaluating Research Contributions“, which I found out about at the EMPIRICAL workshop.

Final Notes

Here are some other pointers that didn’t fit into my themes.


It’s been about a week since I got back from Australia, where I attended the International Semantic Web Conference (ISWC 2013). This is the premier forum for the latest in research on using semantics on the Web. Overall, it was a great conference – both well run and with a good buzz. (Note, I’m probably a bit biased – I was chair of this year’s In-Use track.)

ISWC is a fairly hard conference to get into and the quality is strong.

More importantly, almost all the talks I went to were worth thinking about. You can find the proceedings of the conference online either as a complete zip here or published by Springer. You can find more stats on the conference here.

As an aside, before digging into the meat of the conference – Sydney was great. Really a fantastic city – very cosmopolitan and with great coffee. I suggest Single Origin Roasters.  Also, Australia has wombats – wombats are like the chillest animal ever.


From my perspective, there were three main themes to take away from the conference:

  1. Impressive applications of semantic web technologies
  2. Core ontologies as the framework for connecting complex integration and retrieval tasks
  3. Starting to come to grips with messiness


We are really seeing how semantic technologies can power great applications. All three keynotes highlighted the use of Semantic Tech. I think Ramanathan Guha’s keynote probably highlighted this the best in his discussion of the growth of schema.org.

Beyond the slide above, he brought up representatives from Yandex, Yahoo, and Microsoft on stage to join Google to tell how they are using schema.org. Drupal and WordPress will have schema.org in their cores in 2014. Schema.org is being used to drive everything from veteran friendly job search, to rich pins on Pinterest and enabling Open Table reservations to be easily put into your calendar. So schema.org is clearly a success.

Peter Mika presented a paper on how Yahoo is using ontologies to drive entity recommendations in searches. For example, you search for Brad Pitt and they show you related entities like Angelina Jolie or  Fight Club. The nice thing about the paper is that it showed how the deployment in production (in Yahoo! Web Search in the US) increases click through rates.

Roi Blanco, Berkant Barla Cambazoglu, Peter Mika, Nicolas Torzec: Entity Recommendations in Web Search. International Semantic Web Conference (2) 2013: 33-48

I think it was probably Yves Raimond’s conference – he showed some amazing things being done at the BBC using semantic web technology. He had an excellent keynote at the COLD workshop – also highlighting some challenges on where we need to improve to ease the use of these technologies in production. I recommend you check out the slides above. Of all the applications, I’d highlight their work on mining the BBC World Service archive to enrich content being created. This work won the Semantic Web Challenge.

In the biomedical domain, there were two papers showing how semantics can be embedded in tools that regular users use. One showed how the development of ICD-11 (ICD is the most widely used clinical classification developed by the WHO) is supported using semtech. The other I liked was the use of Excel templates (developed using RightField) that transparently captured data according to a domain model for systems biology.

Also in the biomedical domain, IBM presented an approach for using semantic web technologies to help coordinate health and social care at the semantic web challenge.

Finally, there was a neat application presented by Jane Hunter applying these technologies to art preservation: The Twentieth Century in Paint.

I did a review of all the in-use papers leading up to the conference but it’s good enough to say that there were numerous impressive applications. Also, I think it says something about the health of the community when you see slides like this:

Core Ontologies + Other Methods

There were a number of interesting papers that were around the idea of using a combination of well-known ontologies and then either record linkage or other machine learning methods to populate knowledge bases.

A paper that I like a lot (and that also won the best student paper award), Knowledge Graph Identification (by Jay Pujara, Hui Miao, Lise Getoor and William Cohen), sums it up nicely:

Our approach, knowledge graph identification (KGI) combines the tasks of entity resolution, collective classification and link prediction mediated by rules based on ontological information.

Interesting papers under this theme were:

From my perspective, it was also nice to see the use of the W3C Provenance Model (PROV) as one of these core ontologies in many different papers and two of the keynotes. People are using it as a substructure to do a number of different applications – I intend to write a whole post on this – but until then here’s proof by twitter:

Coming to grips with messiness

It’s pretty evident that when dealing with the web things are messy. There were a couple of papers that documented this empirically either in terms of the availability of endpoints or just looking at the heterogeneity of the markup available from web pages.

In some sense, the papers mentioned in the prior theme also try to deal with this messiness. Here are another couple of papers looking at essentially how to deal with or even use this messiness.

One thing that seemed a lot more present in this year’s conference than last year was the term entity. This is obviously popular because of things like the Google Knowledge Graph – but in some sense maybe it gives a better description of what we are aiming to get out of the data we have – machine readable descriptions of real world concepts/things.


There are some things that are of interest that don’t fit neatly into the themes above. So I’ll just try a bulleted list.

  • We won the Best Demo Paper Award for
  • Our paper on using NoSQL stores for RDF went over very well. Congrats to Marcin for giving a good presentation.
  • The format of mixing talks from different tracks by topic and having only 20 minutes per talk was great.
  • VUA had a great showing – 3 main track papers, a bunch of workshop papers, a couple of different posters, 4 workshop organizers giving talks at the workshop summary session, 2 organizing committee members, alumni all over the place, plus a bunch of stuff I probably forgot to mention.
  • The colocation with Web Directions South was great – it added a nice extra energy to the conference.
  • There were best reviewer awards won by Oscar Corcho, Tania Tudorache, and Aidan Hogan
  • Peter Fox seemed to give a keynote just for me – concept maps, PROV followed with abductive reasoning.
  • Did I mention that the coffee in Sydney (and Newcastle) is really good and lots of places serve proper breakfast!


This past week we (Achille Fokoue & myself) sent the paper notifications for the 2013 International Semantic Web Conference’s In-Use Track. The track seeks to highlight innovative semantic technologies being applied and deployed in practice. With the selection made by the program committee (Thanks!), I think we have definitely achieved that goal.

So if you’re coming to Sydney (& you should definitely be coming to Sydney) here’s what’s in store. (Papers are listed below.) You’ll see  a number of papers where semantic technologies are being deployed in companies to help end users including:

  • how semantic technologies are helping the BBC expose its archive to its journalists [1];
  • how OWL and RDF are being combined to give energy saving tips to 300,000 customers at EDF [2];
  • and how the search result pages in Yahoo! Search are being improved through the use of knowledge bases [3].


Dealing with streaming data has been a growing research theme in recent years. In the in-use track, we are seeing some of the fruits of that research, in particular with respect to monitoring city events. Balduini et al. report on the use of the Streaming Linked Data Framework for monitoring the London Olympic Games 2012 and Milano Design Week 2013 (yes, the semantic web is fashionable) [4]. IBM will present its work on the real-time urban monitoring of Dublin – requiring both scale and low-latency solutions [5].

Life sciences

Semantic technologies have a long history of being deployed in healthcare and life sciences. We’ll see that again at this year’s conference. We get a progress report on the usage of these technologies in the development of the 11th revision of the International Classification of Diseases (ICD-11) [6]. ICD-11 involves 270 domain experts using the iCAT tool. We see how intermixing (plain-old) spreadsheets and semantic technologies is enabling systems biology to better share its data [7]. In the life sciences, and in particular in drug discovery, both public and private data are critical; we see how the Open PHACTS project is tackling the problem of intermixing such data [8].

Semantics for Science & Research

Continuing on the science theme, the track will have reports on improving the reliability of scientific workflows [9]; how linked data is being leveraged to understand the economic impact of R&D in Europe [10]; and how our community is “eating its own dogfood” to enable better scientometric analysis of journals [11]. Lastly, you’ll get a talk on the use of semantic annotations to help crowd source 3D representations of Greek Pottery for cultural heritage (a paper that I just think is so cool – I hope for videos) [12].

Semantic Data Availability

Reasoning relies on the availability of data exposed with its associated semantics. We’ve seen how the Linking Open Data movement helped bootstrap the uptake of Semantic Web technologies. Likewise, the widespread deployment of RDFa and microformats has dramatically increased the amount of data available. But what’s out there? Bizer et al. give us a report based on analyzing 3 billion web pages. (I expect some awesome charts in this presentation) [13].

Enriching data with semantics has benefits but also comes at a cost. Based on a case study of converting the Norwegian Petroleum Directorate’s FactPages, we’ll get insight into those trade-offs [14]. Reducing the effort for such conversions, and particularly for interlinking, is a key challenge. The Cross-language Service Retriever system is tackling this for open government data across multiple languages [15].

Finally, in practice, a key way to “semantize” data is through the use of natural language processing tools. You’ll see how semantic tech is facilitating the reusability and interoperability of NLP tools using the NIF 2.0 framework [16].


I hope you’ll agree that this really represents the best from the semantic web community. These 16 papers were selected from 79 submissions. The program committee (for the most part) did a great job both with their reviews and, importantly, the discussion. In many cases it was a hard decision, and the PC’s ability to discuss and revise their views was crucial in making the final selection. Thanks to the PC: it is a lot of work to do and we definitely asked them to do it in a fairly compact way. Thank you!

A couple of other thoughts: I think the decision to institute an abstract submission for the in-use track was a good one and that author rebuttals are more helpful than I thought they would be.

ISWC 2013 is going to be a fantastic conference. I’m looking forward to the location, the sessions and the community. I look forward to seeing you there. There are many ways to participate so check out 


  1. Yves Raimond, Michael Smethurst, Andrew McParland and Christopher Lowis. Using the past to explain the present: interlinking current affairs with archives via the Semantic Web
  2. Pierre Chaussecourte, Birte Glimm, Ian Horrocks, Boris Motik and Laurent Pierre. The Energy Management Adviser at EDF
  3. Roi Blanco, Berkant Barla Cambazoglu, Peter Mika and Nicolas Torzec. Entity recommendations in Web Search
  4. Marco Balduini, Emanuele Della Valle, Daniele Dell’Aglio, Themis Palpanas, Mikalai Tsytsarau and Cristian Confalonieri. Social listening of City Scale Events using the Streaming Linked Data Framework
  5. Simone Tallevi-Diotallevi, Spyros Kotoulas, Luca Foschini, Freddy Lecue and Antonio Corradi. Real-time Urban Monitoring in Dublin using Semantic and Stream Technologies
  6. Tania Tudorache, Csongor I Nyulas, Natasha F. Noy and Mark Musen. Using Semantic Web in ICD-11: Three Years Down the Road
  7. Katherine Wolstencroft, Stuart Owen, Olga Krebs, Quyen Nguyen, Jacky L. Snoep, Wolfgang Mueller and Carole Goble. Semantic Data and Models Sharing in Systems Biology: The Just Enough Results Model and the SEEK Platform
  8. Carole Goble, Alasdair J. G. Gray, Lee Harland, Karen Karapetyan, Antonis Loizou, Ivan Mikhailov, Yrjana Rankka, Stefan Senger, Valery Tkachenko, Antony Williams and Egon Willighagen. Incorporating Private and Commercial Data into an Open Linked Data Platform for Drug Discovery
  9. José Manuel Gómez-Pérez, Esteban García-Cuesta, Aleix Garrido and José Enrique Ruiz. When History Matters – Assessing Reliability for the Reuse of Scientific Workflows
  10. Amrapali Zaveri, Joao Ricardo Nickenig Vissoci, Cinzia Daraio and Ricardo Pietrobon. Using Linked Data to evaluate the impact of Research and Development in Europe: a Structural Equation Model
  11. Yingjie Hu, Krzysztof Janowicz, Grant Mckenzie, Kunal Sengupta and Pascal Hitzler. A Linked Data-driven Semantically-enabled Journal Portal for Scientometrics
  12. Chih-Hao Yu, Tudor Groza and Jane Hunter. Reasoning on crowd-sourced semantic annotations to facilitate cataloguing of 3D artefacts in the cultural heritage domain
  13. Christian Bizer, Kai Eckert, Robert Meusel, Hannes Mühleisen, Michael Schuhmacher and Johanna Völker. Deployment of RDFa, Microdata, and Microformats on the Web – A Quantitative Analysis
  14. Martin G. Skjæveland, Espen H. Lian and Ian Horrocks. Publishing the Norwegian Petroleum Directorate’s FactPages as Semantic Web Data
  15. Fedelucio Narducci, Matteo Palmonari and Giovanni Semeraro. Cross-language Semantic Retrieval and Linking of E-gov Services
  16. Sebastian Hellmann, Jens Lehmann, Sören Auer and Martin Brümmer. Integrating NLP using Linked Data

I think since I’ve moved to Europe I’ve been attending ESWC (Extended/European Semantic Web Conference) and I always get something out of the event. There are plenty of familiar faces but also quite a few new people and it’s a great environment for having chats. In addition, the quality of the content is always quite good. This year the event was held in Montpellier and was for the most part well organized: the main conference wifi worked!

The stats:

  • 300 participants
  • 42 accepted papers from 162 submissions
  • 26% acceptance rate
  • 11 workshops + 7 tutorials

So what was I doing there:

The VU Semantic Web group also had a strong showing:

  • Albert Meroño-Peñuela won the best PhD symposium paper for his work on digital humanities and the semantic web.
  • The USEWOD workshop’s (led by Laura Hollink) datasets were used by a number of main track papers for evaluation.
  • Stefan Schlobach and Laura Hollink were on the organizing committee. And we organized a couple of workshops & tutorials.
  • Posters/Demos:
    • Albert Meroño-Peñuela, Rinke Hoekstra, Andrea Scharnhorst, Christophe Guéret and Ashkan Ashkpour. Longitudinal Queries over Linked Census Data.
    • Niels Ockeloen, Victor de Boer and Lora Aroyo. LDtogo: A Data Querying and Mapping Framework for Linked Data Applications.
  • Several workshop papers.

I’ll try to pull out what I thought were the highlights of the event.

What is a semantic web application?

Can you escape Frank?

The keynotes from Enrico Motta and David Karger focused on trying to define what a semantic web application is. This starts out in the form of a question: does a Semantic Web application need to use the Semantic Web set of standards (e.g. RDF, OWL, etc.)? From my perspective, the answer is no. These standards are great infrastructure for building these applications, but are they necessary? No (see the Google Knowledge Graph). Then what is a semantic web application?

From what I could gather, Motta would define it as an application that is scalable, uses the web and embraces Model Theoretic semantics. For me that’s rather limiting; there are many other semantics that may be appropriate… we can ground meaning in something other than model theory. I think a good example of this is the work on Pragmatic Semantics that my colleague Stefan Schlobach presented at the Artificial Intelligence meets the Semantic Web workshop. Or we can reach back into AI and see discussions from Brooks’ classic paper Elephants Don’t Play Chess. I felt that Karger’s definition (in what was a great keynote) was getting somewhere. He defined a semantic web application essentially as:

An application whose schema is expected to change.

This seems to me to capture the semantic portion of the definition, in the sense that the semantics need to be understood on the fly. However, I think we need to roll the web back into this definition… Overall, I thought this discussion was worth having and helps the field define what it is that we are aiming at. To be continued…

Homebrew databases

As I said, I thought Karger’s keynote was great. He gave a talk within a talk, on the subject of homebrew databases from this paper in CHI 2011:

Amy Voida, Ellie Harmon, and Ban Al-Ani. 2011. Homebrew databases: complexities of everyday information management in nonprofit organizations. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’11). ACM, New York, NY, USA, 915-924. DOI=10.1145/1978942.1979078

They define a homebrew database as “an assemblage of information management resources that people have pieced together to satisfice their information management needs.” This is just what we see all the time: the combination of Excel, Word, email, databases and, don’t forget, normal paper, brought together to try to attack information management problems. A number of our use cases from the pharma industry as well as science reflect essentially this practice. It’s great to see a good definition of this problem grounded in ethnographic studies.

The Concerns of Linking

There were a couple of good papers on generating linkage across datasets (the central point of linked data). In Open PHACTS, we’ve been dealing with the notion of context-dependent linkages, and I think this notion is becoming more prevalent in the community. We had a lot of positive response to this in the poster session when presenting Open PHACTS. Probably my favorite paper was on linking the Smithsonian American Art Museum to the Linked Data cloud. They use PROV to drive their link generation, essentially proposing links to humans who then verify the connections. See:

I also liked the following paper on which hardware environment you should use when doing link discovery. Result: use GPUs, they’re fast!

Additionally, I think the following paper is cool because they use network statistics not just to measure but to do something, namely create links:


APIs were a growing theme of the event, with things like the Linked Data Platform working group and the successful SALAD workshop (fantastic acronym). I was surprised, though, that people in the workshop hadn’t heard of the Linked Data API. We had a lot of good feedback on the Open PHACTS API. It’s just the case that there is more developer expertise in using web service APIs than in semweb tech. I’ve actually seen a lot of demand for semweb skills, and while we are doing our best to train people, there is still this gap. It’s good, then, that we are thinking about how these two technologies can play together nicely.

Random Notes

The VU University Amsterdam computer science department has been a pioneer at putting structured data and the Semantic Web into the undergraduate curriculum through our Web-based Knowledge Representation course. I’ve had the pleasure of teaching the class for the past 3 years. The class is done in a short block of 8 weeks (7 weeks if you give them a week for exams). It’s a fairly complicated class for second-year undergraduates, but each year the technology becomes easier to use, which makes it easier for the students to ground the concepts of KR and Web-based data in applications.

The class involves 6 lectures covering the major ground of Semantic Web technologies and KR. We then give them 3 1/2 weeks to design and hopefully build a Semantic Web application in pairs. During this time we give one-on-one support through appointments. For most students, this is the first time they’ve come into contact with Semantic Web technologies.

This year they built applications based on The Times Higher Education 2011 World University Rankings. They converted databases to RDF, developed their own ontologies, integrated data from the linked data cloud and visualized data using SPARQL. I was impressed with all the work they did and I wanted to share some of their projects. Here are four screencasts from the applications the students built.

Points of Interest Around Universities

Guess Which University

Find Universities by Location

SPARQL Query Builder for University Info
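To give a flavor of the “converted databases to RDF” step mentioned above, here is a minimal sketch in Python. It is not what any of the student teams actually wrote: the `http://example.org/` namespaces, the column names and the sample rows are all made up for illustration, and it emits plain N-Triples by hand rather than using an RDF library.

```python
import csv
import io

# Hypothetical namespaces, chosen only for this illustration.
BASE = "http://example.org/university/"
VOCAB = "http://example.org/vocab#"


def literal(value):
    """Serialize a plain string as an N-Triples literal (escaping \\ and ")."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def rows_to_ntriples(csv_text):
    """Emit one triple per non-empty column of each CSV row.

    The 'id' column becomes the subject URI; every other column name
    becomes a property in the (hypothetical) VOCAB namespace.
    """
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{row['id']}>"
        for column, value in row.items():
            if column == "id" or not value:
                continue  # skip the key column and empty cells
            triples.append(f"{subject} <{VOCAB}{column}> {literal(value)} .")
    return triples


# Invented sample rows, not taken from the actual rankings data.
SAMPLE = "id,name,country\nu1,Example University,Netherlands\nu2,Sample Institute,\n"

for line in rows_to_ntriples(SAMPLE):
    print(line)
```

In a real project you would more likely reach for an RDF library and an existing vocabulary instead of minting every property yourself, but the basic move (row to subject, column to property, cell to literal) is the same.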

The Journal of Web Semantics recently published a special issue on Using Provenance in the Semantic Web, edited by Yolanda Gil and me (Vol 9, No 2 (2011)). All articles are available on the journal’s preprint server.

The issue highlights top research at the intersection of provenance and the Semantic Web. The papers addressed a range of topics including:

  • tracking provenance of DBpedia back to the underlying Wikipedia edits [Orlandi & Passant];
  • how to enable reproducibility using Semantic techniques [Moreau];
  • how to use provenance to effectively reason over large amounts (1 billion triples) of messy data [Bonatti et al.]; and
  • how to begin to capture semantically the intent of scientists [Pignotti et al.].
Our editorial highlights a common thread between the papers and sums them up as follows:

A common thread through these papers is the use of already existing provenance ontologies. As the community comes to an increasing agreement on the commonalities of provenance representations through efforts such as the W3C Provenance Working Group, this will further enable new research on the use of provenance. This continues the fruitful interaction between standardization and research that is one of the hallmarks of the Semantic Web.

Overall, this set of papers demonstrates the latest approaches to enabling a Web that provides rich descriptions of how, when, where and why Web resources are produced and shows the sorts of reasoning and applications that these provenance descriptions make possible.

Finally, it’s important to note that this issue wouldn’t have been possible without the quick and competent reviews done by the anonymous reviewers. This is my public thank you to them.

I hope you get a chance to take a look at this interesting work.
