Last week, I was the first Language, Data and Knowledge Conference (LDK 2017) hosted in Galway, Ireland. If you show up at a natural language processing conference (especially someplace like LREC) you’ll find a group of people who think about and use linked/structured data. Likewise, if you show up at a linked data/semantic web conference, you’ll find folks who think about and use NLP. I would characterize LDK2017 as place where that intersection of people can hang out for a couple of days.
The conference had ~80 attendees from my count. I enjoyed the setup of a single track, plenty of time to talk, and also really trying to build the community by doing things together. I also enjoyed the fact that there were 4 keynotes for just two days. It really helped give spark to the conference.
Here are some my take-aways from the conference:
Social science as a new challenge domain
Antal van den Bosch gave an excellent keynote emphasizing the need for what he termed holistic approach to language especially for questions in the humanities and social science (tutorial here). This holistic approach takes into account the rich context that word occur in. In particular, he called out the notions of ideolect and socialect that are ways word are understood/used individually and in a particular social group. He are argued the understanding of these computational is a key notion in driving tasks like recommendation.
I personally was interested in Antal’s joint work with Folgert Karsdorp (checkout his github repos!) on Story Networks – constructing networks of how stories are told and retold. For example, how the story of Red Riding Hood has morphed and changed overtime and what are the key sources for its work. This reminded me of the work on information diffusion in social networks. This has direct bearing on how we can detect and track how ideas and technologies propagate in science communication.
I had a great discussion with SocialAI team (Erica Briscoe & Scott Appling) from Georgia Tech about their work on computational social science. In particular, two pointers: the new DARPA next generation social science program to scale-up social science research and their work on characterizing technology capabilities from data for innovation assessment.
Turning toward the long tail of entities
There were a number of talks that focused on how to deal with entities that aren’t necessarily popular. Bichen Shi presented work done at Nokia Bell Labs on entity mention disambiguation. They used Apache Spark to train 700,000 classifiers – one per every entity mention in wikipedia. This allowed them to obtain much more accurate per-mention entity links. Note they used Gerbil for their evaluation. Likewise, Hendrik ter Horst focused on entity linking specifically targeting technical domains (i.e. MeSH & chemicals). During Q/A it was clear that straight-up gazeetering provides an extremely strong baseline in this task. Marieke van Erp presented work on fine-grained entity typing in Spanish and Dutch using word embeddings to go classify hundreds up types.
Natural language generation from KBs is worth a deeper look
Natural language generation from knowledge bases continues a pace. Kathleen McKeown‘s keynote touched on this, in particular, her recent work on mining paraphrasal templates that combines both knowledge bases and free text. I was impressed with the work of Nina Dethlefs on using deep learning for generating textual description from a knowledge base. The key insight was how to quickly generate systems to do NLG where the data was sparse using hierarchical composition. In googling around when writing this trip report I stumbled upon Ehud Reiter’s blog which is a good read.
A couple of nice overview slides
While not a theme, there we’re some really nice slides describingfundamentals.
From C. Maria Keet:
From Christian Chiarcos/Bettina Klimek:
From Sangha Nam
Overall, it was a good kick-off to a conference. Very well organized and some nice research.
- Conference dinners on the last day of a conference – kind of an interesting idea
- It was my first time in Ireland — really fun country – and an amazing keynote about the the grammar of Irish by Graham R. Isaac
- Comparative Lexicology using Linked Data
- Was there a pub crawl – why yes…
- New person to watch in tech for scholarly communications: Bahar Sateli
- Marrying Open IE and AMR seems pretty interesting...
- And to end Colloquial WordNet