Archive

Tag Archives: #semanticweb

I had the pleasure of attending the Web Conference 2018 in Lyon last week along with my colleague Corey Harper . This is the 27th addition of the largest conference on the World Wide Web. I have tremendous difficulty  not calling it WWW but I’ll learn! Instead of doing two trip reports the rest of this is a combo of Corey and my thoughts. Before getting to what we took away as main themes of the conference let’s look at the stats and organization:

It’s also worth pointing out that this is just the research track. There were 27 workshops,  21 tutorials, 30 demos (Paul was co-chair), 62 posters, four collocated conferences/events, 4 challenges, a developer track and programming track, a project track, an industry track, and… We are probably missing something as well. Suffice to say, even with the best work of the organizers it was hard to figure out what to see. Organizing an event with 2200+ attendees is a thing is a massive task – over 80 chairs were involved not to mention the PC and the local heavy lifting. Congrats to Fabien, Pierre-Antoine, Lionel and the whole committee for pulling it off.  It’s also great to see as well that the proceedings are open access and available on the web.

Given the breadth of the conference, we obviously couldn’t see everything but from our interests we pulled out the following themes:

  • Dealing with a Polluted Web
  • Tackling Tabular Data
  • Observational Methods
  • Scientific Content as a Driver

Dealing with a Polluted Web

The Web community is really owning it’s responsibility to help mitigate the destructive uses to which the Web is put. From the “Recoding Black Mirror” workshop, which we were sad to miss, through the opening keynote and the tracks on Security and Privacy and Fact Checking, this was a major topic throughout the conference.

Oxford professor Luciano Floridi gave an excellent first keynote  on “The Good Web” which addressed this topic head on. He introduced a number of nice metaphors to describe what’s going on:

  • Polluting agents in the Web ecosystem are like extremphiles, making the environment hostile to all but themselves
  • Democracy in some contexts can be like antibiotics: too much gives growth to antibiotic resistant bacteria.
  • His takeaway is that we need a bit of paternalism in this context now.

His talk was pretty compelling,  you can check out the full video here.

Additionally, Corey was able to attend the panel discussion that opened the “Journalism, Misinformation, and Fact-Checking” track, which included representation from the Credibility Coalition, the International Fact Checking Network, MIT, and WikiMedia. There was a discussion of how to set up economies of trust in the age of attention economies, and while some panelists agreed with Floridi’s call for some paternalism, there was also a warning that some techniques we might deploy to mitigate these risks could lead to “accidental authoritarianism.” The Credibility Coalition also provided an interesting review of how to define credibility indicators for news looking at over 16 indicators of credibility.

We were able to see parts of the “Web and Society track”, which included a number of papers related to social justice oriented themes. This included an excellent paper that showed how recommender systems in social networks often exacerbate and amplify gender and racial disparity in social network connections and engagement. Additionally, many papers addressed the relationship between the mainstream media and the web. (e.g. political polarization and social media, media and public attention using the web).

Some more examples: The best demo was awarded to a system that automatically analyzed privacy policies of websites and summarized them with respect to GDPR and:

More generally, it seems the question is how do we achieve quality assessment at scale?

Tackling Tabular Data

Knowledge graphs and heterogenous networks (there was a workshop on that) were a big part of the conference. Indeed the test of time paper award went to the original Yago paper. There were a number of talks about improving knowledge graphs for example for improving on question answering tasks, determining attributes that are needed to complete a KG or improving relation extraction. While tables have always been an input to knowledge graph construction (e.g. wikpedia infoboxes), an interesting turn was towards treating tabular data as a focus area.

As Natasha Noy from Google noted in her  keynote at the SAVE-SD workshop,  this is an area with a number of exciting research challenges:img_0034_google_savesd.jpg

There was a workshop on data search with a number of papers on the theme. In that workshop, Maarten de Rijke gave a keynote on the work his team has been doing in the context of data search project with Elsevier.

In the main track, there was an excellent talk on Ad-Hoc Table Retrieval using Semantic Similarity. They looked at finding semantically central columns to provide a rank list of columns. More broadly they are looking at spreadsheet compilation as the task (see smarttables.cc and the dataset for that task.) Furthermore, the paper Towards Annotating Relational Data on the Web with Language Models looked at enriching tables through linking into a knowledge graph.

Observational Methods

Observing  user behavior has been a part of research on the Web, any web search engine is driven by that notion. What did seem to be striking is the depth of the observational data being employed. Prof. Lorrie Cranor gave an excellent keynote on the user experience of web security (video here). Did you know that if you read all the privacy policies of all the sites you visit it wold take 244 hours per year? Also, the idea of privacy as nutrition labels is pretty cool:

But what was interesting was her labs use of an observatory of 200 participants who allowed their Windows home computers to be instrumented. This kind of instrumentation gives deep insight into how users actually use their browsers and security settings.

Another example of deep observational data, was the use of mouse tracking on search result pages to detect how people search under anxiety conditions:

In the paper by Wei Sui and co-authors on Computational Creative Advertisements presented at the HumL workshop – they use in-home facial and video tracking to measure emotional response to ads by volunteers.

The final example was the use of FMRI scans to track brain activity of participants during web search tasks. All these examples provide amazing insights into how people use these technologies but as these sorts of methods are more broadly adopted, we need to make sure to adopt the kinds of safe-guards adopted by these researchers – e.g. consent, IRBs, anonymization.

Scientific Content as a Driver

It’s probably our bias but we saw a lot of work tackling scientific content. Probably because it’s both interesting and provides a number of challenges. For example, the best paper of the conference (HighLife) was about extracting n-ary relations for knowledge graph construction motivated by the need for such types of relations in creating biomedical knowledge graphs. The aforementioned work on tabular data often is motivated by the needs of research. Obviously SAVE-SD covered this in detail:

In the demo track, the etymo.io search engine was presented to summarize and visualization of scientific papers. Kuansan Wang at the BigNet workshop talked about Microsoft Academic Search and the difficulties and opportunities in processing so much scientific data.

IMG_0495.JPG

Paul gave a keynote at the same workshop also using science as the motivation for new methods for building out knowledge graphs. Slides below:

In the panel, Structured Data on the Web 7.0, Google’s Evgeniy Gabrilovich – creator of the Knowledge Vote – noted the challenges of getting highly correct data for Google’s Medical Knowledge graph and that doing this automatically is still difficult.

Finally, using DOIs for studying persistent identifier use over time on the Web.

Wrap-up

Overall, we had a fantastic web conference. Good research, good conversations and good food:

Random Thoughts

 

Last week, I was in Japan for the 15th International Semantic Web Conference. 

For me, this was a big event as I was research track program co-chair together with the amazing Elena Simperl. Being a program chair is a funny thing, you’re not directly responsible for any individual paper, presentation or review but you feel responsible for the entirety. And obviously, organizing 664 reviews for 212 submissions isn’t something to be taken lightly. Beyond my service as research track chair, I think my main contribution was finding good coffee near the event:

With all that said, I think the entire program was really solid. All the preprints are on the website and the proceedings are available from Springer. I’ll try to summarize my main takeaways below. But first some quick stats:

  • 430 participants
  • 212 (research track) + 43 (application track) + 71 (resources track) = 326 submissions
    • that’s up by 61 submission from last year!
  • Acceptance rates:
    • 39/212  =  18% (research track)
    • 12/43 = 28% (application track)
    • 24/71 = 34%  (resources track)
    • I think these reflect the aims of the individual tracks
  • We also had 102 posters and demos and 12 journal track papers
  • 35 student travel winners

My three main takeaways:

  1. Frames are back!
  2. semantics on the web (notice the case)
  3. Science as the next challenge
  4. SPARQL as a driver for other CS communities

(Oh and apologies for the gratuitous use of images and twitter embeds)

Frames are back!

For the past couple of years, a chunk of the community has been focused on the problem of entity resolution/disambiguation whether that’s from text to a KB or across multiple KBs. Indeed, one of the best paper winners (yes, we gave out two – both nominees had great papers) by ISI’s Information Integration Group was an excellent approach to do multi-type entity resolution.  Likewise, Axel and crew gave a pretty heavy duty tutorial on link discovery. On the NLP front, Stefano Faralli presented a nice resource that disambiguates text to lexical resources with a focus on providing both symbolic and distributional representations .

2016-10-21 10.32.59.jpg2016-10-21 10.34.59.jpg

What struck me at the conference were the number of papers beginning to think not just about entities and their relations but the context they are in. This need for context was well motivated by the folks at IBM research working on medical question answering.

Essentially, thinking about classic AI frames but how do obtain these automatically. A clear example of this is the (ongoing) work on FRED:

Similarly, the News Reader system for extracting information into situated events is another example. Another example is extracting process graphs from medical texts. Finally, in the NLP community there’s an increasing focus on developing resources in order to build automated parsers for frame-style semantic representations (e.g. Abstract Meaning Representation). Such representations can be enhanced by connections to semantic web resources as discussed by Burns et al. (I knew this was a great idea in 2015!)

2016-10-21 10.58.15.jpg

In summary,  I think we’re beginning to see how the background knowledge available on the Semantic Web combined with better parsers can help us start to deal better with context in an automated fashion.

semantics on the web

Chris Bizer gave an insightful keynote reflecting on what the community’s expectations were for the semantic web and where we currently are at.

He presented stats on the growth of Linked Data (e.g. stuff in the LOD cloud) as well as web data (e.g. schema.org marked pages) but really the main take away is the boom in the later. About 30% of the Web has html embedded data something like 12 million websites.  There’s an 86% adoption rate on top travel website.  I think the choice quote was:

“Probably, every hotel on earth is represented as web data.”

The problem is that this sort of data is not clean, it’s messy – it’s webby data, which brings to Chris’s important point for the community:

2016-10-20-09-46-12

While standards have brought us a lot, I think we are starting as a research community to think increasingly about different kinds of semantics and different kinds of structured data.  Some examples from the conference:

An embrace of the whole spectrum of semantics on the web is really a valuable move for the research community. Interestingly enough, I think we can truly experiment with web data through things like Common Crawl and the Web Data Commons. As knowledge graphs, triple stores, and ontologies become increasingly common place especially in enterprise deployments, I’m heartened by these new areas of investigation.

The next challenge: Science

Personally, the third keynote of ISWC by Professor Hiroaki Kitano – the CEO of Sony CSL and creator among other things of the AIBO and founder of RoboCup gave a inspirational speech laying out what he sees as the next AI grand challenge:

2016-10-21 09.20.30.jpg

It will be hard for me to do justice to the keynote as the material per second ratio was pretty much off the chart but he has AI magazine article laying out the vision.

Broadly, he used RoboCup as a framework for discussing how to organize a challenge and pointed to its effectiveness. (e.g Kiva systems a RoboCup spinout was acquired by Amazon for $770 million). He then focused on the issue of the inefficiency in scientific discovery and in particular how assembling knowledge is just too difficult.

:2016-10-21 09.21.04.jpg

2016-10-21 09.40.53 copy.jpg

Assembling this by hand is way too hard!

He then went on to reframe the scientific question as one of a massive search and verification of hypothesis space. 2016-10-21 09.52.29.jpg

I walked out of that keynote pretty charged up.

I think the semantic web community can be a big part of tackling this grand challenge. Science and medicine have always been important domains for applying these technologies and that showed up at this conference as well:

SPARQL as a driver for other CS communities

The 10 year award was given to  Jorge Perez , Marcelo Arenas and Claudio Gutierrez for their paper Semantics and Complexity of SPARQL. Jorge gave just a beautiful 10 minute reflection on the paper and the relationship between theory and practice. I think his slide below really sums up the impact that SPARQL has had not just on the semantic web community but CS as a whole:

2016-10-19 09.35.39.jpg

As further evidence, I thought one of the best technical talks of the conference (even through an earthquake) was by Peter Bonz on emergent schemas for RDF querying.

It was a clear example of how the two DB and semweb communities are learning from one another and that by the semantic web having different requirements (e.g. around schemas), this drives new research.

As a whole, it’s hard to beat a conference where you learn a ton and has the following:

2016-10-19 10.52.15.jpg

Random Pointers

It’s kind of appropriate that my last post of 2015 was about the International Semantic Web Conference (ISWC) and my first post of 2016 will be about ISWC.

This years conference will be held in Kobe Japan. This year’s conference already has a number of great things in store. We already have a stellar list of keynote speakers:

  • Kathleen McKeown – Professor of Computer Science at Columbia University,
    Director of the Institute for Data Sciences and Engineering, and Director of the North East Big Data Hub. I was at the hub’s launch last year and it’s really amazing the researchers she brought together through that hub.
  • Hiroaki Kitano – CEO of Sony Computer Science Laboratory and President of the systems biology institute. A truly inspirational figure who done everything from RoboCup to systems biology. He was even an invited artist at MoMA.
  • Chris Bizer – Professor at the Univesity of Mannheim  and Director of the Institute of Computer Science and Business Informatics there. If you’re in the Semantic Web community – you know the amazing work Chris has done. He really kicked the entire move toward Linked Data into high gear.

We have three tracks for you to submit to:

  1. The classic Research Track. Elena and I hope to get your most innovative and groundbreaking work on the cross between semantics and the web writ large. We’ve put together a top notch PC to give you feedback.
  2. The Resources Tracks. Reusable resources like datasets, ontologies, benchmarks and tools are crucial for many research disciplines and especially ours. This track focuses on highlighting them. Alasdair and Marta have put together a rich set of guidelines for a great reusable resources. Check them out.
  3. The Applications Track provides an area to discuss the benefits and challenges of applying semantic technologies. This track, organized by Markus and Freddy, is accepting three different types of submissions on in-use applications, industry applications and industry applications.

In addition to these tracks, ISWC 2016 will have a full program of workshops, posters, demos and student opportunities.

This year we’ll also be allowing submissions to be HTML, letting you experiment with new ways of conveying your contributions. I’m excited to see the creativity in the community using web technologies.

So get those submissions in. Abstracts are due April 20, Full submissions April 30th!

 

 

 

 

%d bloggers like this: