I had the pleasure of attending the Web Conference 2018 in Lyon last week along with my colleague Corey Harper. This is the 27th edition of the largest conference on the World Wide Web. I have tremendous difficulty not calling it WWW, but I’ll learn! Instead of doing two trip reports, the rest of this is a combination of Corey’s and my thoughts. Before getting to what we took away as the main themes of the conference, let’s look at the stats and organization:
It’s also worth pointing out that this is just the research track. There were 27 workshops, 21 tutorials, 30 demos (Paul was co-chair), 62 posters, four collocated conferences/events, four challenges, a developer track and programming track, a project track, an industry track, and… We are probably missing something as well. Suffice it to say, even with the best work of the organizers it was hard to figure out what to see. Organizing an event with 2200+ attendees is a massive task: over 80 chairs were involved, not to mention the PC and the local heavy lifting. Congrats to Fabien, Pierre-Antoine, Lionel and the whole committee for pulling it off. It’s also great to see that the proceedings are open access and available on the web.
Given the breadth of the conference, we obviously couldn’t see everything but from our interests we pulled out the following themes:
- Dealing with a Polluted Web
- Tackling Tabular Data
- Observational Methods
- Scientific Content as a Driver
Dealing with a Polluted Web
The Web community is really owning its responsibility to help mitigate the destructive uses to which the Web is put. From the “Recoding Black Mirror” workshop, which we were sad to miss, through the opening keynote and the tracks on Security and Privacy and Fact Checking, this was a major topic throughout the conference.
Oxford professor Luciano Floridi gave an excellent first keynote on “The Good Web” which addressed this topic head on. He introduced a number of nice metaphors to describe what’s going on:
- Polluting agents in the Web ecosystem are like extremophiles, making the environment hostile to all but themselves.
- Democracy in some contexts can be like antibiotics: too much gives rise to antibiotic-resistant bacteria.
- His takeaway is that we need a bit of paternalism in this context now.
His talk was pretty compelling; you can check out the full video here.
Additionally, Corey was able to attend the panel discussion that opened the “Journalism, Misinformation, and Fact-Checking” track, which included representation from the Credibility Coalition, the International Fact-Checking Network, MIT, and Wikimedia. There was a discussion of how to set up economies of trust in the age of attention economies, and while some panelists agreed with Floridi’s call for some paternalism, there was also a warning that some techniques we might deploy to mitigate these risks could lead to “accidental authoritarianism.” The Credibility Coalition also provided an interesting review of how to define credibility indicators for news, looking at over 16 indicators of credibility.
We were able to see parts of the “Web and Society” track, which included a number of papers on social-justice-oriented themes. This included an excellent paper that showed how recommender systems in social networks often exacerbate and amplify gender and racial disparity in social network connections and engagement. Additionally, many papers addressed the relationship between the mainstream media and the web (e.g. political polarization and social media, media and public attention using the web).
Some more examples: The best demo award went to a system that automatically analyzes the privacy policies of websites and summarizes them with respect to GDPR.
More generally, it seems the question is how do we achieve quality assessment at scale?
Tackling Tabular Data
Knowledge graphs and heterogeneous networks (there was a workshop on that) were a big part of the conference. Indeed, the test-of-time paper award went to the original YAGO paper. There were a number of talks about improving knowledge graphs, for example to do better on question answering tasks, to determine which attributes are needed to complete a KG, or to improve relation extraction. While tables have always been an input to knowledge graph construction (e.g. Wikipedia infoboxes), an interesting turn was towards treating tabular data as a focus area in its own right.
As Natasha Noy from Google noted in her keynote at the SAVE-SD workshop, this is an area with a number of exciting research challenges.
There was a workshop on data search with a number of papers on the theme. In that workshop, Maarten de Rijke gave a keynote on the work his team has been doing in the context of a data search project with Elsevier.
In the main track, there was an excellent talk on Ad-Hoc Table Retrieval using Semantic Similarity. They looked at finding semantically central columns to provide a ranked list of columns. More broadly, they are looking at spreadsheet compilation as the task (see smarttables.cc and the dataset for that task). Furthermore, the paper Towards Annotating Relational Data on the Web with Language Models looked at enriching tables through linking into a knowledge graph.
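To make the table retrieval idea a bit more concrete, here is a minimal sketch of ranking tables against a keyword query by cosine similarity. It is not the method from the paper: it uses plain bag-of-words vectors as a stand-in for whatever semantic representation one would really use, and the table structure (`id`, `headers`, `caption`) and function names are our own assumptions for illustration.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Turn text into a sparse term-frequency vector (hypothetical stand-in
    for a real semantic embedding)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_tables(query, tables):
    """Return (score, table id) pairs, best match first.

    Each table is assumed to be a dict with 'id', 'headers' and 'caption'.
    """
    query_vec = bag_of_words(query)
    scored = []
    for table in tables:
        table_text = " ".join(table["headers"]) + " " + table["caption"]
        scored.append((cosine(query_vec, bag_of_words(table_text)), table["id"]))
    return sorted(scored, reverse=True)


# Toy tables, invented purely for illustration.
toy_tables = [
    {"id": "t1", "headers": ["country", "population"],
     "caption": "world population by country"},
    {"id": "t2", "headers": ["player", "goals"],
     "caption": "top football scorers"},
]
ranking = rank_tables("population country", toy_tables)
```

Swapping `bag_of_words` for word or entity embeddings would move this from lexical matching towards the semantic matching the talk was about; the ranking step itself would stay the same.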
Observational Methods
Observing user behavior has long been a part of research on the Web; any web search engine is driven by that notion. What did seem striking is the depth of the observational data being employed. Prof. Lorrie Cranor gave an excellent keynote on the user experience of web security (video here). Did you know that if you read all the privacy policies of all the sites you visit, it would take 244 hours per year? Also, the idea of privacy “nutrition labels” is pretty cool.
But what was interesting was her lab’s use of an observatory of 200 participants who allowed their Windows home computers to be instrumented. This kind of instrumentation gives deep insight into how users actually use their browsers and security settings.
Another example of deep observational data was the use of mouse tracking on search result pages to detect how people search under anxiety conditions.
In the paper by Wei Sui and co-authors on Computational Creative Advertisements, presented at the HumL workshop, the authors use in-home facial and video tracking to measure volunteers’ emotional responses to ads.
The final example was the use of fMRI scans to track the brain activity of participants during web search tasks. All these examples provide amazing insights into how people use these technologies, but as these sorts of methods are more broadly adopted, we need to make sure to adopt the kinds of safeguards used by these researchers, e.g. consent, IRBs, anonymization.
Scientific Content as a Driver
It’s probably our bias, but we saw a lot of work tackling scientific content, probably because it’s both interesting and provides a number of challenges. For example, the best paper of the conference (HighLife) was about extracting n-ary relations for knowledge graph construction, motivated by the need for such relations in creating biomedical knowledge graphs. The aforementioned work on tabular data is often motivated by the needs of research. Obviously SAVE-SD covered this in detail:
- Researchers at the University of Bologna, who are working with Elsevier, are building an XML processing pipeline to convert a small set of Chemical Engineering articles to RDF and capture citation motivations using the CiTO ontology.
- A talk on using Springer Nature’s SciGraph to extract author affiliations for major web conferences and do a longitudinal study of geographical trends. They found that, for ISWC, ESWC, and TPDL, 20% of countries produce 80% of publications, and there is very little turnover.
- Among Corey’s favorites was a presentation on a Web Application for creating and sharing visual bibliographies. VisualBib is a compelling use case for visualization of citations and cited works over time.
- Also at SAVE-SD was Labs’ own work on extracting discourse segment types in scientific text.
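As an aside on the “20% of countries produce 80% of publications” finding from the SciGraph talk: a concentration figure like that is easy to compute once you have per-country publication counts. The sketch below is only an illustration under our own assumptions; the function name is hypothetical and the counts are invented toy numbers, not data from the talk.

```python
def top_share(counts, top_fraction=0.2):
    """Share of all publications produced by the top `top_fraction` of countries.

    `counts` maps a country name to its number of publications.
    """
    ordered = sorted(counts.values(), reverse=True)
    if not ordered:
        return 0.0
    # Always include at least one country in the "top" group.
    k = max(1, round(len(ordered) * top_fraction))
    total = sum(ordered)
    return sum(ordered[:k]) / total if total else 0.0


# Invented toy counts for ten anonymous countries (NOT real data).
toy_counts = {
    "A": 50, "B": 30, "C": 5, "D": 4, "E": 3,
    "F": 3, "G": 2, "H": 1, "I": 1, "J": 1,
}
share = top_share(toy_counts)
```

Running the same function per conference year would give a simple way to look at the turnover question as well, by comparing which countries fall in the top group over time.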
In the demo track, the etymo.io search engine was presented for summarizing and visualizing scientific papers. Kuansan Wang at the BigNet workshop talked about Microsoft Academic Search and the difficulties and opportunities in processing so much scientific data.
Paul gave a keynote at the same workshop, also using science as the motivation for new methods for building out knowledge graphs.
In the panel Structured Data on the Web 7.0, Google’s Evgeniy Gabrilovich, creator of the Knowledge Vault, noted the challenges of getting highly correct data for Google’s medical knowledge graph and that doing this automatically is still difficult.
Finally, there was work using DOIs to study persistent identifier use over time on the Web.
Overall, we had a fantastic web conference. Good research, good conversations and good food.
- We’re getting RDF.js!
- More tools: SPARQL query results direct to JSON-LD
- Prior trip reports of Paul’s from WWW 2015 and 2010.
- A cool tutorial on representation learning for networks from Stanford, another thing we wanted to sit in on but couldn’t.
- Abstract state machines and linked-data-fu used for work task analysis in VR.
- Paul panelling.
- A knowledge graph of quantities.
- This is a great example about how to present research work: presentation, abstract, pdf, data, visual all in one place. It’s also a cool paper…
- It’s cool randomly bumping into KMI folks on a Sunday night…those KMI folks…
- Wikidata provides RDF.
- Markus: public SPARQL is scalable; see Wikidata.
- Web Audio – really fun demo (+slides)(+more demo)
- Schlobach on Semantic Commitment
- I wish Aidan Hogan’s talk on data driven schemas was online – really clear
- Semantics and Complexity of GraphQL – important work by Olaf and Jorge – love how they do practice -> theory -> practice
- A unified view of complexity.
- The job fair is a good idea, but there seem to be more people with “I’m hiring” badges than “I’m looking” ones.
- Paul recommends Amelie as a co-chair.