
Last week, I had the pleasure of attending a bilateral meeting between the Royal Society and the KNAW. The aim was to strengthen the relationship between the UK and Dutch scientific communities. The meeting focused on three scientific areas: quantum physics & technology; nanochemistry; and responsible data science. I was there for the latter. The event was held at Chicheley Hall, a classic baroque English country house (think Pride & Prejudice). It’s a marvelous venue – similar in concept to Dagstuhl, but with an English vibe – where you are wholly immersed in academic conversation.


One of the fun things about the event was getting a glimpse of what colleagues in other technical disciplines are doing. It was cool to see Prof. Bert Weckhuysen’s enthusiasm for using imaging technologies to understand catalysts at the nanoscale. Likewise, seeing both the progress and the investment (!) in quantum computing from Prof. Ian Walmsley was informative. I also got an insider’s intro to the challenges of engineering a quantum computer from Dr. Ruth Oulton.

The responsible data science track had ~15 people. What I liked was that the organizers included not only computer scientists but also legal scholars, politicians, social scientists, philosophers, and policymakers. The session consisted primarily of talks, but luckily everyone was open to discussion throughout. Broadly, responsible data science covers the ethics of the practice and implications of data science.

For more context, I suggest starting with two sources: 1) the Dutch consortium on responsible data science and 2) the paper 10 Simple Rules for Responsible Big Data Research. I took away two themes, both from the track itself and from my various chats with people during coffee breaks, dinner, and at the bar.

1) The computer science community is engaging

It was apparent throughout the meeting that the computer science community is confronting the challenges head on. A compelling example was the talk by Dr. Alastair Beresford from Cambridge about Device Analyzer, a system that captures the activity of users’ mobile phones in order to provide data to improve device security – which it has.

He talked compellingly about the trade-offs between consent and privacy and how the project tries to manage them. In particular, I thought their approach to sharing data with other researchers was interesting. It reminded me very much of how the Dutch Central Bureau of Statistics manages microdata on populations.

Another example was the discussion by Prof. Maarten de Rijke on the work going on around diversity for recommender and search systems. He called out the Conference on Fairness, Accountability, and Transparency (FAT*), happening just after this meeting, as a place where the data science community is engaging on these issues. Indeed, one of my colleagues was tweeting from that meeting.

Julian Huppert, former MP, discussed the independent review board set up by DeepMind Health to enable transparency about their practices. He is part of that board; interestingly, Richard Horton, editor of the Lancet, is also a member. Furthermore, Prof. Bart Jacobs discussed the polymorphic-encryption-based privacy system he’s developing for a collaboration between Google’s Verily and Radboud University around Parkinson’s disease. This is an example of how even the majors are engaged with these notions of responsibility. To emphasize this engagement even more, during the meeting a new report on the Malicious Use of AI came out from a number of well-known organizations.

One thing that I kept thinking is that we need more assets or concrete artifacts that data scientists can apply in practice.

For example, I like the direction outlined in this article from Dr. Virginia Dignum about defining concrete principles using a design-for-values approach. See TU Delft’s Design for Values Institute for more on this kind of approach.

2) Other methods needed

As data scientists, we tend to want to use an experimental, data-driven approach even to these notions surrounding responsibility.

Even though I think there’s absolutely a role here for a data-driven approach, it’s worth looking at other, more qualitative methods – for example, survey instruments, ethnographic approaches, or even studying the textual representation of the regulatory apparatus. For instance, reflecting on the notion of Thick Data is compelling for data science practice. This was brought home by Dr. Ian Brown in his talk on data science and regulation, which combined both an economic and a survey view.

Personally, I tried to bring some social science literature to bear when discussing the need for transparency in how we source our data. I also argued that adopting a responsible approach is actually good for the operational side of data science practice.

While I think it’s important for computer scientists to look at different methods, it’s also important for other disciplines to gain insight into the actual process of data science itself, as Dr. Linnet Taylor grappled with in her talk about observing a data governance project.

Overall, I enjoyed both the setting and the content of the meeting. If we can continue to have these sorts of conversations, I think the data science field will be much better placed to deal with the ethical and other implications of our technology.

Random Thoughts

  • Peacocks!
  • Regulating Code – something for the reading list
  • Somebody remind me to bring a jacket next time I go to an English country house!
  • I always love it when egg codes get brought up when talking about provenance.
  • I was told that I had a “Californian conceptualization” of things – I don’t think it was meant as a compliment – but I’ll take it as such 🙂
  • Interesting pointer from @1Br0wn to work by Seda Gurses on privacy and software engineering
  • Lots of discussion of large internet majors and monopolies. There’s lots of academic work on this, but I really like Ben Thompson’s notion of aggregators as the way to think about them.
  • Merkle trees are great – but blockchain is a nicer name 😉
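Since Merkle trees came up, here’s a minimal sketch (purely illustrative – not tied to any particular blockchain’s on-disk format) of computing a Merkle root by pairwise hashing:

```python
import hashlib

def h(data: bytes) -> bytes:
    """SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Hash each leaf, then pairwise-hash levels upward until one root remains."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

root = merkle_root([b"coffee", b"cocoa", b"tea"])
print(root.hex())  # 32-byte root; any change to a leaf changes this value
```

Changing any single leaf changes the root, which is exactly why the structure underpins tamper-evident ledgers – whatever name you give them.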


The rise of Fair Trade food and other products has been amazing over the past 4 years. Indeed, it’s great to see how certification for the origins (and production processes) of products is becoming both prevalent and expected. For me, it’s nice to know where my morning coffee was grown, and knowing that lets me figure out the quality of the coffee (is it single origin or a blend?).

I now think it’s time we do the same for data. As we work in environments where our data is aggregated from multiple sources and processed along complex digital supply chains, we need the same sort of “fair trade” style certificate for our data. I want to know that my data was grown, nurtured, and treated with care, and it would be great to have a stamp that lets me understand that at a glance without having to do a lot of complex digging.

In a just-published commentary in IEEE Internet Computing, I go into a bit more detail about how provenance and linked data technologies are laying the groundwork for fair trade data. Take a look and let me know what you think.
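To make the idea concrete, here’s a minimal sketch in plain Python (with entirely hypothetical dataset and agent names) of the kind of W3C PROV-style lineage record that could sit behind a “fair trade” stamp: each dataset records what it was derived from and who it is attributed to, so a consumer can trace any dataset back to its original sources.

```python
# Hypothetical provenance records, loosely modeled on W3C PROV's
# wasDerivedFrom / wasAttributedTo relations.
provenance = {
    "blended_dataset": {
        "wasDerivedFrom": ["survey_2018", "sensor_feed"],
        "wasAttributedTo": "ex:DataCoop",
    },
    "survey_2018": {
        "wasDerivedFrom": [],          # an original source (a "single origin")
        "wasAttributedTo": "ex:FieldTeam",
    },
    "sensor_feed": {
        "wasDerivedFrom": [],
        "wasAttributedTo": "ex:IoTVendor",
    },
}

def trace_origins(entity, records):
    """Walk wasDerivedFrom links back to the original source datasets."""
    sources = []
    for parent in records[entity]["wasDerivedFrom"]:
        grand = trace_origins(parent, records)
        sources.extend(grand if grand else [parent])
    return sources

print(trace_origins("blended_dataset", provenance))
# → ['survey_2018', 'sensor_feed']
```

A real deployment would use an actual PROV serialization (e.g. PROV-O in RDF) rather than ad hoc dicts, but the shape of the question – “is this a blend, and of what?” – is the same one we ask of coffee.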



While exploring the London Science Museum, I saw a great exhibit for the Toaster Project. The idea was to try to build a modern-day toaster from scratch. There’s a video describing the project below and more info on the site linked above. What was interesting was that to learn how things were produced, Thomas Thwaites had to dig through some pretty old books. I think it would be cool to make it easy to link every product in my house to how it was produced (or how it was created) without going through a nine-month process to figure it out.

Should you be responsible for the safety of your food? The article, Food Companies Are Placing the Onus for Safety on Consumers, in the New York Times is scary. The fundamental point is that it’s extremely difficult for companies that make ready-made frozen meals to verify the safety of their food because the supply chains have become so complex and they cannot track the provenance of the ingredients. Furthermore, the manufacturers have resisted putting tracking systems in place. From the article:

But government efforts to impose tougher trace-back requirements for ingredients have met with resistance from food industry groups including the Grocery Manufacturers Association, which complained to the Food and Drug Administration: “This information is not reasonably needed and it is often not practical or possible to provide it.”

Instead of instituting a trace-back mechanism, the manufacturers are trying to get consumers to ensure they cook their meals safely, reaching a “kill step” where bacteria are destroyed. However, as discussed in the article, this is actually very hard to do for some meals.

Personally, I’m going to lay off ready-made meals, which is unfortunate because they do come in handy. Generally, I want to know about provenance even if I can destroy all the bacteria with a smoking microwave. Additionally, I wonder how we can get the research products of the computer science provenance community into the hands of these manufacturers. I really believe that, using our technology, collecting and managing the kind of documentation they need could be significantly cheaper and more effective than they expect.

Check out further discussion at the New York Times’ Room For Debate blog.


After my post on Saturday about Chinese food, merpel pointed out to me that the new issue of Wired is devoted entirely to the future of food. Of particular interest is the infographic shown above, The Global Menu: Food from Afar, which displays the distance food has to travel to get to Iowa – somehow apropos, because I have roots there. It’s just amazing to me that in a place with such great soil, where anything can grow, apples are shipped in from 1726 miles away! My grandmother, who lives in Iowa, has an orchard out back, so it’s definitely possible….

Food is obviously a great example of provenance and it’s something I’ll be coming back to frequently, particularly, because it’s probably the example that gets people thinking the most… “how was the frozen meal I just purchased made?”

Here in California, I think it’s one of the reasons that Proposition 2 (a ballot initiative to regulate the confinement of animals) has a good chance of passing (72% in favor: Survey USA poll Sept.). People are beginning to care about how their food is produced and in this case how animals are treated before they make it to the dinner plate. There was a great piece in this past Sunday’s New York Times Sunday Magazine (The Barnyard Strategist) about Proposition 2 and the man behind the initiative, Wayne Pacelle, the director of the Humane Society.

Steven Shaw has an op-ed in the NY Times about the extremely poor working conditions and pay for Chinese food delivery workers in New York. I haven’t had takeout Chinese delivered since I moved to LA, but I’m pretty sure it’s the same here. It’s interesting to note how difficult it is for consumers to be aware of the true price of convenience.

I just saw an interesting post on the New York Times’ Green Inc. blog about how an Indian state-controlled firm (with a consortium of other investors) is trying to acquire coal mines in Appalachia. It’s a powerful example of how the software delivered from Bangalore may be powered by coal from West Virginia. It brings up the interesting question of how individuals can lock in the sources of their stuff without the purchasing power of a corporation.
