
Last week, I attended ACM CHI 2013 and Web Science 2013 in Paris. I had a great time and wanted to give a recap of both conferences, which were co-located.


2013-04-29 18.45.58

This was my first time at CHI – the main computer-human interaction conference. It’s not my main field of study but I was there to Data DJ. I had an interactivity submission accepted with Ayman from Yahoo! Research on using turntables to manipulate data. Here’s the abstract:

Spinning Data: Remixing live data like a music DJ

This demonstration investigates data visualization as a performance through the use of disc jockey (DJs) mixing boards. We assert that the tools DJs use in-situ can deeply inform the creation of data mixing interfaces and performances. We present a prototype system, DMix, which allows one to filter and summarize information from social streams using a audio mixing deck. It enables the Data DJ to distill multiple feeds of information in order to give an overview of a live event.

Paul Groth and David A. Shamma. 2013. Spinning data: remixing live data like a music dj. In CHI ’13 Extended Abstracts on Human Factors in Computing Systems (CHI EA ’13). ACM, New York, NY, USA, 3063-3066. DOI=10.1145/2468356.2479611 (PDF)

It was a fun experience… although it was a lot of demo giving (reception + all coffee breaks). The reactions were really positive. Essentially, once a person touched the deck they really got the interaction. Plus, a couple of notable people who seemed to like the interaction stopped by: Jakob Nielsen and @kristw from Twitter data science. The kind of response I got made me really want to pursue the project more. I also learned about how we can make the interaction better.

The whole prototype system is available on GitHub. I wrote the whole thing using node.js and JavaScript in a web browser. Warning: this is very ugly code.

In addition to my demo, I was impressed with the cool stuff on display (e.g. traceable skateboards) as well as the number of companies there looking for talent. The conference itself was huge with 3500 people and it was the first conference I attended where they had multiple sponsored parties.


Web Science was after CHI and is more in my area of research.

What we presented

2013-05-03 15.16.18

I was pleased that the VU had 8 publications at the conference, which is a really strong showing. Also, two of our papers were nominated for the best paper award.

The two papers I had in the conference were very interdisciplinary.

These papers were chiefly done by the first authors, both students at the VU. Anca attended Web Science and did a great job presenting our poster on using Google Scholar to measure academic independence. There was a lot of interest and we got quite a few ideas on how to improve the paper (bigger sample!).

The other paper, by Fabian Eikelboom, was very well received. It compared online and offline prayer cards and tried to see how the web modified this form of communication. Here’s a couple of tweets:

Conference thoughts

I found quite a few things that I really liked at this year’s Web Science. A couple of pointers:

  • Henry S Thompson, Jonathan A Rees and Jeni Tennison: URIs in data: for entities, or for descriptions of entities: A critical analysis – Talked about the httpRange-14 issue and the problem of unintended extensibility points within standards. I think a critical area of Web Science is how the social construction of technical standards impacts the Web and its development. This is an example of this kind of research.
  • Catherine C. Marshall and Frank M. Shipman: Experiences Surveying the Crowd: Reflections on methods, participation, and reliability – really got me thinking about the notion of hypotheticals in law and how this relates to provenance on the web.
  • Panagiotis Metaxas and Eni Mustafaraj: The Rise and the Fall of a Citizen Reporter – a compelling example of how Twitter influences the Mexican drug war and how trust is difficult to determine online. The subsequent Trust Trails project looks interesting.
  • The folks over at the UvA are doing a lot of fun work with respect to studying the web as a social object. It’s worth looking at their work.
  • Jérôme Kunegis, Marcel Blattner and Christine Moser. Preferential Attachment in Online Networks: Measurement and Explanations – interesting discussion of how good our standard network models are. Check out their collection of networks to download and analyze!
  • Sebastien Heymann and Benedicte Le Grand. Towards A Redefinition of Time in Information Networks?

Unfortunately, there were some things that I hope will improve for next year. First, as you can tell above, the papers were not available online during the conference. This is really a bummer when you’re trying to tweet about things you see and follow up later. Secondly, I thought there were a few too many philosophy papers. In particular, it worries me when a computer scientist is presenting a philosophy paper at a science conference. I think the program committee needs to watch out for spreading too thinly in the name of interdisciplinarity. Finally, the pecha kucha session was a real success – short, succinct presentations that really raised interest in the work. This, however, didn’t carry over into the main sessions, which often ran too long.

Overall, both CHI and Web Science were well worth the time – I made a bunch of connections and saw some good research that will influence some of my work. Oh and it turns out Paris has some amazing coffee:

2013-05-03 10.37.29

I wrote a post a while back around the idea of Data DJs: how do we make it as easy to mix data as it is to mix music? This notion requires advances on several fronts, from data and knowledge integration to user interfaces, along with data provenance and semantics. Most of the research I do relates to this Data DJ idea in some form or another.

However, I always thought it would be fun to push the analogy as far as I could. Last Christmas, I got a DJ deck (specifically a Numark Stealth Control; fantastic name, right?) with the idea of actually using it to mix data sets. For a host of reasons, including time but also a lack of a clear vision of what an integration interface should look like, I never got past just toying around with it. However, over the past couple of weekends I found time to revisit it and develop a super alpha version of a data integration system using the deck. Here’s a video of what I’ve done; read on for more details.

What really got me going was the notion that events (or who, what, when, where and why) are a perfect substrate for data integration. This is not my idea but something I’ve been hearing from a number of sources, including several people in the VU’s Web and Media Group down the hall and Raphaël Troncy, and it is probably best summed up by Mor Naaman. With this as inspiration, I developed a preliminary interface around integrating and summarizing events (well, actually tweets, but hopefully this will expand to other event sources) that you saw in the video above. The components of the interface (shown in the picture below) are as follows:

  • On the top is a list of the search terms that were used to retrieve the tweets. The tweets for each search term can be hidden and unhidden.
  • On the right is a list of the users (i.e. sources) who made the tweets. Each source can be filtered in and out, impacting the term summary graph.
  • In the middle are all the tweets on the same timeline.
  • On the right is a bar graph that summarizes the most common terms across the tweets.
  • Below the bar graph is the time span of the tweets and the current time of the selected tweet.
  • On the far right are hashtags that are selected by the user.
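
The term summary and the source filter described in the list above can be sketched in a few lines. This is only an illustration in Python, not the code behind the interface (which runs in the browser): the tweet dictionaries with "user" and "text" keys, the function name top_terms and the tokenizing pattern are all assumptions made for the example.

```python
import re
from collections import Counter


def top_terms(tweets, hidden_sources=frozenset(), n=10):
    """Return the n most common terms across tweets from visible sources.

    `tweets` is assumed to be a list of dicts with "user" and "text" keys;
    that shape is hypothetical and chosen only for this sketch.
    """
    counts = Counter()
    for tweet in tweets:
        # Skip tweets whose source has been filtered out on the deck.
        if tweet["user"] in hidden_sources:
            continue
        # Lowercase and split into simple word-like tokens, keeping # and @.
        counts.update(re.findall(r"[a-z0-9#@']+", tweet["text"].lower()))
    return counts.most_common(n)
```

Calling top_terms(tweets) gives the data for the bar graph, and calling it again with hidden_sources set to the filtered-out users gives the updated summary after a source is removed.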

As you saw in the video, it’s pretty fast to scroll through both sources and tweets. With a quick flick it’s easy to apply a filter and pretty natural to select and deselect search terms. Furthermore, we can easily delete tweets and data sources with the push of a button. There’s still much, much more to be done to make this a viable user interface for the kind of data mixing task we want to support. But standing in front of the projector today, scrolling through tweets, eliminating sources and seeing an overview fly up really convinced me that this type of interaction is well suited to the data integration task. That being said, any advice or comments on the interface would be greatly appreciated. In particular, suggestions for good infographics pertaining to events would be welcome.

Technical Details:

The interface was completely implemented using HTML5. In particular, I used the nice Protovis framework along with jQuery and jQuery Tools. To get the fast updates from the deck, we use WebSockets. I have a small Java program reading MIDI off the deck, which then acts as a socket server for WebSockets and pipes the MIDI signals (after translation to JSON) to the connected sockets. I’ve been using Google Chrome for development so I don’t know how it works in other browsers. To get data, we use the search interface of Twitter and JSONP. In general, I was very impressed with what you can do in the browser. I felt like I wasn’t even pushing the capabilities, especially since I don’t do web programming every day.
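
To make the MIDI-to-JSON step concrete, here is a minimal sketch of what such a translation could look like. It is written in Python rather than the Java used for the actual program, and the JSON field names (type, channel, data1, data2) and the function name midi_to_json are assumptions for illustration, not the format the deck server really sends. The only fixed part is the standard MIDI layout: the high nibble of the status byte gives the message type and the low nibble gives the channel.

```python
import json

# Standard MIDI channel-message types, keyed by the high nibble of the status byte.
MESSAGE_TYPES = {
    0x80: "note_off",
    0x90: "note_on",
    0xB0: "control_change",
}


def midi_to_json(raw: bytes) -> str:
    """Translate one 3-byte MIDI channel message into a JSON string.

    The output field names are hypothetical, chosen only for this sketch.
    """
    if len(raw) != 3:
        raise ValueError("expected a 3-byte MIDI channel message")
    status, data1, data2 = raw
    event = {
        "type": MESSAGE_TYPES.get(status & 0xF0, "other"),
        "channel": status & 0x0F,  # low nibble: channel 0-15
        "data1": data1,            # note number or controller number
        "data2": data2,            # velocity or controller value
    }
    return json.dumps(event)
```

On a DJ deck, faders and jog wheels typically show up as control_change messages and buttons as note_on and note_off, so a browser client receiving these JSON strings over a WebSocket could dispatch on the type field.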

What’s next?

Lots! This was really just a proof of concept. There’s a bunch of directions to go in: improved graphics, better use of the decks, social interaction around integration (two DJs at once!), more data sources beyond Twitter, experiments on task performance, live mixing of an event…. If you have any ideas, suggestions, or comments, I’d love to hear them.

How do you want to data DJ?
