Current ways of measuring scientific impact are rather course grained, they often don’t capture the many different ways that science and scientists might have impact. As science increasingly is done on-line and in the open, new metrics are being created to help measure this impact. Jason Priem, Dario Taraborelli, myself, and Cameron Neylon have recently put out a manifesto calling out-lining a research direction for these new metrics, termed alt-metrics.
I wrote a post a while back around the idea of Data DJs: how do we make it as easy to mix data as it is to mix music. This notion requires advances on several fronts from data and knowledge integration, to user interfaces, along with data provenance and semantics. Most of the research I do then somehow relates to this Data DJ’s in some form or anther.
However, I always thought I it would be fun to push the analogy as far as I could. Last Christmas, I got a DJ deck (specifically a Numark Stealth Control-fantastic name, right?) with the idea of actually using it to mix data sets. For a host of reasons, including time but also a lack of a clear vision of what an integration interface should look like, I never got past just toying around with it. However, over the past couple of weekends I found time to revisit it and develop a super alpha version of a data integration system using the deck. Here’s a video to see what I’ve done, read on to get more details.
What really got me going was the notion that events (or who, what, when, where and why) are a perfect substrate for data integration. This is not my idea but has been something I’ve been hearing from a number of sources including from a number of people in the VU’s Web and Media Group down the hall, Raphaël Troncy, and probably best summed up by Mor Naaman. With this as inspiration, I developed a preliminary interface around integrating/and summarizing events (well actually tweets, but hopefully this will expand to other event sources) that you saw in the video above. The components of the interface (shown in the picture below) are as follows:
On the top is a list of the search terms that were used to retrieve the tweets. The tweets for each search term can be hidden and unhidden.
On the right is a list of the users (i.e. sources) who made the tweets. Each source can be filtered in and out impacting the term summary graph
In the middle are all the tweets on the same timeline.
On the right, is a bar graph that summarizes the most common terms across the tweets.
Below the bar graph, is the time span of the tweets and the current time of the selected tweet.
On the far right are hashtags that are selected by the user.
As you saw in the video it’s pretty fast to scroll through both sources and tweets. With a quick flick it’s easy to apply a filter and pretty natural to select and deselect search terms. Furthermore, we can easily delete tweets and data sources with the push of a button. There’s still much much more to be done to make this a viable user interface for the kind of data mixing task we want to support. But standing in front of the projector today scrolling through tweets, eliminating sources and seeing an overview fly-up really convinced me that this type of interaction is really suited to the data integration task. That being said any advice or comments on the interface would be greatly appreciated. In particular, suggestions for good infographics pertaining to events would be appreciated.
The interface was completely implemented using HTML5. In particular, I used the nice ProtoVis framework along with JQuery and JQuery Tools. To get the fast updates from the deck, we use WebSockets. I have a small Java program reading midi off the deck which then acts as a socket server for WebSockets and pipes the midi signals (after translation to JSON) to the connected sockets. I’ve been using Google Chrome for development so I don’t know how it works in other browsers. To get data, we use the search interface of twitter and JSONP. In general, I was very impressed with what you can do in the browser. I felt like I wasn’t even pushing the capabilities especially since I don’t do web programming everyday.
Lots! This was really just a proof of concept. There’s a bunch of directions to go in: improved graphics, better use of the decks, social interaction around integration (two djs at once!), more data sources beyond twitter, experiments on task performance, live mixing of an event…. If you have any ideas, suggestions, or comments, I’d love to hear them.