This was my first time at CHI – the main computer-human interaction conference. It’s not my main field of study but I was there to Data DJ. I had an interactivity submission accepted with Ayman from Yahoo! Research on using turntables to manipulate data. Here’s the abstract:
Spinning Data: Remixing live data like a music DJ
This demonstration investigates data visualization as a performance through the use of disc jockey (DJ) mixing boards. We assert that the tools DJs use in-situ can deeply inform the creation of data mixing interfaces and performances. We present a prototype system, DMix, which allows one to filter and summarize information from social streams using an audio mixing deck. It enables the Data DJ to distill multiple feeds of information in order to give an overview of a live event.
Paul Groth and David A. Shamma. 2013. Spinning data: remixing live data like a music dj. In CHI ’13 Extended Abstracts on Human Factors in Computing Systems (CHI EA ’13). ACM, New York, NY, USA, 3063-3066. DOI=10.1145/2468356.2479611 http://doi.acm.org/10.1145/2468356.2479611 (PDF)
It was a fun experience… although it involved a lot of demo giving (the reception plus all the coffee breaks). The reactions were really positive. Essentially, once a person touched the deck they really got the interaction. Plus, a couple of notable people stopped by who seemed to like the interaction: Jakob Nielsen and @kristw from Twitter data science. The response made me really want to pursue the project further. I also learned a lot about how we can make the interaction better.
In addition to my demo, I was impressed with the cool stuff on display (e.g. traceable skateboards) as well as the number of companies there looking for talent. The conference itself was huge with 3500 people and it was the first conference I attended where they had multiple sponsored parties.
Web Science was after CHI and is more in my area of research.
What we presented
The two papers I had in the conference were very interdisciplinary.
- Anca Dumitrache, Paul Groth, Peter van den Besselaar (2013) Identifying Research Talent Using Web-Centric Databases. In Web Science 2013. PDF
These papers were chiefly done by the first authors, both students at the VU. Anca attended Web Science and did a great job presenting our poster on using Google Scholar to measure academic independence. There was a lot of interest and we got quite a few ideas on how to improve the paper (bigger sample!).
The other paper, by Fabian Eikelboom, was very well received. It compared online and offline prayer cards and tried to see how the web modified this form of communication. Here’s a couple of tweets:
I found quite a few things that I really liked at this year’s web science. A couple of pointers:
- Henry S Thompson, Jonathan A Rees and Jeni Tennison: URIs in data: for entities, or for descriptions of entities: A critical analysis – Talked about httpRange-14 and the problem of unintended extensibility points within standards. I think a critical area of Web Science is how the social construction of technical standards impacts the Web and its development. This is an example of that kind of research.
- Catherine C. Marshall and Frank M. Shipman: Experiences Surveying the Crowd: Reflections on methods, participation, and reliability - really got me thinking about the notion of hypotheticals in law and how this relates to provenance on the web.
- Panagiotis Metaxas and Eni Mustafaraj: The Rise and the Fall of a Citizen Reporter – a compelling example of how Twitter influences the Mexican drug war and how trust is difficult to determine online. The subsequent Trust Trails project looks interesting.
- The folks over at the UvA at digitalmethods.net are doing a lot of fun work with respect to studying the web as a social object. It’s worth looking at their work.
- Jérôme Kunegis, Marcel Blattner and Christine Moser. Preferential Attachment in Online Networks: Measurement and Explanations – interesting discussion of how good our standard network models are. Check out their collection of networks to download and analyze!
- Sebastien Heymann and Benedicte Le Grand. Towards A Redefinition of Time in Information Networks?
Unfortunately, there were some things that I hope will improve for next year. First, as you can tell above, the papers were not available online during the conference. This is really a bummer when you’re trying to tweet about things you see and follow up later. Secondly, I thought there were a few too many philosophy papers. In particular, it worries me when a computer scientist is presenting a philosophy paper at a science conference; I think the program committee needs to watch out for spreading too thinly in the name of interdisciplinarity. Finally, the pecha kucha session was a real success – short, succinct presentations that really raised interest in the work. This, however, didn’t carry over into the main sessions, which often ran too long.
Overall, both CHI and Web Science were well worth the time – I made a bunch of connections and saw some good research that will influence some of my work. Oh and it turns out Paris has some amazing coffee:
For the past couple of days (April 8 – 10, 2013), I attended the UKSG conference. UKSG is an organization for academic publishers and librarians. The conference itself has over 700 attendees and is focused on these two groups. I hadn’t heard of it until I was invited by Mike Taylor from Elsevier Labs to give a session with him on altmetrics.
The session was designed to both introduce altmetrics and give a state-of-the-art update on it to publishers and librarians. You can see what I had to say in the clip above, but my main point was that altmetrics is at a stage where it can be advantageously used by scholars, projects and institutions not to rank but instead to tell a story about their research. This is particularly important now that many scientific artifacts beyond the article (e.g. data, posters, blog posts, videos) are becoming increasingly trackable and can help scholars tell their story.
The conference itself was a bit weird for me as it was a completely different crowd than I normally connect with… I was one of the few “actual” academics there, which led to my first day tweet:
being at #uksglive as an academic is interesting – talking to people who talk about me in the abstract is seriously meta
— Paul Groth (@pgroth) April 8, 2013
It was fun to randomly go up to the ACM and IEEE stands and introduce myself not as a librarian or another publisher but as an actual member of their organizations. Overall, though, people were quite receptive to my comments and were keen to get my views on what publishers and librarians could be doing to help me out as a researcher. I do have to say that it was a fairly well-funded operation (there is money in academia somewhere)… I came away with a lot of free t-shirts and USB sticks, and I have never been to a conference that had bumper cars for the evening entertainment:
In addition to (hopefully) contributing to the conference, I learned some things myself. Here are some bullet points in no particular order:
- Outrageous talk by @textfiles – the Archive Team is super important
- I talked a lot to Geoffrey Bilder from CrossRef. Topics included, but were not limited to:
- why and when indirection is important for permanence in url space
- the need for a claims (i.e. nanopublications) database referencing ORCID
- the need for consistent url policies on sites and a “living will” for sites of importance
- when will scientists get back to being scientists and stop being marketers (is this statement true, false, in-between, or is it even a bad thing?)
- the coolness of labs.crossref.org
- It’s clear that librarians are the publishers’ customers; academics come second. I think this particular indirection badly impacts the market.
- Academic content output is situated in a network – why do we de-link it all the time?
- The open access puppy
— Mark Hahnel (@MarkHahnel) April 9, 2013
- It was interesting to see the business of academic publishing going down. I witnessed lots of pretty intense-looking dealings in the cafe.
- Bournemouth looks like it could have some nice surfing conditions.
Overall, UKSG was a good experience to see, from the inside, this completely other part of the academic complex.
The rise of Fair Trade food and other products has been amazing over the past 4 years. Indeed, it’s great to see how certification for the origins (and production processes) of products is becoming both prevalent and expected. For me, it’s nice to know where my morning coffee was grown and indeed knowing that lets me figure out the quality of the coffee (is it single origin or a blend?).
I now think it’s time we did the same for data. As we work in environments where our data is aggregated from multiple sources and processed along complex digital supply chains, we need the same sort of “fair trade” style certificate for our data. I want to know that my data was grown, nurtured and treated with care, and it would be great to have a stamp that lets me understand that at a glance, without having to do a lot of complex digging.
In a just published commentary in IEEE Internet Computing, I go into a bit more detail about how provenance and linked data technologies are laying the ground work for fair trade data. Take a look and let me know what you think.
You should go read Jason Priem’s excellent commentary in Nature – Scholarship: Beyond the Paper – but I wanted to call out a bit that I’ve talked about with a number of people and think is important. We should be looking at how we build the best teams of scientists, not just looking for the single best individual:
Tenure and hiring committees will adapt, too, with growing urgency. Ultimately, science evaluation will become something that is done scientifically, exchanging arbitrary, biased, personal opinions for meaningful distillations of entire communities’ assessments. We can start to imagine the academic department as a sports team, full of complementary positions (theorists, methodologists, educators, public communicators, grant writers and so on). Coming years will see evaluators playing an academic version of Moneyball (the statistical approach to US baseball): instead of trying to field teams of identical superstars, we will leverage nuanced impact data to build teams of specialists who add up to more than the sum of their parts.
Science is a big team sport, especially given today’s need for interdisciplinary and large-scale experiments. We need to encourage the building of teams in science.
Wow! The last three days have been crazy, hectic, awesome and inspiring. We just finished putting on The Future of Research Communication and e-Scholarship (FORCE11)’s Beyond the PDF 2 conference here in Amsterdam. (I was chair of the organizing committee and in charge of local arrangements.) The idea behind Beyond the PDF was to bring together a diverse set of people (scholars, technologists, policy experts, librarians, start-ups, publishers, …) all interested in making scholarly and research communication better. In that regard, I think we achieved our goal. We had 210 attendees from across the spectrum. Below are two charts: one of the types of organizations the attendees came from and one of the domains they are from.
The program of the conference was varied. We covered new tools, business models, the context of the approach, research evaluation, visions for the future, and how to move forward. I won’t go over the entire conference here. We’ll have a complete video online soon (thanks Elsevier). I just wanted to call out some personal highlights.
We had two great keynotes, one from Kathleen Fitzpatrick of the Modern Language Association and the other from Carol Tenopir (Chancellor’s Professor at the School of Information Sciences at the University of Tennessee, Knoxville). Kathleen discussed how it is essential for the humanities to embrace new forms of scholarly communication, as they allow for faster dissemination of their work. Carol discussed the practice of reading for academics. She’s done in-depth tracking of how scientists read. Some interesting tidbits: successful scientists read more, and so far social media use has not decreased the amount of reading that scientists do. The keynotes were really a sign of how much more present the humanities were at this conference than at Beyond the PDF 1.
The tools are there
Just two years ago at the first Beyond the PDF, there were mainly initial ideas and drafts for next generation research communication tools. At this year’s conference, there was a huge number of tools ready to be used: Figshare, PDFX, Authorea, Mendeley, IsaTools, StemBook, Commons in a Box, IPython, ImpactStory and on…
Furthermore, there are different ways of publishing, from PeerJ to Hypothes.is and even just posting to a blog. Probably the most interesting idea of the conference was the use of GitHub to, essentially, publish.
For me this made me think it’s time to think about my own scientific workflow and figure out how to update it to better use these tools in practice.
People made connections
At the end of the conference, I asked if people had made a new connection. Almost every hand went up. It was great to see publishers, technologists, librarians also talking together. The twitter back channel at the conference was great. We saw a lot of conversations that kept going on #btpdf2 and also people commenting while watching the live stream. Check out a great Storify of the social media stream of the conference done by Graham Steel.
Beyond the PDF 2 photographs by Maurice Vanderfeesten are licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported license.
Based on a work at http://sdrv.ms/YI4Z4k.
Making it happen
We gave the community a challenge: “what would you do with 1k today that would change scholarly communication for the better?” The challenge was well received and we had a bunch of different ideas, from sponsoring viewing parties to encouraging the adoption of DOIs in the developing world and by small publishers.
The Challenge of Evaluation
We had a great discussion around the role of evaluation. The format Carole Goble used for the evaluation session – role playing representing the key players in the evaluation of research and researchers – really highlighted the fact that we have a first-mover problem. None of the players feels that “they should go first”. It was unclear how to push past that challenge.
Personally, I had a great time. FORCE11 is a unique community, and I think it brings together the people that need to talk to change the way we communicate scholarship. These were my quick thoughts on the event; there’s a lot more to come. We will have the video of the event up soon, along with drawn notes provided by Jongens van de Tekeningen. We will also award a series of 1k grants to support ongoing work. Finally, I hope to see many more blog posts documenting the different views of attendees.
We had many great sponsors who helped make this a great event. Things like live streaming, student scholarships, a professional set-up, demos & dinner ensure that an event like this works.
- Alfred P. Sloan Foundation through FORCE11: Main conference support
- Elsevier: Video streaming and student travel fellowships
- Gordon and Betty Moore Foundation: Challenge prizes, meeting support, travel fellowships
- PLoS: organization support
- The Network Institute: local organization support
- Data2Semantics: local organization
One of the ideas in the altmetrics manifesto was that altmetrics allow for a diversity of metrics. With colleagues in the VU University Amsterdam’s Network Institute, we’ve been investigating the use of online data (in this case Google Scholar) to help create new metrics to measure the independence of researchers. Here, we need fresh data to establish whether an emerging scholar is becoming independent from their supervisor. We just had the results of one of our approaches accepted into the Web Science 2013 conference. The abstract is below and here’s a link to the preprint.
Anca Dumitrache, Paul Groth, and Peter van den Besselaar
Metrics play a key part in the assessment of scholars. These metrics are primarily computed using data collected in offline procedures. In this work, we compare the usage of a publication database based on a Web crawl and a traditional publication database for computing scholarly metrics. We focus on metrics that determine the independence of researchers from their supervisor, which are used to assess the growth of young researchers. We describe two types of graphs that can be constructed from online data: the co-author network of the young researcher, and the combined topic network of the young researcher and their supervisor, together with a series of network properties that describe these graphs. Finally, we show that, for the purpose of discovering emerging talent, dynamic online resources for publications provide better coverage than more traditional datasets.
This is fairly preliminary work; it mainly establishes that we want to use the freshest possible data for this purpose. We are expanding the work to do a large scale study of independence as well as to use different sources of data. But to me, this shows how the freshness of web data allows us to begin looking at and measuring research in new ways.
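To make the co-author network idea from the abstract concrete, here is a minimal sketch of one independence signal one could compute from crawled publication data. The author lists and the overlap measure below are hypothetical illustrations, not the actual metrics from the paper:

```python
# Sketch: one simple independence signal from publication data.
# The papers list below is made up; a real study would build it
# from a web crawl of a source such as Google Scholar.

from itertools import combinations
from collections import defaultdict


def coauthor_graph(papers):
    """Build an undirected co-author graph (adjacency sets) from author lists."""
    graph = defaultdict(set)
    for authors in papers:
        for a, b in combinations(authors, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def supervisor_overlap(graph, researcher, supervisor):
    """Fraction of the researcher's co-authors who also co-author with the
    supervisor -- lower values suggest a more independent network."""
    coauthors = graph[researcher] - {supervisor}
    if not coauthors:
        return 0.0
    shared = {c for c in coauthors if supervisor in graph[c]}
    return len(shared) / len(coauthors)


papers = [
    ["student", "supervisor", "colleague"],  # early paper with supervisor
    ["student", "supervisor"],
    ["student", "new_collab"],               # later, independent collaboration
]
g = coauthor_graph(papers)
print(supervisor_overlap(g, "student", "supervisor"))  # prints 0.5
```

Tracking how this overlap changes over time, on fresh web data, is the kind of dynamic measurement the paper argues traditional offline databases are too stale to support.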
I’ve been reviewing papers lately and I’m beginning to develop a new heuristic: if I follow a link mentioned in the paper and there’s something reasonable on the other end, there’s a good chance the paper is good. Not all the time, of course, but it’s a surprisingly good predictor. In particular, I review computer science papers, many of which describe frameworks, architectures or systems. The potential reusability of these artifacts is partly premised on the availability of their code. Unfortunately, in some cases there’s nothing on the other end of the link or the link doesn’t make sense.
The moral of the story – include links in your papers and make sure they work.
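Checking your own links before submission is easy to automate. Here is a rough sketch (the example text and URLs are made up; a real script would read your manuscript file):

```python
# Sketch: pull URLs out of a paper's text and check that each resolves.
import re
import urllib.request

URL_RE = re.compile(r"https?://[^\s)\]>\"']+")


def extract_urls(text):
    """Return the URLs mentioned in a block of text, trailing punctuation stripped."""
    return [u.rstrip(".,;") for u in URL_RE.findall(text)]


def link_is_alive(url, timeout=5):
    """True if the URL answers an HTTP request without an error status."""
    try:
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except Exception:
        return False


paper_text = """Our prototype is available at http://example.org/code
and the dataset at https://example.org/data."""

for url in extract_urls(paper_text):
    print(url)  # run link_is_alive(url) on each before you submit
```

A quick pass like this catches both dead links and copy-paste typos that would otherwise greet your reviewer with a 404.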
Below is a post-it note summary made with our students in the Web Science course. This is the capstone class for students doing the Web Science minor here at the VU, and the summary highlights the topics they’ve learned about so far in four other courses.
The WordPress.com stats helper monkeys prepared a 2012 annual report for this blog.
Here’s an excerpt:
600 people reached the top of Mt. Everest in 2012. This blog got about 4,900 views in 2012. If every person who reached the top of Mt. Everest viewed this blog, it would have taken 8 years to get that many views.