Monthly Archives: May 2010

Last week I had the pleasure of attending both the Web Science and World Wide Web Conferences, which were co-located in Raleigh, NC. (As an aside, I still find it odd that going home now means flying to Europe rather than away from it.) It was my first time at either event. Both had excellent, thought-provoking programs, and I hope I get to go again next year. In addition, at our face-to-face meeting I got to meet in meatspace several people whose voices I had heard many times on the W3C's Provenance Incubator Group telecons. As with all good conferences, you walk away with lots of things to think about, new contacts to follow up on, and new projects to do. Here, I want to focus on what were, for me, the two themes of the week in Raleigh: The Rise of Structured Data and Understanding Your Data.

1. The Rise of Structured Data

Maybe it's because I hung out with Linked Data people, or because Facebook launched its Open Graph Protocol the week before, but there was tons of talk about the usefulness of structured data. There was obviously a focus on open government data, which has taken off this past year, but the talk was also coming from big commercial players. In two of the panels I attended, people from Facebook, Bing, Google, and Yahoo all talked about how they were taking advantage of structured data to provide better services. Essentially, if people add a bit of extra structure to the data on their websites, it becomes easier for these services to use that data to provide useful stuff, like different ways to browse or find products. It's not that everything will be structured data, but a little of it goes a long way toward easing the building of applications.

2. Understanding Your Data

There was an absolutely excellent keynote by Danah Boyd, who made a number of points about privacy and big data, in particular big data about people. Her thoughts on the crucial importance of understanding your data resonated with me, both as someone who is trying to work with massive data sets and as someone who is trying to work with social scientists to understand scientific processes. Here's a clip from the keynote about the importance of understanding the meaning behind the data:

She summarized her points as follows:

1) Bigger Data are Not Always Better Data
2) Not All Data are Created Equal
3) What and Why are Different Questions
4) Be Careful of Your Interpretations
5) Just Because It is Accessible Doesn’t Mean Using It is Ethical

I think these points apply even if the massive data we're using isn't about people. We need to think about the provenance of our data, understand how rich it is or isn't, and think carefully about what kinds of questions we can really answer.

Indeed, this keynote wasn't the only place where understanding data came up as a crucial factor. When people talked about government data, it was clear that it was necessary to explain how the data was gathered and what could really be ascertained from it. For example, when the US government released the data about the stimulus package, it got harangued because the release was a first draft and contained errors. In talks about linked data, it was evident that you needed to understand how people were using certain predicates (e.g. owl:sameAs) in order to make use of them. In the talk I gave, one question I got was whether the dataset I used was good enough to draw conclusions about particular scientists (it's not).

So understanding your data is crucial, but it's incredibly hard to do. Maybe that's what computer scientists need to be doing: building tools that let people really understand their data.

Finally, in case you're interested, I've embedded the slides from the talk I gave at Web Science. I think it's a great example of social science meeting computer science (but I might be biased).
