Tag Archives: semantic web

The Journal of Web Semantics recently published a special issue on Using Provenance in the Semantic Web (Vol 9, No 2, 2011), edited by Yolanda Gil and me. All articles are available on the journal’s preprint server.

The issue highlights top research at the intersection of provenance and the Semantic Web. The papers address a range of topics including:

  • tracking provenance of DBpedia back to the underlying Wikipedia edits [Orlandi & Passant];
  • how to enable reproducibility using Semantic techniques [Moreau];
  • how to use provenance to effectively reason over large amounts (1 billion triples) of messy data [Bonatti et al.]; and
  • how to begin to capture semantically the intent of scientists [Pignotti et al.].
Our editorial highlights a common thread between the papers and sums them up as follows:

A common thread through these papers is the use of already existing provenance ontologies. As the community comes to an increasing agreement on the commonalities of provenance representations through efforts such as the W3C Provenance Working Group, this will further enable new research on the use of provenance. This continues the fruitful interaction between standardization and research that is one of the hallmarks of the Semantic Web.

Overall, this set of papers demonstrates the latest approaches to enabling a Web that provides rich descriptions of how, when, where and why Web resources are produced and shows the sorts of reasoning and applications that these provenance descriptions make possible.

Finally, it’s important to note that this issue wouldn’t have been possible without the quick and competent reviews done by the anonymous reviewers. This is my public thank you to them.

I hope you get a chance to take a look at this interesting work.

Yesterday, I had the pleasure of giving a tutorial at the NBIC PhD Course on Managing Life Science Information. This week-long course focused on applying semantic web technologies to the integration of heterogeneous life science information.

The tutorial I gave focused on exposing relational databases to the web using the awesome D2R Server. It’s really a great piece of software that shows results right away. Perfect for a tutorial. I also covered how to get going with LarKC and where that software fits in the whole semantic web data management space.

On to the story…

The students easily exposed our test database (GPCR receptors) as RDF using D2R. Now the cool part: I found out just before the start of my tutorial that the day before they had set up an RDF repository (Sesame) with some personal profile information. So on the fly I had them take the RDF produced by the database conversion and load that into the repository from the day before. This took a couple of clicks. They were then able to query over both their personal information and this new GPCR dataset. With not much work we had munged together two really different data sets.

This is old hat to any Semantic Web person, but it was a great reminder of how the flexibility of RDF makes it easy to mash up data. No messing about with tables or figuring out if the schema is right, just load it up into your triple store and start playing.
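To make the "just load it up" point concrete, here is a toy sketch in plain Python. It is not D2R or Sesame, and it is not what the students ran: the "store" is just a set of (subject, predicate, object) tuples, and every identifier in it (ex:alice, ex:studies, ex:receptorA and so on) is a made-up stand-in for the profile and receptor data. It only illustrates why merging two unrelated RDF datasets needs no schema work.

```python
# Toy illustration only: a "triple store" as a plain Python set of
# (subject, predicate, object) tuples. All identifiers are hypothetical.

def match(store, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

# Hypothetical personal-profile triples (stand-in for the existing repository).
profiles = {
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "ex:studies", "ex:receptorA"),
    ("ex:bob", "foaf:name", "Bob"),
}

# Hypothetical triples standing in for the RDF exported from the database.
receptors = {
    ("ex:receptorA", "rdfs:label", "Receptor A"),
    ("ex:receptorA", "ex:family", "GPCR"),
}

# "Loading" both datasets is a set union: no tables, no schema alignment.
store = profiles | receptors

def people_studying_family(store, family):
    """Join across both datasets: who studies a receptor of this family?"""
    results = []
    for receptor, _, _ in match(store, p="ex:family", o=family):
        for person, _, _ in match(store, p="ex:studies", o=receptor):
            for _, _, name in match(store, s=person, p="foaf:name"):
                results.append((name, receptor))
    return sorted(results)

print(people_studying_family(store, "GPCR"))  # [('Alice', 'ex:receptorA')]
```

In a real setup the union would be an upload into the repository and the nested loops would be a single SPARQL query, but the shape of the idea is the same: shared identifiers do the joining for you.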

Sunbelt is the annual meeting for Social Network Analysis researchers. It’s been going on since 1981 (a couple of years before analyzing Twitter graphs became hip) and this year it’s being held in Tampa. Two of my colleagues, Julie Birkholz and Shenghui Wang, are attending and presenting some joint work. The abstracts are below. If you’re at Sunbelt be sure to check out their presentations and have a chat.

At a higher level, I think both pieces of work emphasize the importance of combining rich representations of the data underlying networks with dynamic network analysis. Networks provide a powerful abstraction mechanism but it’s important to be able to situate that abstraction in a rich context. The techniques we are both developing and applying are steps along the way towards enabling these more “situated” networks.

Dynamics Of Scientific Collaboration Networks

Groenewegen, Peter; Birkholz, Julie M.; van der Bunt, Gerhard; Groth, Paul

Evolution of scientific research can be considered as a dynamic network of collaborative relations between researchers. Collaboration in science leads to social networks in which authors can gain prominence through research (knowledge production), access to highly regarded field members, or network positions in the collaborative network. While a central position in network terms can be considered a measure of prominence, the same holds for citation scores. Causal evidence on a central position in the network corresponding to prominence in other dimensions such as the number of citations remains open. In this paper collaborative patterns, research interests and citation counts of co‐authoring scientists will be analyzed using SIENA to establish whether network processes, community or interest strategies lead to status in a scientific field, or vice versa does status lead to collaboration. Results from an analysis of a subfield of computer science will be presented.

Multilevel Longitudinal Analysis For Studying Influence Between Co‐evolving Social And Content Networks

Wang, Shenghui; Groth, Paul; Kleinnijenhuis, Jan; Oegema, Dirk A

The Social Semantic Web has begun to provide connections between users within social networks and the content they produce across the whole of the Social Web. Thus, the Social Semantic Web provides a basis to analyze both the communication behavior of users together with the content of their communication. However, there is little research combining the tools to study communication behaviour and communication content, namely, social network analysis and content analysis. Furthermore, there is even less work addressing the longitudinal characteristics of such a combination. This paper proposes to take into account both the social networks and the communication content networks. We present a general framework for measuring the dynamic bi‐directional influence between co‐evolving social and content networks. We focus on the twofold research question: how previous communication content and previous network structure affect (1) the current communication content and (2) the current network structure. Multilevel time‐series regression models are used to model the influence between variables derived from social networks and content networks. The effects are studied at the group level as well as the level of individual actors. We apply this framework in two use‐cases: online forum discussions and conference publications. By analysing the dynamics involving both social networks and content networks, we obtain a new perspective towards the connection of social behaviour in the social web and the traditional content analysis.




We had our second Dutch Semantic Web Meetup for 2010 yesterday. This was a smaller and more impromptu event than our last meetup. Many of our colleagues were enjoying the sun in Crete at ESWC. But Jan Aasman from Franz, Inc was in town and we thought it was a good chance to get everyone together to talk semantic web. We had 20 attendees, a mix of new and returning faces.

Jan gave an interesting keynote discussing the internals of AllegroGraph (note: SSDs really improve performance) as well as its features. I think what resonated most with the audience was the various demos and use cases Jan gave. He gave examples ranging from pharma to integrating information about the environmental impact of the lumber trade in Canada. One of the demos, which Marco Roos (a biologist from Leiden Medical Center) and a number of others found compelling, showed the integration of LinkedClinicalTrials data with a number of other Linking Open Data sets (Diseasome, DrugBank). Jan showed how one could navigate between clinical trials via common diseases, symptoms, etc.

The two takeaways were that semantics really allows for integrated data analytics and that we’re getting close to triple stores reaching parity with classic relational databases. Triple stores that can handle a trillion triples are coming this year…

After the talk, we headed to the VU University’s campus cafe and sat outside in the sun (yes, Amsterdam was emulating Crete). From talking to the various attendees, I think some important connections were made. With this smaller size event, it’s easier to get in-depth into conversations.

Given the short notice for this event, I was really impressed with the turnout. The Dutch semantic web community is clearly strong and we hope to continue organizing these events on a regular basis.

Finally, thanks to Christophe Guéret for organizing the event.
