This has been a great week if you think that it’s important to know the origins of content on the web. First, Google announced the support of explicit metadata describing the origins of news article content that will be used by Google News. Publishers can now identify using two tags whether they the original source of a piece of news or are syndicating it from some other provider. Second, the New York Times now has the ability to do paragraph level permalinks. (So this is the link to the third paragraph of an article on starbucks recycling). So one can link to the exact paragraph when quoting a piece. This was supported by some other sites as well and there’s a wordpress plug-in for it but having the Times support it is big news. Essentially, with a couple of tweaks these techniques could make the quote pattern that you see in blogs (shown below) machine readable.
In the W3C Provenance Incubator Group that is just wrapping up, one of the main scenarios was how to support a News Aggregator that can makes use of provenance to help determine the quality of the articles it automatically creates. With these developments, we are moving one step closer to being able to make this scenario possible.
To me, this is more evidence that with simple markup, and simple link structures, we can achieve the goal of having machines know where content on the web originates. However, like with a lot of the web, we need to agree on those simple structures so that everyone knows how to properly give credit on the web.