

Semantic web updates

Published by Martin Kleppmann on 02 Oct 2008.

A few weeks ago I noted down some links to current developments in the semantic web. After hearing Tom Morris speak again on “The State of the Semantic Web” at BarCampLondon5, here are some more:

(OMG mad W3C acronyms!)

I also heard Sian Clark of Yahoo speak about SearchMonkey at BCS Search Solutions 2008. This is a very interesting development: it lets site owners annotate their pages with structured information (using RDFa or Microformats), so that those pages can be presented more meaningfully in the search results. A great idea, I think!
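To make "annotating a page with structured information" a bit more concrete, here is a minimal sketch of how a consumer might read a Microformats-style hCard out of ordinary HTML. This is not how SearchMonkey itself works (I don't know its internals); the sample markup, the `HCardParser` class and the `parse_hcard` helper are all made up for illustration, and only three hCard property names (`fn`, `org`, `locality`) are handled.

```python
from html.parser import HTMLParser


class HCardParser(HTMLParser):
    """Collect the text of elements whose class is a known hCard property.

    Illustrative only: it ignores nesting rules, and unclosed void
    elements such as <br> would confuse the simple stack used here.
    """

    PROPERTIES = {"fn", "org", "locality"}

    def __init__(self):
        super().__init__()
        self.card = {}
        self._stack = []  # property name (or None) for each open element

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        prop = next((c for c in classes if c in self.PROPERTIES), None)
        self._stack.append(prop)

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        prop = self._stack[-1] if self._stack else None
        if prop and data.strip():
            self.card[prop] = data.strip()


def parse_hcard(html):
    """Return a dict of the hCard properties found in an HTML fragment."""
    parser = HCardParser()
    parser.feed(html)
    return parser.card


# Hypothetical annotated markup, as a site owner might publish it.
SAMPLE = (
    '<div class="vcard">'
    '<span class="fn">Jane Example</span> works at '
    '<span class="org">Example Ltd</span> in '
    '<span class="locality">London</span>'
    '</div>'
)

print(parse_hcard(SAMPLE))
```

The point is that the annotations are just class names on markup the site already has, so the cost to the site owner is small, while a machine reading the page gets named fields instead of a blob of text.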

This move by Yahoo is a first convincing answer to the chicken-and-egg problem of the semantic web: “why would anybody bother to annotate their data in a machine-readable way?” There has got to be some reward attached to it, and search engine optimisation (SEO) for Yahoo is a very good reason for creating some semantic metadata! (It’s unlikely to really fly until Google also adopts the idea, but surely that’s just a matter of time.)

What I wonder about: what attempts will there be to parse structured data out of unstructured data sources? A few companies are already doing more or less this. For example, Globrix extracts structured information about properties (rent or buy, location, price, number of bedrooms and bathrooms, etc.) from plain text descriptions on estate agents’ websites, and Mydeco extracts structured information about furniture (type of item, colour, width/depth/height, weight, retailer’s location, etc.) from similar unstructured text. There is no technical reason why they couldn’t release that information in a machine-readable RDF format, although there may well be commercial reasons for them wanting to keep it to themselves.
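As a rough illustration of the kind of extraction described above, here is a toy sketch that pulls a few fields out of a free-text property description and writes them out as RDF in N-Triples syntax. It says nothing about how Globrix or Mydeco actually do it; the regular expressions are deliberately naive, and the `http://example.org/` vocabulary and listing URI are placeholders I made up for the example.

```python
import re

# Hypothetical vocabulary namespace, used only for illustration.
VOCAB = "http://example.org/property#"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def extract_listing(text):
    """Pull a few structured fields out of a free-text property description."""
    fields = {}

    # Price: a pound sign followed by digits, possibly with thousands commas.
    price = re.search(r"£\s?([\d,]+)", text)
    if price:
        fields["price"] = int(price.group(1).replace(",", ""))

    # Bedrooms: "2 bed", "2-bed", "2 bedroom", "2 bedrooms", ...
    beds = re.search(r"(\d+)[\s-]*bed(?:room)?s?\b", text, re.IGNORECASE)
    if beds:
        fields["bedrooms"] = int(beds.group(1))

    # Rent or buy, guessed from a handful of stock phrases.
    if re.search(r"\b(to let|to rent|for rent|pcm|per month)\b", text, re.IGNORECASE):
        fields["tenure"] = "rent"
    elif re.search(r"\bfor sale\b", text, re.IGNORECASE):
        fields["tenure"] = "buy"

    return fields


def to_ntriples(subject_uri, fields):
    """Serialise the extracted fields as N-Triples, one triple per field."""
    lines = []
    for name, value in sorted(fields.items()):
        if isinstance(value, int):
            obj = '"%d"^^<%s>' % (value, XSD_INTEGER)
        else:
            obj = '"%s"' % value  # values here are fixed strings, no escaping needed
        lines.append("<%s> <%s%s> %s ." % (subject_uri, VOCAB, name, obj))
    return "\n".join(lines)


description = "Spacious 2 bedroom flat to rent in Cambridge, £950 pcm"
fields = extract_listing(description)
print(to_ntriples("http://example.org/listing/1", fields))
```

Real listings are far messier than this, which is presumably where the hard (and commercially valuable) work lies; publishing the result as RDF is the easy part.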