While toying with the public BigQuery datasets and impatiently waiting for Google Cloud Dataflow to be released, I noticed the Wikipedia Revision History dataset, which contains a list of 314M Wikipedia edits up to 2010. In the spirit of Amazon’s “people who bought this”, I decided to run a small experiment on music recommendations based on Wikipedia edits. The results are not perfect, but they provide some insights that could be used to bootstrap a recommendation platform.
Wikipedia edits as a data source
Wikipedia pages are often an invaluable source of knowledge. Yet the type and frequency of their edits also provide great data to mine knowledge from. See for instance the Wikipedia Live Monitor by Thomas Steiner, which detects breaking news through Wikipedia; “You are what you edit”, an ICWSM09 study of Wikipedia edits to identify contributors’ location; or some of my joint work on data provenance with Fabrizio Orlandi.
Here, my assumption to build a…