Music recommendations with 300M data points and one SQL query

Alexandre Passant

While toying with the public BigQuery datasets, impatiently waiting for Google Cloud Dataflow to be released, I’ve noticed the Wikipedia Revision History one, which contains a list of 314M Wikipedia edits, up to 2010. In the spirit of Amazon’s “people who bought this”, I’ve decided to run a small experiment about music recommendations based on Wikipedia edits. The results are not perfect, but provide some insights that could be used to bootstrap a  recommendation platform.

Wikipedia edits as a data source

Wikipedia pages are often an invaluable source of knowledge. Yet, the type and frequency of their edits also provide great data to mine knowledge from. See for instance the Wikipedia Live Monitor by Thomas Steiner, detecting breaking news through Wikipedia,  “You are what you edit“, an ICWSM09 study of Wikipedia edits to identify contributors’ location, or some of my joint work on data provenance with Fabrizio Orlandi.

Here, my assumption to build a…

View original post 689 more words

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s