Computational Linguistics – News update for Oct 9, 2006

A couple of interesting things happened recently in fields related to Computational Linguistics that I thought were worth linking to:

  • ACM Queue had an interview with Mike Cohen of Google (previously of Nuance Communications) discussing recent advances and changes in speech recognition technology.
  • Pluggd, with its hotly discussed demo of HearHere, uses speech recognition and some sort of topic clustering to show a time heatmap of your search keyword within a podcast. The idea is that the heatmap lets you skip straight to the discussion of the topic you are interested in and ignore the irrelevant parts (and adverts). They have a short presentation about the product in the DEMOfall archives. Warning: it sometimes takes a couple of tries to get the DEMO video to play (depending on system load).
  • PodZinger, which already used speech recognition to find search terms within podcasts, has just added an advertising platform based on classifying both the content and the search term.
  • Netflix has created a challenge in which they provide recommendation data so that other people can try to develop an algorithm better than the one built by Netflix's own data mining team. With a grand prize of one million dollars ($1,000,000), there are already a lot of competitors. The dataset only has movie titles, so it is not enough for any text/description analysis, but it is still a huge dataset to try various graph and neural network methods on. Most people suggest mashing it up with IMDB or some other movie information database, but that obviously requires additional data-matching work.
  • ClearForest, on the other hand, is only offering $2,000 in their competition, and you have to bring your own data, but at least they provide an API that does named entity recognition. That beats having to load up GATE every time and, who knows, maybe somebody can create another Gutenkarte-style mashup.
  • And to finish on a funny note, maybe you would like this joke generated by STANDUP (popularised write-up): What do you get when you cross a car with a sandwich? A traffic jam.
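To make the HearHere keyword heatmap idea from the Pluggd item a bit more concrete, here is a minimal sketch of the counting step only. It assumes a speech recogniser has already produced a time-stamped transcript as a list of (start_seconds, word) pairs. That format and the keyword_heatmap function are hypothetical illustrations, not Pluggd's actual data or method, and their topic clustering is not modelled at all.

```python
from collections import Counter


def keyword_heatmap(transcript, keyword, duration, bins=10):
    """Count how often `keyword` is spoken in each of `bins` equal time slices.

    transcript: list of (start_seconds, word) pairs (hypothetical recogniser output).
    duration: total length of the podcast in seconds.
    """
    width = duration / bins
    counts = Counter()
    target = keyword.lower()
    for start, word in transcript:
        if word.lower() == target:
            # Clamp so a hit at exactly `duration` lands in the last bin.
            index = min(int(start // width), bins - 1)
            counts[index] += 1
    # Counter returns 0 for bins with no hits.
    return [counts[i] for i in range(bins)]


transcript = [(5.0, "Welcome"), (12.0, "Python"), (14.5, "python"),
              (47.0, "ads"), (88.0, "Python")]
print(keyword_heatmap(transcript, "python", duration=100.0, bins=5))
# prints [2, 0, 0, 0, 1]
```

The "hot" bins (here the first 20 seconds) are where a listener would jump to. A real system would also have to cope with recognition errors and synonyms, which plain exact matching ignores.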
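For anyone poking at the Netflix data, a common first step before trying graph or neural network methods is a trivial baseline to beat. The sketch below is a much simpler technique than those: it predicts each rating as the global mean plus a per-movie offset and scores the predictions by root mean squared error (RMSE), the measure the contest uses. The (user_id, movie_id, stars) triples and the function names are hypothetical and do not reproduce the official file format.

```python
import math
from collections import defaultdict


def fit_movie_baseline(ratings):
    """ratings: list of (user_id, movie_id, stars). Returns (global_mean, movie_bias)."""
    mu = sum(r for _, _, r in ratings) / len(ratings)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for _, movie, r in ratings:
        # Accumulate how far this movie's ratings sit from the global mean.
        sums[movie] += r - mu
        counts[movie] += 1
    bias = {m: sums[m] / counts[m] for m in sums}
    return mu, bias


def predict(mu, bias, movie):
    # Movies never seen in training fall back to the global mean.
    return mu + bias.get(movie, 0.0)


def rmse(mu, bias, ratings):
    squared = sum((predict(mu, bias, m) - r) ** 2 for _, m, r in ratings)
    return math.sqrt(squared / len(ratings))


train = [("u1", "m1", 5), ("u2", "m1", 4), ("u1", "m2", 2), ("u3", "m2", 1)]
mu, bias = fit_movie_baseline(train)
print(mu, bias)  # 3.0 {'m1': 1.5, 'm2': -1.5}
print(rmse(mu, bias, [("u4", "m1", 5), ("u4", "m2", 2)]))  # about 0.5
```

Movies with only a handful of ratings get noisy offsets here; shrinking each offset towards zero is an obvious refinement before moving on to anything fancier.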