More details emerge on Powerset’s engine

Dan Farber has written a good article on Powerset.

It mostly talks about their grandiose marketing plans and how NLP (Natural Language Processing) will change the world. However, it also gives a reasonable explanation of what they are doing, with fairly transparent references to (expanded) WordNet, named entity recognition, event extraction, and semantic web technologies.

It is also interesting that the article tries to give the impression that Google is not using any of these techniques, while the quotes hint at more similarities than differences. It does seem to be true that Google uses its statistical/corpus NLP methods even where something like WordNet might have been useful. But then, they do have a huge, up-to-date corpus to work with.

One interesting item in the article for me is that Powerset may allow developers to build on top of its platform. That might become a game changer for researchers in the NLP field, if they were able to incorporate Powerset’s results into their own algorithms.

For example, if I were trying to choose between two candidate readings of a complex named entity, and I knew that Powerset had one listed in its index and had rejected the other, that would be a good confirmation mechanism. This is similar to how researchers already use Wikipedia’s entries for disambiguation and context building.
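
To make the idea concrete, here is a minimal sketch of that confirmation step in Python. Nothing in it reflects a real Powerset API, since none has been published: `confirm_candidate`, `toy_index`, and `in_toy_index` are hypothetical names, and the tiny in-memory set merely stands in for whatever external index (Powerset, Wikipedia titles, or something else) one could query.

```python
from typing import Callable, Iterable, Optional


def confirm_candidate(
    candidates: Iterable[str],
    is_known: Callable[[str], bool],
) -> Optional[str]:
    """Return the one candidate the external index recognizes, else None."""
    confirmed = [c for c in candidates if is_known(c)]
    # Treat the lookup as confirmation only when it is unambiguous:
    # zero hits or several hits tell us nothing about which reading is right.
    return confirmed[0] if len(confirmed) == 1 else None


# Hypothetical stand-in for an external index of known entities.
toy_index = {"new york stock exchange", "bank of america"}


def in_toy_index(name: str) -> bool:
    """Case-insensitive membership test against the toy index."""
    return name.lower() in toy_index


if __name__ == "__main__":
    options = ["New York Stock Exchange", "York Stock Exchange"]
    print(confirm_candidate(options, in_toy_index))
```

Passing the lookup in as a function keeps the disambiguation logic separate from the source of evidence, so the same check could be pointed at a different index without changes.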