arafalov on March 28th, 2007

The UIMA (Unstructured Information Management Architecture) project has recently transitioned from IBM to the Apache Incubator. This applies only to the open source part; the commercial part is, and will stay, with IBM.
I last wrote about UIMA a very long time ago (1, 2), so I decided to give it another look.
It [...]

Continue reading about UIMA - a “very quick” quick start guide

arafalov on November 16th, 2006

Lots of new sightings of CL/NLP technologies since the last update:

On the commercial speech recognition front, Nexidia is currently in beta with phoneme-mapping audio search. But don’t go to the company’s site. Instead, read the explanation and collection of links in ResourceShelf’s article.
If, instead of waiting for commercial offerings, you would like to [...]

Continue reading about Computational Linguistics - News update for Nov 15, 2006

arafalov on November 3rd, 2006

In my last update on applied computational linguistics, I wrote about PodZinger, which uses speech recognition to figure out which advertisement to match to the podcast you are searching with their service.
Another company is claiming to do that with songs - Lirix. Their upcoming AdLirix platform is supposed to be so effective that Lirix [...]

Continue reading about Lirix - computational linguistics aspects

arafalov on November 1st, 2006

Nine months ago, I asked “Where are the blogs of computational linguists?” Now, there is an answer.
The Association for Computational Linguistics has moved its documents (formerly ACL Universe) into the Wiki and there is now a separate page for blogs. It has all of the blogs I found so far and more. It even [...]

Continue reading about There! are the blogs of computational linguists

arafalov on October 10th, 2006

A couple of interesting things happened recently in fields related to computational linguistics that I thought were worth linking to:

ACM Queue had an interview with Mike Cohen of Google (previously of Nuance Communications) discussing recent advances and changes in speech recognition technology.
Pluggd, with its hotly discussed demo of HearHere, uses speech recognition and some sort of [...]

Continue reading about Computational Linguistics - News update for Oct 9, 2006

arafalov on October 9th, 2006

[This article also appears in a slightly edited form as a TeleRead entry]
Ever tried learning a foreign language? Noticed how the books you could read were often boring, and the books you wanted to read were just that bit too hard to understand? Wished you could have a quick translation of a complex passage or [...]

Continue reading about How e-books could revolutionize language-learning

FreshNotes (currently in alpha) uses basic named entity extraction, and maybe information extraction, to produce a website that lets you search and navigate relationships between people and/or topics. The interface, but of course it is all pre-baked at the moment.
From the CL point of view, I can see that there is very little smarts in [...]

Continue reading about FreshNotes: Web 2.0 company using computational linguistics

arafalov on July 9th, 2006

I have written about UIMA, IBM’s Natural Language Processing framework, before. Since then, I have made a couple of attempts to get a feel for it. Unfortunately, it kept feeling uncomfortable and confusing. Finally, I figured out why.
UIMA’s extensive documentation expects that you are committed to the framework. So, the documentation makes sure you understand full [...]

Continue reading about UIMA’s expectations of the user

Even the basic techniques from the computational linguistics field can make for interesting and intriguing applications. Gutenkarte takes public domain books, extracts the geographic names present in the text and plots them on a map. The result is an automatic clustering of place references, both visually and (within a single click) textually.
The site itself is self-explanatory, but [...]

Continue reading about Creative use of the Named Entity Recognition techniques

UIMA is a new-ish framework on the block, competing/cooperating with the GATE framework to do natural language processing, annotation and search. Jon Udell recorded a screencast with a couple of IBM-ers to show off and explain UIMA.
While the screencast moves a little slowly for a person familiar with sentence tokenizing principles, it is still interesting to see [...]

Continue reading about Screencast about IBM’s UIMA text processing architecture