Screencast about IBM’s UIMA text processing architecture

UIMA is a new-ish framework on the block that both competes and cooperates with the GATE framework for natural language processing, annotation and search. Jon Udell recorded a screencast with a couple of IBM-ers to show off and explain UIMA.

While the screencast moves a little slowly for anyone familiar with sentence tokenizing principles, it is still interesting to see how it all hangs together.

The only problem I see with UIMA is the confusing licensing. One version of UIMA is under the alphaWorks (you’ll pay us later) license; another is under the uncommon Common Public License; yet another is under an IBM commercial license. This may or may not matter for a researcher, but it is still something that needs to be carefully considered.

Still, when running under Eclipse, UIMA (which, btw, stands for Unstructured Information Management Architecture) has a very nice interface. Nicer than GATE, which I find quite clunky. And it is theoretically possible to plug GATE and OpenNLP components into UIMA with little or no wrapper coding.

Tags: Computational Linguistics, UIMA