UIMA – a “very quick” quick start guide

The UIMA (Unstructured Information Management Architecture) project has recently moved from IBM to the Apache Incubator. This applies only to the open-source part; the commercial part is, and will stay, with IBM.

I last wrote about UIMA a very long time ago (1, 2), so I decided to give it another look.

It is still as complicated as ever, and it still takes a couple of hours of browsing the documentation before one can run a basic example. So I decided to document just the fastest way to get something to show up.

  1. Download and unpack the binary package. It is about 40 MB uncompressed. Disable your virus checker while unpacking; otherwise it can take a very long time.

  2. Set the UIMA_HOME environment variable to the package’s home directory (e.g. C:\Software\apache-uima).

  3. Run the bin\adjustExamplePaths script so that the paths used by the examples point to the right places.

  4. Run the bin\documentAnalyzer script. The alternative is to install Eclipse, EMF, the UIMA plugins, etc. The documentation strongly pushes you toward this latter route, but it is not required just to try UIMA out.

  5. In the Document Analyzer window that pops up:

     a. Replace the Input Directory’s relative path (examples/data) with the absolute path to that directory.

     b. Replace the Output Directory’s relative path with the absolute path to a new directory where you want your annotations to be written.

     c. Fill the Location of Analysis Engine XML Descriptor field with an XML file from examples/descriptors/tutorial/…. If a directory has a file whose name ends in TAE.xml, that is usually the one you want to run. Otherwise, try any of them; UIMA will complain if you give it the wrong file.

  6. Press Run and wait for UIMA to process all 7 files in the examples/data directory.

  7. In the next popup, double-click the file you want to see annotated. This is not very intuitive, as the window also contains buttons and radio buttons.

  8. Not every document will have meaningful results. I think the tutorial explains what to look for with each analyser, but if you are short on time, just go through the files until you see one with multiple items in the legend (I am assuming the Java viewer here).
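If you set things up more than once, the path preparation above (UIMA_HOME, absolute input and output directories, and picking a descriptor whose name ends in TAE.xml) can be scripted. The following is only a sketch in plain Java with no UIMA dependency: the class QuickStartPaths, its methods and the "my-annotations" output directory are hypothetical names of my own, and it assumes the binary package keeps its examples under examples/data and examples/descriptors/tutorial.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of UIMA: it only prints values you could paste
// into the Document Analyzer fields. Assumes the binary package layout
// (examples/data, examples/descriptors/tutorial) under UIMA_HOME.
public class QuickStartPaths {

    // Resolves a path relative to UIMA_HOME into an absolute, normalized path.
    static Path absoluteUnder(String uimaHome, String relative) {
        return Paths.get(uimaHome).resolve(relative).toAbsolutePath().normalize();
    }

    // Recursively lists regular files whose names end in "TAE.xml", sorted by path.
    static List<Path> taeDescriptors(Path dir) throws IOException {
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile)
                        .filter(p -> p.getFileName().toString().endsWith("TAE.xml"))
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        String home = System.getenv("UIMA_HOME");
        if (home == null) {
            System.err.println("UIMA_HOME is not set");
            return;
        }
        System.out.println("Input Directory:  " + absoluteUnder(home, "examples/data"));
        // Hypothetical output location; any new directory will do.
        Path out = Files.createDirectories(absoluteUnder(home, "my-annotations"));
        System.out.println("Output Directory: " + out);
        Path tutorial = absoluteUnder(home, "examples/descriptors/tutorial");
        if (Files.isDirectory(tutorial)) {
            for (Path p : taeDescriptors(tutorial)) {
                System.out.println("Descriptor:       " + p);
            }
        }
    }
}
```

Run it with UIMA_HOME set and paste the printed paths into the corresponding fields. Filtering on the TAE.xml suffix just mirrors the rule of thumb above; it is a heuristic, not something UIMA itself enforces.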

Once all this is running and you like the output, it may be worth installing Eclipse and the UIMA plugins. That lets you modify the Java code of the examples and run it easily, with all the paths and classpaths preset.