Getting OpenNLP parser to work

I was not able to get the OpenNLP parser to work. There were no samples to play with and no command-line tools to run. And I don’t even want to talk about documentation, because there was none. There was an attempt at a lame joke (at least that’s the only sense I can make of the what.html file), but no actual documentation.

Finally, I pinged a research colleague who did get the toolkit working (thanks, Scott). It turns out that a whole set of model files is missing from the tool’s download. They are linked from a separate page on the original website, not included in the download itself.

I am downloading the models now and hopefully will be on my way. But I can certainly see why this particular toolkit is mentioned much less frequently than Stanford’s or Bikel’s.

After the fact, I have also found a mini tutorial by Daniel McLaren explaining OpenNLP components and showing some sample code and output. Looks better than what’s bundled with OpenNLP itself. Maybe Daniel and Thomas Morton (author of OpenNLP) should talk.

11 thoughts on “Getting OpenNLP parser to work”

  1. You can use 2 lines of code to get your sentences parsed 🙂 [figured it out partly thanks to the tutorial you linked to]

    String sentence = "This is some random sentence to get you started";

    // Load the serialized parser
    ParserME parser = TreebankParser.getParser("wherever-your-parser-dir-is/");

    // Parse the sentence, request only the single best parse, then take the first (index 0) element of the returned array and display it
    TreebankParser.parseLine(sentence, parser, 1)[0].show();

  2. dude… Daniel McLaren’s tutorial is based on code within the OpenNLP package itself. In fact, almost all the code’s taken from the english.* java files. It is open source after all; all you have to do is read through the source. chill man chill.

  3. Quek,

    I don’t buy the ‘read the source’ argument in the context of this article. Code, and especially NLP code, is not easy to read, so it has a steeper learning curve than tutorials and good documentation. People without enough time (or patience) will therefore drop out early and go look at something else. That’s why one of the measures of success for an open source project is whether a good introductory book has been published about it.

    Obviously, not everything can be documented, but it should be easy to at least start quickly and judge whether it is worth spending extra time on looking through the code.

    That’s what happened with Stanford parser for me. There was just enough documentation to start and then I read the source to figure out the additional options.

  4. I wrote a small program which takes a simple XML file, cleans it and uses the detector/tokenizer/tagger/chunker (in that order). The download link and the install instructions (it’s an Ant build.xml) can be found here.
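The stage ordering described in the last comment (clean the XML, then detect sentences, then tokenize) can be sketched roughly as follows. This is a toy illustration only: the class `PipelineSketch` and its methods are hypothetical stand-ins built on regular expressions, not OpenNLP's API and not the commenter's program. The tagger and chunker stages are left out because they depend on trained model files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a clean -> sentence-detect -> tokenize pipeline.
// None of these methods come from OpenNLP; they only illustrate the ordering.
class PipelineSketch {

    // Cleaning step: drop anything that looks like an XML tag, collapse whitespace.
    static String stripTags(String xml) {
        return xml.replaceAll("<[^>]+>", " ").replaceAll("\\s+", " ").trim();
    }

    // Sentence detection (naive): split after '.', '!' or '?' followed by whitespace.
    static List<String> detectSentences(String text) {
        List<String> sentences = new ArrayList<>();
        for (String s : text.trim().split("(?<=[.!?])\\s+")) {
            if (!s.isEmpty()) {
                sentences.add(s);
            }
        }
        return sentences;
    }

    // Tokenization (naive): pad punctuation with spaces, then split on whitespace.
    static List<String> tokenize(String sentence) {
        String spaced = sentence.replaceAll("([.!?,;:])", " $1 ").trim();
        return Arrays.asList(spaced.split("\\s+"));
    }

    public static void main(String[] args) {
        String xml = "<doc><p>The parser needs models. Download them first!</p></doc>";
        String text = stripTags(xml);
        // Each stage consumes the output of the previous one.
        for (String sentence : detectSentences(text)) {
            System.out.println(tokenize(sentence));
        }
    }
}
```

In a real setup each stand-in would be replaced by the corresponding model-backed component, but the order of the stages stays the same: a tagger needs tokens, and a chunker needs tagged tokens.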
