Laying out penn treebank output of Stanford parser

I am trying to use the Stanford NLP parser for my research, and I need to look at the trees it produces for large, complex sentences. I have found several packages for laying out the output as trees, but they all seem to be targeted at visualizing smaller sentences, suitable for illustrating a point in a published paper.

Sample output of the Graphviz layout for the Stanford Parser’s output

My trees are large. A 40-word sentence is an average case, not an edge case. As a result, every display package I have tried cuts off large chunks of the tree. It might be possible to tinker with their LaTeX code to produce output that is not cut off at Letter, A4, or even A3 size, but I am not that good with LaTeX yet. I also need to produce these large trees quickly, because I am not even sure whether this parser will suit my needs in the long run.

So, instead, I wrote my own bridging code in Java between the parser’s Penn Treebank output and Graphviz, the graph layout software I use for many layout tasks. The whole implementation fit in a single file of fewer than 100 lines, including the logic to highlight the maximal subtrees rooted at a particular label (noun phrases, NP, in this example). Click on the small image to see the full example. The Graphviz input file is also available for the curious.
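To give a feel for how small the Graphviz side of such a bridge can be, here is a minimal sketch. It is not the original bridging code: the `TreeToDot` class and its nested `Node` type are hypothetical stand-ins for a parsed tree, and "maximal subtrees" is interpreted here as every subtree rooted at a node with the target label that is not already inside another such subtree. Every node at or below the outermost NP therefore gets filled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TreeToDot {
    /** Hypothetical minimal tree node: a label plus ordered children; leaves hold words. */
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... kids) {
            this.label = label;
            this.children = new ArrayList<>(Arrays.asList(kids));
        }
    }

    private final StringBuilder out = new StringBuilder();
    private int nextId = 0;

    /** Emits a DOT digraph in which the maximal subtrees rooted at `highlight` are filled. */
    static String toDot(Node root, String highlight) {
        TreeToDot w = new TreeToDot();
        w.out.append("digraph tree {\n");
        // ordering=out keeps each node's children left-to-right in input order (word order).
        w.out.append("  graph [ordering=out];\n");
        w.out.append("  node [shape=box];\n");
        w.emit(root, highlight, false);
        w.out.append("}\n");
        return w.out.toString();
    }

    /** Pre-order recursive descent; returns the DOT id assigned to node n. */
    private String emit(Node n, String highlight, boolean insideHighlight) {
        String id = "n" + nextId++;
        // A node is filled when it carries the target label or lies below a node that does,
        // so the fill covers exactly the maximal target-rooted subtrees.
        boolean inside = insideHighlight || n.label.equals(highlight);
        out.append("  ").append(id).append(" [label=\"").append(escape(n.label)).append('"');
        if (inside) out.append(", style=filled, fillcolor=lightblue");
        out.append("];\n");
        for (Node child : n.children) {
            String childId = emit(child, highlight, inside);
            // Edges are written in child order, which ordering=out then respects.
            out.append("  ").append(id).append(" -> ").append(childId).append(";\n");
        }
        return id;
    }

    /** Escapes backslashes and double quotes for a quoted DOT string. */
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        Node tree = new Node("ROOT",
            new Node("S",
                new Node("NP", new Node("NNS", new Node("dogs"))),
                new Node("VP", new Node("VBP", new Node("bark")))));
        System.out.print(toDot(tree, "NP"));
    }
}
```

Saving the printed output as `tree.dot` and running `dot -Tgif tree.dot -o tree.gif` turns it into an image. Giving every node a generated id (`n0`, `n1`, ...) rather than using the words themselves matters because the same word or label can appear many times in one sentence.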

At the moment, converting to image files is sufficient. If I ever do convince the parser to handle my 80-word sentences, the resulting trees will probably be large enough to need ZGRViewer.

The Java bridging code is not available yet, as it is very ugly. The secret was in the main() method of PennTreeReader, which showed how to read the parser’s output back in as a Tree suitable for recursive descent. After that, it was just a matter of navigating the tree levels and spitting out the incredibly easy Graphviz format. I will probably clean the code up a bit over the next couple of weeks and then release it.
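The reading step can be illustrated without the Stanford library at all. The sketch below is a hand-rolled recursive-descent reader for Penn Treebank bracket notation, used here as a stand-in for PennTreeReader; it does not use or imitate that class’s actual API, and the `BracketReader` class, its `Tree` type, and the `walk` helper are all hypothetical. It assumes parentheses inside words are already escaped as `-LRB-`/`-RRB-`, the usual treebank convention, so every literal `(` or `)` is structural.

```java
import java.util.ArrayList;
import java.util.List;

final class BracketReader {
    /** Hypothetical tree type: a label plus ordered children; leaves hold the words. */
    static final class Tree {
        final String label;
        final List<Tree> children = new ArrayList<>();

        Tree(String label) {
            this.label = label;
        }
    }

    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private BracketReader(String text) {
        // Tokenize: "(" and ")" are separate tokens; everything else splits on whitespace.
        StringBuilder word = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c == '(' || c == ')' || Character.isWhitespace(c)) {
                if (word.length() > 0) {
                    tokens.add(word.toString());
                    word.setLength(0);
                }
                if (c == '(' || c == ')') tokens.add(String.valueOf(c));
            } else {
                word.append(c);
            }
        }
        if (word.length() > 0) tokens.add(word.toString());
    }

    /** Parses one bracketed tree such as "(ROOT (S (NP (NNS dogs)) (VP (VBP bark))))". */
    static Tree read(String text) {
        BracketReader r = new BracketReader(text);
        Tree t = r.parseTree();
        if (r.pos != r.tokens.size()) throw new IllegalArgumentException("trailing input");
        return t;
    }

    // Grammar: tree := "(" [label] (tree | word)* ")"
    private Tree parseTree() {
        expect("(");
        // Older treebank files wrap each sentence in an unlabeled "( ... )"; use "" for it.
        String label = peekIs("(") ? "" : next();
        if (label.equals(")")) throw new IllegalArgumentException("empty bracket");
        Tree node = new Tree(label);
        while (!peekIs(")")) {
            if (peekIs("(")) {
                node.children.add(parseTree());      // nested constituent
            } else {
                node.children.add(new Tree(next())); // word leaf
            }
        }
        expect(")");
        return node;
    }

    private boolean peekIs(String s) {
        if (pos >= tokens.size()) throw new IllegalArgumentException("unexpected end of input");
        return tokens.get(pos).equals(s);
    }

    private String next() {
        if (pos >= tokens.size()) throw new IllegalArgumentException("unexpected end of input");
        return tokens.get(pos++);
    }

    private void expect(String s) {
        String t = next();
        if (!t.equals(s)) throw new IllegalArgumentException("expected " + s + " but got " + t);
    }

    /** Appends one line per node, indented two spaces per tree level. */
    static void walk(Tree t, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) out.append("  ");
        out.append(t.label).append('\n');
        for (Tree c : t.children) walk(c, depth + 1, out);
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        walk(read("(ROOT (S (NP (NNS dogs)) (VP (VBP bark))))"), 0, sb);
        System.out.print(sb);
    }
}
```

Once the bracketed text is a tree, emitting Graphviz is just one more recursive walk over the same structure, writing a node line per label and an edge line per parent-child pair.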

If somebody likes the output and wants to see the code sooner, send me an email at alex@thisdomain.

24 thoughts on “Laying out penn treebank output of Stanford parser”

  1. Hi Alex,
    I am new to the field of NLP. I recently saw your article on your blog and was really interested in it
    (http://blog.outerthoughts.com/2007/06/laying-out-penn-treebank-output-of-stanford-parser/).
    Could you send me the code for this?

    BTW, do you have any sample code showing how to use OpenNLP? I read the mini tutorial you mentioned in your blog:
    http://danielmclaren.net/2007/05/11/getting-started-with-opennlp-natural-language-processing
    but I still have some problems running my program.
    (I don’t know which OpenNLP classes to import in Java, and I can’t find any openNLP-tool.jar to download.)

    Many thanks!

    Sophia

  2. Hi Alex,

    I’m new to NLP and have been looking at the Stanford parser lately. Could you email the code to me too? It would be very helpful; I’m trying to include it in my Bachelor’s thesis.

    Thanks
    Blerta

  3. A basic example is now available for download.

    It relies on having the Stanford library installed and, to generate images, Graphviz needs to be present as well. There is a tests directory with one example that goes from the source sentence all the way to the GIF file.

  4. Dear All,
    I am a B.Sc. student who needs to generate a semantic graph for input given as a paragraph. Does any of you know a good place to find tutorials or material related to the Stanford Parser?

    Thanks in advance.

  5. Diluka,

    I don’t know of any special place beyond whatever Google normally turns up. I believe they also have a mailing list you could try.

  6. Hi, Alex
    I’m new to the field of NLP and know almost nothing about its tools. I need to read in parsed files from the Penn Treebank for some preprocessing.
    I found the documentation for the “PennTreeReader” you mentioned, but I can’t find where to download it.
    Could you send me the code package?
    Thx in advance.

  7. Hi Wendy,

    The first link in the article goes to the parser’s overview page and the download section. PennTreeReader is part of that package.

  8. Hi Alex,
    I am trying to use the Stanford Parser for my research. I particularly need to manipulate the typed dependencies and would like the output to be written to a text file. I have downloaded the grammarbrowser and stanford lexicalized parser-1.4 as well as the stanford parser 1.6, but I have no idea where to start to invoke the program, i.e. the main file. I’ve tried to run some of the sample command lines given on the site, but they fail to work. Could you please advise by email?

    Thank you,
    wani

  9. Hi Alex,

    I am trying to use the Stanford Parser for my research. I just want part-of-speech tagging, in which each word is assigned a category such as NNP, NN, NNPS, etc. After downloading the parser, when I run its jar file, it asks me to load a file and a parser.
    Please tell me what I should do, because the file I load seems to be changed into garbage values. And how can I load a parser?

    Thank you,
    Pallavi

  10. Hi Jack,

    I also need a very similar adapter for my research and came across your page while searching for one. It would be great if I could get my hands on your code (never mind the shape; I can play with it to suit my requirements).

    Regards,
    Sushain

  11. Hi Alex,
    I am a rookie in NLP. Firstly, I am so glad you can spare a few minutes to read my letter. Recently, I found out that OpenNLP is very useful. However, the key issue is that I cannot find any sample code for this tool. Honestly, the API documentation offered on the website is not sufficient, and this problem really gets me down. So, I wonder if you could send me some useful sample code, ideally for the Treebank Parser. I really appreciate your kind consideration. Thank you again.

  12. Hi Alex,
    I am new to NLP packages and have been using the wordStemmer class of the Stanford parser to stem various words in a large file and then using that output further. However, I am unable to find out the format of the file (englishPCFG.ser.gz) that it uses.
    I would really appreciate it if you could help me out with it.
    -Smriti

  13. Hello Alex. I really need sample code showing how to use the OpenNLP project, especially the parser, for my Bachelor’s thesis. If you don’t mind, please send me the code by email. I really do hope you can help me with it. :)

    thanks a lot in advance.

  14. Hello, we have proposed a thesis topic on question generation. Our plan is to use the Penn Treebank to tag words and then use the Stanford parser. However, we can’t find any program or code for the Penn Treebank. Please help. We are new to the field of NLP.

  15. I would like to use the example on this page in a talk – to illustrate the recursive nature of language and hence aspects of the nature of mathematics. I hope that is fine.

  16. Hi Alexandre Rafalovitch,
    I have a task of parsing a business agreement, which is in plain text, into Penn Treebank format with phrase and clause labels. I am really confused by the examples and descriptions I have found on the internet. It would be helpful if I could find a code snippet or example for this. I look forward to a reply with helpful guidance.
