I am trying to use the Stanford NLP parser for my research, and I need to look at the trees it produces for large, complex sentences. I have found several packages for laying out the output as trees, but they all seem to be targeted at visualizing smaller sentences, suitable for illustrating a point in a published paper.
My trees are large. A sentence of 40 words is an average case rather than an edge one, so all of the display packages I have tried cut off large chunks of the tree. It might be possible to tinker with their LaTeX code to produce output that is not cut off at letter, A4 or even A3 size, but I am not that good with LaTeX yet. And I need to produce these large trees quickly, as I am not even sure whether this parser will suit my needs in the long run.
So, instead, I wrote my own bridging code in Java between the Penn Treebank output of the parser and Graphviz, the graph layout software that I use for many layout tasks. The whole implementation fit in one file of fewer than 100 lines, and that included the logic to highlight maximum spanning subtrees of a particular element (NounPhrase in this example). Click on the small image to see the full example. The Graphviz input file is also available for the curious.
At the moment, it is sufficient to convert to image files. If I ever do convince the parser to understand my 80-word sentences, the resulting trees will probably be large enough to need ZGRViewer.
The Java bridging code is not available yet, as it is very ugly. The secret was in PennTreeReader’s main() method, which showed how to read the parser’s output back in as a Tree suitable for recursive descent. After that, it was just a matter of walking the tree levels and emitting the very simple Graphviz format. I will probably clean the code up a bit over the next couple of weeks and then release it.
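The walk-the-tree-and-emit-Graphviz step can be sketched roughly as follows. This is not the bridging code described above, and it does not use the Stanford library: a small hand-rolled bracket parser stands in for PennTreeReader so that the example runs on its own. The names PennToDot, Node, parse and toDot are made up for illustration, and the highlighting (colouring every subtree rooted at a given label) is only one guess at what highlighting NounPhrase subtrees might look like.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal sketch: Penn Treebank bracketed string -> Graphviz DOT text. */
class PennToDot {

    /** A tree node: a label (category or word) plus its children. Hypothetical type. */
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node(String label) { this.label = label; }
    }

    private final String text;
    private int pos = 0;

    private PennToDot(String text) { this.text = text; }

    /** Parses one bracketed tree, e.g. "(S (NP (DT The) (NN cat)) (VP (VBZ sleeps)))". */
    static Node parse(String bracketed) {
        PennToDot p = new PennToDot(bracketed);
        p.skipSpace();
        return p.readNode();
    }

    private Node readNode() {
        if (pos < text.length() && text.charAt(pos) == '(') {
            pos++;                                  // consume '('
            skipSpace();
            Node node = new Node(readToken());      // category label (may be empty)
            skipSpace();
            while (pos < text.length() && text.charAt(pos) != ')') {
                node.children.add(readNode());      // recursive descent into children
                skipSpace();
            }
            if (pos >= text.length()) {
                throw new IllegalArgumentException("Unbalanced brackets");
            }
            pos++;                                  // consume ')'
            return node;
        }
        return new Node(readToken());               // a bare token is a leaf (a word)
    }

    private String readToken() {
        int start = pos;
        while (pos < text.length()
                && !Character.isWhitespace(text.charAt(pos))
                && text.charAt(pos) != '(' && text.charAt(pos) != ')') {
            pos++;
        }
        return text.substring(start, pos);
    }

    private void skipSpace() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
    }

    /** Renders the tree as DOT; subtrees rooted at 'highlight' (may be null) are drawn in red. */
    static String toDot(Node root, String highlight) {
        StringBuilder out = new StringBuilder("digraph tree {\n  node [shape=plaintext];\n");
        emit(root, highlight, false, new int[] {0}, out);
        return out.append("}\n").toString();
    }

    /** Emits one node and its subtree; returns the numeric id assigned to the node. */
    private static int emit(Node node, String highlight, boolean inside,
                            int[] counter, StringBuilder out) {
        int id = counter[0]++;
        boolean marked = inside || node.label.equals(highlight);
        out.append("  n").append(id).append(" [label=\"").append(escape(node.label)).append('"');
        if (marked) {
            out.append(", fontcolor=red");
        }
        out.append("];\n");
        for (Node child : node.children) {
            int childId = emit(child, highlight, marked, counter, out);
            out.append("  n").append(id).append(" -> n").append(childId).append(";\n");
        }
        return id;
    }

    /** Escapes backslashes and double quotes for use inside a DOT quoted string. */
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        Node tree = parse("(S (NP (DT The) (NN cat)) (VP (VBZ sleeps)))");
        System.out.print(toDot(tree, "NP"));
    }
}
```

Saving the printed text as tree.dot and running `dot -Tgif tree.dot -o tree.gif` should give an image of the tree, assuming Graphviz is installed. Each node gets its own numeric id, so repeated labels such as two NP nodes stay distinct in the graph.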
If somebody does like the output and wants to see the code sooner, send me an email at alex@thisdomain.
Hi Alex,
I am new to the field of NLP. I recently saw the article on your blog and was really interested in it:
(http://blog.outerthoughts.com/2007/06/laying-out-penn-treebank-output-of-stanford-parser/)
Could you send me the code for this?
BTW, do you have any sample code showing how to use OpenNLP? I read the mini tutorial you mentioned in your blog:
http://danielmclaren.net/2007/05/11/getting-started-with-opennlp-natural-language-processing
but I still have some problems running my program.
(I don’t know which OpenNLP packages to import in Java, and I can’t find any openNLP-tool.jar to download.)
Many thanks!
Sophia
Responded by email.
Hi Alex,
Can you email that to me as well?
Thanks
Udayan
Hi Alex,
I’m new to NLP and have been looking at the Stanford parser lately. Can you email the code to me too? It would be very helpful, as I’m trying to include it in my Bachelor’s thesis.
Thanks
Blerta
Hi Alex,
I have tried hard to capture the output of the parser.
Can you send me the code for this, and some tutorials if there are any?
A basic example is now available for download.
It relies on having the Stanford library installed and, to generate images, Graphviz needs to be present as well. There is a tests directory with one example that goes from the source sentence all the way to the gif file.
Dear All,
I am a B.Sc. student who needs to generate a semantic graph for input given as a paragraph. Does anyone know a good place to find tutorials or material related to the Stanford Parser?
Thanks in advance.
Diluka,
I don’t know of any special place beyond whatever Google normally turns up. I believe they also have a mailing list you could try.
Hi, Alex
I’m new to the field of NLP and know almost nothing about its tools. I need to read in a parsed file from the Penn Treebank for some preprocessing.
I found the documentation for the “PennTreeReader” you mentioned but can’t find where to download it.
Could you send me the code package?
Thx in advance.
Hi Wendy,
The first link in the article goes to the parser’s overview page and the download section. PennTreeReader is part of that package.
Hi Alex,
I am trying to use the Stanford Parser for my research. In particular, I need to manipulate the typed dependencies and would like the output to be written to a text file. I have downloaded the grammarbrowser and stanford lexicalized parser-1.4 as well as the stanford parser 1.6. I have no idea where to start to invoke the program, i.e. which is the main file. I’ve tried to run some of the sample command lines given on the site, but they fail to work. Could you please advise by email?
Thank you,
wani
Hi wani,
Have you looked at GATE? It has a Stanford Parser module, I believe, and may be easier to start with, as it also provides other components.
Hi Alex,
I am trying to use the Stanford Parser for my research. I just want part-of-speech tagging, in which each word is assigned a category such as NNP, NN, NNPS, etc. After downloading the parser, when I run the jar file from it, it asks me to load a file and a parser.
Please tell me what I should do, because the file I loaded seems to have been replaced with garbage values, and how I can load a parser.
Thank you,
Pallavi
Hi Pallavi,
I am not (in) a Stanford Parser support group. They have a mailing list. You may try your luck there.
Hi Alex,
Can you email your code to me as well?
Thanks
Jack
Hi Jack,
I also need a very similar adapter for my research and came across your page while searching for it. It would be great if I could get my hands on your code (never mind the shape, I can play with it to suit my requirements).
Regards,
Sushain
Hi Alex,
I am a rookie in NLP. Firstly, I am glad you can spare a few minutes to read my letter. Recently, I found out that OpenNLP is very useful. However, the key issue is that I cannot find sample code for this tool. Honestly, the API documentation offered on the website is not sufficient, and this problem really gets me down. So I wonder if you could send me some useful sample code, especially for the Treebank Parser. I really appreciate your kind consideration. Thank you again.
Hi Alex,
I am new to NLP packages and have been using the wordStemmer class of the Stanford parser to stem words in a large file and use that output further. However, I am unable to find out the format of the file (englishPCFG.ser.gz) that it uses.
I would really appreciate it if you could help me out with this.
-Smriti