Dr. René Witte has just created a new mailing list (SENLP) to discuss applying NLP techniques to Software Engineering and also to discuss general Software Engineering issues in developing NLP systems.
I am interested in both topics. I spent 3 years in senior technical support at BEA and could see how applying NLP techniques to the written notes in support cases could have improved the quality of technical support. I did not get to do any of that, but some interest remains.
The second topic is even more interesting and important to me. It can build on discussions currently held on blogs (see ‘The USES Issue’ at Niels Ott’s blog) and in journals (see ‘Empiricism Is Not a Matter of Faith’ by Ted Pedersen). While some of the issues are discussed on mailing lists for individual pieces of software, a place to discuss cross-cutting concerns is very welcome.
I have joined the list and hope to see at least some of my readers there as well.
I am frustrated. I know my corpus (resolutions of the United Nations General Assembly) has a lot in common with the biomedical and legal domains. And I can find interesting articles in the biomedical domain dealing with similar issues of complex tokenization, long named entity mentions (though mine are much longer), etc. But I see nothing in the legal domain.
I have just gone through all of the Jurix proceedings as well as all of Artificial Intelligence and Law, and all I got is between 2 and 4 articles worth following up.
There must be somebody actually trying to parse real legal texts and figuring out how to deal with complex organisation, people and group names. But all I can see is articles dealing with levels from ontology and up.
There might even be money in it!
And the business model would center on providing an automatic notification if a notice from a subscribed website sneakily changed and became much worse. That way one would pay for the peace of mind that there were no unexpected changes to the service rules.
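The change-detection half of that idea is easy to sketch. What follows is a minimal illustration in Python, assuming the service keeps the last seen text of each notice; the function names (`fingerprint`, `changed_lines`, `check_notice`) are hypothetical and made up for this example. It only tells you that a notice changed and which lines differ. Deciding whether the change made things "much worse" would still need a human or some further analysis.

```python
import difflib
import hashlib


def fingerprint(text: str) -> str:
    """Return a SHA-256 digest of the notice text with whitespace normalized."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def changed_lines(old: str, new: str) -> list[str]:
    """Return the removed ('-') and added ('+') lines between two versions."""
    diff = difflib.unified_diff(
        old.splitlines(), new.splitlines(), lineterm="", n=0
    )
    # The first two lines of a unified diff are the '---' / '+++' file headers.
    body = list(diff)[2:]
    # Keep only removals and additions; hunk headers start with '@@'.
    return [line for line in body if line.startswith(("+", "-"))]


def check_notice(stored_digest: str, old: str, new: str) -> list[str]:
    """Return the changed lines, or an empty list if the notice is unchanged."""
    if fingerprint(new) == stored_digest:
        return []
    return changed_lines(old, new)


if __name__ == "__main__":
    old = "We keep your data for 30 days.\nWe never sell your data."
    new = "We keep your data for 30 days.\nWe may share your data with partners."
    for line in check_notice(fingerprint(old), old, new):
        print(line)
```

Comparing digests first keeps the common case (nothing changed) cheap, and normalizing whitespace avoids alerts for purely cosmetic edits; the line diff is computed only when the digest differs.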
It is hard enough to explain what we are doing to our professors; explaining it in plain English to our friends and family is nearly impossible.
So it is always good to see people who can explain what a POS tagger is and why it is important without having to throw around references to Norvig or Jurafsky.
Markus Dickinson has managed exactly such an explanation in his non-linguistic primer to a serious research paper on Detecting Errors in Part-of-Speech Annotation. The writing is quite old (2003), but it reads well and still feels relevant. Of course, his research page contains more recent papers on the same topic too.