Where are all legal computational linguistics resources?

I am frustrated. I know my corpus (resolutions of the United Nations General Assembly) has a lot in common with the biomedical and legal domains. And I can find interesting articles in the biomedical domain dealing with similar issues of complex tokenization, long named entity mentions (though mine are much longer), etc. But I see nothing in the legal domain.

I have just gone through all of the Jurix proceedings as well as all of Artificial Intelligence and Law, and all I got is between 2 and 4 articles worth following up.

There must be somebody actually trying to parse real legal texts and figuring out how to deal with complex organisation, people and group names. But all I can see is articles dealing with levels from ontology and up.

There might even be money in it!

One of the crazy business ideas I had was to parse all the web-based terms of use and privacy notices and annotate/crowd-vote them for how bad they are. So, before creating a web-based account, I could check it against the database/parser and it would highlight and rate for me the passages that I really should pay attention to (e.g. "we sell your contact details to every spammer we know"). Since the language of those notices is often ritualistically formulaic, extracting an interesting and useful summary would actually be simpler than it looks.

And the business model would center on providing an automatic notification option for when a notice from a subscribed website sneakily changed and became much worse. That way one would pay money for peace of mind that there were no unexpected service rule changes.
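Here is a rough sketch of how the two ideas above might fit together: flag formulaic "bad" passages with a handful of patterns, then compare the total score of two versions of a notice to decide whether to send a notification. Everything in it is an assumption made for illustration: the patterns, the weights, and the function names (`flag_passages`, `total_risk`, `got_worse`) are hypothetical, and the sentence splitter is deliberately naive. A real system would use the crowd-voted annotations rather than hand-written rules.

```python
import re

# Hypothetical patterns and weights, invented for illustration only.
RISK_PATTERNS = [
    (re.compile(r"\b(sell|share|disclose)\b.*\b(third part(y|ies)|partners|advertisers)\b", re.I), 3),
    (re.compile(r"\bwithout (prior )?notice\b", re.I), 2),
    (re.compile(r"\bbinding arbitration\b", re.I), 2),
]


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def flag_passages(text):
    # Return (score, sentence) pairs for sentences matching any pattern,
    # highest score first.
    flagged = []
    for sentence in split_sentences(text):
        score = sum(weight for pattern, weight in RISK_PATTERNS
                    if pattern.search(sentence))
        if score:
            flagged.append((score, sentence))
    return sorted(flagged, reverse=True)


def total_risk(text):
    # Sum of the scores of all flagged sentences in a notice.
    return sum(score for score, _ in flag_passages(text))


def got_worse(old_text, new_text):
    # True when the new version of a notice scores higher than the old one.
    return total_risk(new_text) > total_risk(old_text)
```

A subscriber would be notified whenever `got_worse(old_notice, new_notice)` returns true, with the output of `flag_passages(new_notice)` as the highlighted summary.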

4 thoughts on “Where are all legal computational linguistics resources?”

  1. You should look into some recent research on parallel corpora. The European Union Government has to translate a lot of legal text into several languages and as far as I know people use these materials as parallel corpora. Of course it should be possible to use them without the parallel option.

  2. DrNI,

    Thanks for the comment. I am aware of Canadian Hansards, EuroParl and JRC-Acquis and will see how they are used. Hopefully, something interesting will turn up that way.

  3. Don’t they use the parallel corpora to do machine translation? The idea is that the machine learns to map one set of inputs (documents in English) to another set of outputs (documents in, say, French).

    Regarding legal corpora – I wonder about the use of machines in the field of litigation. I suspect the reason that it is not used is because lawyers (unlike, say, bankers or even software guys) are cheap and plentiful. Why bother making cheap labour more efficient?

    And indeed, why employ specialised, expensive labour (software engineers with a knowledge of natural language processing) to a problem that is solved cheaply every day?

  4. I don’t know about machine translation, though it does sound familiar.

    As for what they do research, there are a couple of high-level argumentation and reasoning systems. They use those to teach argumentation and to visualize cases. But that’s too far up the stack for me.
