It is a great moment. After many months of work, my book is finally published and available from multiple sources. It is called Instant Apache Solr for Indexing Data How-to, and it has been published by Packt Publishing.
There are a number of books published on Solr, but I feel that mine is different. . . . → Read More: My book on Solr is now published
Atlassian has just released Crowd 1.3, which now has the Delegated Authentication option – a two-faced directory with an external LDAP-facing part for authentication and an internal Crowd part for authorisation. This double-faced functionality causes some non-obvious interface issues.
The most important issue to understand is that the external part is accessed only when the user is . . . → Read More: 5 unobvious things about Atlassian Crowd’s Delegated Authentication Directory
I just found my own oldest webpage (handcoded) and my oldest public source code (Java) at once. Archive.org – that has hosted this long-dead memory since 1999 – is just so great.
Looking back at it, I realise that I was right in the thick of Internet development:
When I just started working with Java, . . . → Read More: Memories, memories
I want to get my parents a digital picture frame. But at the moment I cannot. That’s because I don’t want my somewhat less-technical parents to have to fiddle with memory cards, choosing and transferring photographs or running Vista.
My ideal digital picture frame for them would be one sitting in a living room or . . . → Read More: Chumby: Digital picture frame for parents and much more
From time to time I experiment with the GATE NLP toolkit. Just now I tried to upgrade to the latest version (version 4) and ran into a really strange problem with the ANNIE system not loading correctly. Later, when I uninstalled the older GATE version, it stopped loading at all.
The problem is the user configuration file gate.xml that . . . → Read More: Upgrading to GATE 4? Beware of leftover configuration files.
I am currently at The Rich Web Experience 2007 conference. It is interesting to compare it to JavaOne conferences I have been to in the past.
To start, RWE is much smaller. It is about 400 people as compared to 15 thousand at JavaOne. This obviously makes scheduling logistics and eating arrangements simpler, but there . . . → Read More: The Rich Web Experience – day 1
When the OpenNLP toolkit uses the MaxEnt parser, it has to read in about 25 MBytes of model files. The model reader uses a basic unbuffered FileInputStream. The result is an excessive number of system calls (and disk access calls) during the parser startup.
The fix is extremely simple:
In maxent-2.4.0/src/java/opennlp/maxent/io/ObjectGISModelReader.java, replace new FileInputStream(f) with new BufferedInputStream(new FileInputStream(f), . . . → Read More: Reducing disk thrashing of OpenNLP/MaxEnt parser – with one line code change
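To illustrate why the buffering wrapper matters, here is a minimal, self-contained sketch. It is not the OpenNLP code: the class and method names are hypothetical, an in-memory stream stands in for the model file, and the buffer size is an assumed value (the excerpt above is cut off before the one actually used). The counting wrapper tallies how many read calls reach the underlying stream, which is a rough stand-in for system calls.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BufferingDemo {
    // Assumed buffer size for illustration only; not the value from the original fix.
    static final int BUFFER_SIZE = 8192;

    /** Counts the read calls that reach the wrapped stream. */
    static class CountingInputStream extends FilterInputStream {
        long calls = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /** Reads the stream one byte at a time, the way a naive model reader might. */
    public static long drain(InputStream in) throws IOException {
        long bytes = 0;
        while (in.read() != -1) {
            bytes++;
        }
        return bytes;
    }

    /** Returns how many read calls hit the underlying stream while draining it. */
    public static long underlyingCalls(byte[] data, boolean buffered) throws IOException {
        CountingInputStream counting = new CountingInputStream(new ByteArrayInputStream(data));
        InputStream in = buffered ? new BufferedInputStream(counting, BUFFER_SIZE) : counting;
        drain(in);
        return counting.calls;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100_000]; // stand-in for a model file
        System.out.println("unbuffered calls: " + underlyingCalls(data, false));
        System.out.println("buffered calls:   " + underlyingCalls(data, true));
    }
}
```

With byte-at-a-time reads, the unbuffered case issues roughly one underlying call per byte, while the buffered case issues roughly one per buffer refill. With a real file, each of those underlying calls would be a trip to the operating system.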
I was not able to get the OpenNLP parser to work. There were no samples to play with, no command line tools to run. And I don’t even want to talk about documentation. That’s because there was not any. There was an attempt at a lame joke (at least that’s the only sense I can make of . . . → Read More: Getting OpenNLP parser to work
Bikel’s statistical parser is designed to be run from the command line. I need to run it from my own code.
The following wrapper seems to do the trick on Windows (with your own values for |parserdir|): String settingsFile = "|parserdir|\\settings\\collins.properties"; Settings.load(settingsFile); Parser parser = new Parser("|parserdir|\\bikel\\wsj-02-21.obj.gz"); Sexp result = parser.parse(Sexp.read("(This is a funny world)").list()); . . . → Read More: Running Bikel’s parser programmatically
I have been using the Stanford NLP Parser from the command line with the -tagSeparator flag to supply it with partially tagged input. As the parser seems to be really bad with date expressions and complex named entities, I need this functionality.
Now, I need to wrap the parser in my own code to add input/output batching and . . . → Read More: Duplicating -tagSeparator effect when using Stanford Parser programmatically