By Alexandre Rafalovitch, on January 11th, 2013 I knew I was neglecting my blog in 2012, but I did not realize just how much until I received WordPress’ year in review for 2012 (Feel free to take a peek at it). The line that stopped me dead was “In 2012, there was 1 new post”. Sure enough – one post it was.
Well, this . . . → Read More: Oops: there goes the blog in 2012
By Alexandre Rafalovitch, on January 26th, 2011 I was asked to guest blog for TAUS about my research/work project UNCORPORA. The article has now gone live. It might appeal to people interested in UN languages or natural language processing, or (by following the links) to XML geeks.
. . . → Read More: My guest post about uncorpora project at TAUS blog
By Alexandre Rafalovitch, on May 28th, 2009 I like ANTLR! It is a specialized tool that can really be applied to many difficult tasks when regular expressions get all Dust Puppy like. And I have used it in the past with great success.
But, every time I put this particular tool aside, I know that picking it back up will be like . . . → Read More: Making up with ANTLR
By Alexandre Rafalovitch, on March 26th, 2009 A recent article on lingpipe discussed conjoined named entities such as “Johnson and Johnson” and “Wallace and Gromit”. They suggest that one way of treating such a name is as a frozen expression. I assume that means relying on statistical measures to see whether this multi-word expression repeats often enough to be treated as a unit.
In the . . . → Read More: Conjunctions in named entities
By Alexandre Rafalovitch, on January 25th, 2009 I am collecting my reading and reference material in CiteULike. I like the service because it can capture details from multiple sources. It also lets you discover what other interesting people have collected by navigating the graph of tags, people and bookmarks.
Nice as CiteULike is, it is fairly difficult to get an overall picture of . . . → Read More: Visualizing CiteULike collections
By Alexandre Rafalovitch, on January 13th, 2009 I am frustrated. I know my corpus (resolutions of the United Nations General Assembly) has a lot in common with the biomedical and legal domains. And I can find interesting articles in the biomedical domain dealing with similar issues of complex tokenization, long named entity mentions (though mine are much longer), etc. But I see nothing in . . . → Read More: Where are all legal computational linguistics resources?
By Alexandre Rafalovitch, on April 19th, 2008 I have written before about converting Microsoft Word files into text or html using OpenOffice. However, the wizards I described in that article crashed once the number of files reached several hundred.
I have written some macros to do the conversion, but they were scary looking and fragile. Fortunately, I have now found a . . . → Read More: Bulk converting doc files into txt (or html)
By Alexandre Rafalovitch, on March 16th, 2008 They say at BarCamp that if you don’t like the session you are in, feel free to go to a better one. No hard feelings. But what do you do if you show up for the announced moderated discussion session and the moderator does not?
That’s what happened to us with the last (5:15pm) slot . . . → Read More: Artificial Intelligence discussion at BarCampNYC3
By Alexandre Rafalovitch, on September 1st, 2007 Arthur C. Clarke once famously wrote, “Any sufficiently advanced technology is indistinguishable from magic”. In the same vein, many people feel that any sufficiently established bureaucracy is like black magic, sorcery even. Certainly, it often takes skills out of this world to follow the logic of modern tax return instructions.
Bureaucracy often has its . . . → Read More: Unravelling the black magic of bureaucracy