By arafalov, on August 20th, 2010
For my other project, I needed to process some Arabic text that was in an HTML file derived from an MS Word document.
Everything was going reasonably well, except that my regular expressions were not picking up section name/number sequences in all cases, which was causing a problem with the 6-language alignment algorithm.
Normally, I just examine the text . . . → Read More: Arabic numerals’ non-WYSIWYG
By arafalov, on August 12th, 2010
I saw an interesting question on StackOverflow on how to cycle between 3 states for list items, but with the initial state for each item being potentially different.
This random start position part of the problem was making me think, so I used it as an exercise to try some newish jQuery functions, such as . . . → Read More: jQuery: Cycling between multiple classes with random start
By arafalov, on August 5th, 2010
I got myself a new digital camera recently, a Canon T2i. It feels really nice and makes it quite hard to go back to point-and-shoots afterward. And it takes really good 18-megapixel shots.
Here is one of a South African warthog, displayed using Microsoft’s Zoom.it technology. Try zooming . . . → Read More: A new camera
By arafalov, on October 31st, 2009
I have (nearly) finished developing a mini-website in 6 languages (Arabic, Chinese, English, French, Russian, Spanish). The layout was the same, so ideally it would have been driven by a content management system. Not in this case, unfortunately, as I was not given enough time to set up the infrastructure.
As I know nearly nothing of at least . . . → Read More: jQuery for multilingual web development
By arafalov, on May 28th, 2009
I like ANTLR! It is a specialized tool that can really be applied to many difficult tasks when regular expressions get all Dust Puppy-like. And I have used it in the past with great success.
But, every time I put this particular tool aside, I know that picking it back up will be like making up . . . → Read More: Making up with ANTLR
By arafalov, on March 26th, 2009
A recent article on LingPipe discussed conjuncted named entities such as "Johnson and Johnson" and "Wallace and Gromit". They suggest that one way of treating these may be as frozen expressions. I assume that means relying on statistical measures to see this Multi-Word-Expression repeating enough times to be treated as a unit.
In the United Nations . . . → Read More: Conjunctions in named entities
By arafalov, on January 27th, 2009
Homegrown visualization is not the only way to quickly navigate CiteULike references. There are other tools that display bibliographies in interesting ways.
One such tool is Exhibit, one of the graduates of the SIMILE project. It lets you build a very interactive webpage driven by just HTML+JavaScript, with no server-side component required. I really like SIMILE’s tools, even . . . → Read More: CiteULike Exhibit visualization
By arafalov, on January 25th, 2009
I am collecting my reading and reference material in CiteULike. I like the service because it can capture details from multiple sources. It also lets you discover what other interesting people have collected by navigating the graph of tags, people and bookmarks.
Nice as CiteULike is, it is fairly difficult to get an overall picture of one’s own . . . → Read More: Visualizing CiteULike collections
By arafalov, on January 17th, 2009
Dr. René Witte has just created a new mailing list (SENLP) to discuss applying NLP techniques to Software Engineering and also to discuss general Software Engineering issues in developing NLP systems.
I am interested in both topics. I did 3 years as senior technical support at BEA and could see how applying NLP techniques to written notes . . . → Read More: New mailing list to discuss junction of NLP and Software Engineering
By arafalov, on January 13th, 2009
I am frustrated. I know my corpus (resolutions of the United Nations General Assembly) has a lot in common with the biomedical and legal domains. And I can find interesting articles in the biomedical domain dealing with similar issues of complex tokenization, long named entity mentions (though mine are much longer), etc. But I see nothing in legal . . . → Read More: Where are all legal computational linguistics resources?