Weird Stuff

No, I am not complaining about food. I love food. And I love it different and - sometimes even - adventurous. Which is where Sour Grapes come in.

We have been walking around the neighborhood and have discovered a Middle Eastern shop with some unusual but recognizable foods and some not-quite-recognizable ones. Our strategy with the latter is usually to buy the item and then google its name or a recipe. Usually that works quite well.

Not so with the “Sour Grape Ground”, which looks like brown sand-like stuff with some white flecks. And it is impossible to find on the internet.

The problem, of course, is that “sour grapes” means complaining, and there is a lot of that on the internet.

So, in the end, we just had to go from the general description. We had it in a stew and it was nice, though not outstanding. This probably has more to do with us not knowing the proportions than with the sour grapes themselves.

Watching the 21 Accents video (via Neatorama) made me think that different languages and accents seem to require different mouth positions. Is it possible that some phonemes are only achievable with the mouth stretched wide in a smile? Then loading a language with such phonemes would be one way to ensure people appear friendly to strangers, whatever their real mood.

Social engineering through accent shaping, now that is an interesting thought!

While reading the Weka Data Mining book, I came across this impressive example of using machine learning to confirm a person’s authorship (p. 358).

In the 19th century, there lived a famous rabbinic scholar, Ben Ish Chai, who among other writings had two collections of letters. Ben Ish Chai claimed that only one collection was his and that the other one was somebody else’s, found by him. Modern scholars thought both collections were his, but could not prove it conclusively as the style of writing was different.

Machine Learning to the rescue! In 2004, Moshe Koppel and Jonathan Schler discovered that it may help to look not at the writing style differences (as the style may have been faked), but rather at how deep those differences were. For example, an author could fake a stylistic mismatch by consciously avoiding favorite words, but would still write in long run-on sentences, use more passive verb forms or display many other measurable behaviours.

So, if the most obvious differences were removed one by one, the speed at which the remaining features came to look identical could be a good indicator of common authorship. They called this technique unmasking, and the mystery of Ben Ish Chai was solved for good.
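To make the idea a bit more concrete, here is a rough sketch of how an unmasking curve could be computed. It is my own toy illustration, not Koppel and Schler's implementation: the function names, the chunk size, the use of plain word frequencies as features and the nearest-centroid classifier are all assumptions chosen to keep the example short.

```python
from collections import Counter


def chunk_features(words, chunk_size, vocab):
    """Cut a word list into equal chunks; return relative word frequencies per chunk."""
    rows = []
    for start in range(0, len(words) - chunk_size + 1, chunk_size):
        counts = Counter(words[start:start + chunk_size])
        rows.append([counts[w] / chunk_size for w in vocab])
    return rows


def centroid(rows, keep):
    """Mean value of every kept feature over the given chunks."""
    return {j: sum(r[j] for r in rows) / len(rows) for j in keep}


def loo_accuracy(a_rows, b_rows, keep):
    """Leave-one-out accuracy of a nearest-centroid classifier (an assumed stand-in)."""
    correct = 0
    for rows, other in ((a_rows, b_rows), (b_rows, a_rows)):
        for i, row in enumerate(rows):
            own = centroid(rows[:i] + rows[i + 1:], keep)
            oth = centroid(other, keep)
            d_own = sum((row[j] - own[j]) ** 2 for j in keep)
            d_oth = sum((row[j] - oth[j]) ** 2 for j in keep)
            correct += d_own < d_oth
    return correct / (len(a_rows) + len(b_rows))


def unmasking_curve(text_a, text_b, chunk_size=50, n_features=20, drop=2, rounds=5):
    """Accuracy after each round of removing the most telling features."""
    words_a, words_b = text_a.lower().split(), text_b.lower().split()
    vocab = [w for w, _ in Counter(words_a + words_b).most_common(n_features)]
    a_rows = chunk_features(words_a, chunk_size, vocab)
    b_rows = chunk_features(words_b, chunk_size, vocab)
    if len(a_rows) < 2 or len(b_rows) < 2:
        raise ValueError("each text needs at least two chunks")
    keep = list(range(len(vocab)))
    curve = []
    for _ in range(rounds):
        if not keep:
            break
        curve.append(loo_accuracy(a_rows, b_rows, keep))
        # Drop the features whose average frequency differs most between the texts.
        ca, cb = centroid(a_rows, keep), centroid(b_rows, keep)
        keep.sort(key=lambda j: abs(ca[j] - cb[j]), reverse=True)
        keep = keep[drop:]
    return curve
```

The thing to watch is the shape of the returned list. If the accuracy collapses after only a few rounds, the two texts differed in a handful of surface features; if it stays high, the differences run deep.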

I think what impressed me here was not the clever math. The whole field of determining authorship is based on clever math. It is rather the fact that the math was looking at hints within the hints of the language - the invisible aspects that become noticeable only after the eye learns to see beyond what the most obvious reality offers. I cannot explain it better, but to me it has a special elegance that just counting the words and sentence lengths does not offer.

What could Computational Linguistics and Aerobics have in common? Quite a lot, as it turns out.

Dance descriptions, while not really in English, do have a regular structure and can be thought of as a sub-language with a full set of syntactic, semantic and pragmatic levels.

There are basic words of the language (move names), correct ways of putting them in a sentence (a routine) and all the way up to good flowing text (classes that do not hurt the participants).
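As a toy illustration of the "words" and "sentences" levels, a routine checker might look something like the sketch below. The move names, the starting and ending feet and the `check_routine` helper are all invented for the example; they are not taken from any real aerobics or dance notation.

```python
# Hypothetical vocabulary: move name -> (foot it starts on, foot that is free at the end).
MOVES = {
    "march": ("right", "right"),
    "step_touch": ("right", "left"),
    "grapevine": ("right", "left"),
    "knee_lift": ("left", "right"),
}


def check_routine(routine):
    """Return a list of problems found in a sequence of move names."""
    problems = []
    previous = None
    for position, name in enumerate(routine):
        if name not in MOVES:
            # "Lexical" error: the word is not in the language at all.
            problems.append(f"{position}: unknown move '{name}'")
            previous = None
            continue
        if previous is not None and MOVES[previous][1] != MOVES[name][0]:
            # "Syntactic" error: the two moves do not join up.
            problems.append(f"{position}: '{name}' does not follow '{previous}' smoothly")
        previous = name
    return problems
```

Calling `check_routine(["step_touch", "march"])` reports one awkward transition, while `check_routine(["march", "step_touch", "knee_lift"])` returns an empty list under these made-up rules.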

I had been thinking about the relationship between dance instructions and computational linguistics in the context of Scottish Country Dancing for at least a year. My imagined benefits were that codified dance instructions would allow for automatic dance animations, superior teacher aids and other applications that currently require a lot of sweat and toil. Dance evening programmes that are currently put together manually for each event could be assisted with automated evaluation pointing out awkward sequences of dances.

Unfortunately, my attempts at explaining the connection made no sense to the people around me. So, I was ecstatic to discover that such a link had already been found by others before me.

More than 10 years ago, Adam Bull tried to apply principles of computational linguistics to Aerobics for his MPhil degree, in a paper entitled The formal description of aerobic dance exercise - a corpus-based computational linguistics approach. While the report is not complete, it lays out many of the same arguments I have tried to make myself.

Unfortunately, an electronic copy of the document was not available. After some effort, I got in touch with Adam and he sent me a copy of the report with permission to distribute it. I have put a copy of it on my own server.

I hope his research will get rediscovered and improved upon. That way when I get some time to apply my own PhD skills to Scottish Country Dancing, there will be more than one person on whose shoulders I would be able to stand.

I just found my own oldest webpage (handcoded) and my oldest public source code (Java) at once. Archive.org, which has hosted this long-dead memory since 1999, is just so great.

Looking back at it, I realise that I was right in the thick of Internet development:

  • When I had just started working with Java, we had to throw out all the printed Javadocs, because jdk1.0b2 was released and a lot of the Java API (e.g. FTP and MAIL) from jdk1.0a3 had been hidden under Sun’s internal packages
  • I did a first (alpha) implementation of the standard servlet API for W3C’s Jigsaw server, by porting it from Sun’s Jeeves
  • I dabbled in hot 2.5D Apple technology (HotSauce), by generating a web server’s directory content in MCF format. The format has died, but apparently it turned into RDF. I was developing Semantic Web applications well before the term got popular.
  • I contributed to an Open Source project, well before SourceForge’s first appearance
  • I was a late-comer to /. and my ID is still below 36000

I am not bragging! I am just musing out loud at how much personal web history can be retrieved with a few well-placed searches.

The flip side of the coin, of course, is that this history will not go away, even if I wanted it to. Which is why I do not link to my Slashdot account (and this is not an invitation for an exercise in forensics). One just hopes that a future recruiter will look at the timestamps of my various web appearances and make appropriate adjustments to skills and effort.