<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Outer Thoughts &#187; Problems and Solutions</title>
	<atom:link href="http://blog.outerthoughts.com/category/problems-and-solutions/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.outerthoughts.com</link>
	<description>&#62; From inner thoughts to the outer limits of Alexandre Rafalovitch</description>
	<lastBuildDate>Wed, 27 Jul 2011 00:24:19 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.4</generator>
		<item>
		<title>Apple&#8217;s Catch-22 of moving countries</title>
		<link>http://blog.outerthoughts.com/2011/07/apples-catch-22-of-moving-countries/</link>
		<comments>http://blog.outerthoughts.com/2011/07/apples-catch-22-of-moving-countries/#comments</comments>
		<pubDate>Sun, 17 Jul 2011 03:50:19 +0000</pubDate>
		<dc:creator>arafalov</dc:creator>
				<category><![CDATA[Problems and Solutions]]></category>

		<guid isPermaLink="false">http://blog.outerthoughts.com/?p=422</guid>
		<description><![CDATA[<p>I moved from USA to Canada with my iPad (among other things). I have a bunch of iPad apps and I keep buying new ones. Not any more. Since I updated my banking details, my iTunes account stopped working.</p> I cannot pay for apps with my USA iTunes account, since the address for the Credit <span style="color:#777"> . . . &#8594; Read More: <a href="http://blog.outerthoughts.com/2011/07/apples-catch-22-of-moving-countries/">Apple&#8217;s Catch-22 of moving countries</a></span>]]></description>
			<content:encoded><![CDATA[<p>I moved from USA to Canada with my iPad (among other things). I have a bunch of iPad apps and I keep buying new ones. Not any more. Since I updated my banking details, my iTunes account stopped working.</p>
<ol>
<li>I cannot pay for apps with my USA iTunes account, since the address for the Credit Card is in Canada</li>
<li>I cannot change the account&#8217;s country, because I have left-over funds</li>
<li>I cannot spend left-over funds because they are too small and too uneven to buy anything or to create gift certificates</li>
<li>I cannot pad left-over funds, as I would need a Credit Card for that (see point 1)</li>
</ol>
<p>I could somehow buy a USA iTunes gift card and add that money to my account, but that is not a sustainable practice and feels like being held hostage. I could buy a pre-paid Visa card that accepts any billing address (including a USA one), but that carries an expensive overhead.</p>
<p>Or I could email Apple support and ask them what they can do about it. Which is what I&#8217;ve done. But it does not feel nice.</p>
<p>(Update) &#8216;Nice&#8217; of Apple to break the Catch-22 by relieving me of my funds:</p>
<blockquote><p>Alexandre, as the store credit was less than the minimum possible purchase I have removed the store credit from your account. &#8230;  Please note that the funds will not be returned to you once you have switched countries.</p></blockquote>
<p>I guess you could call it a service fee.</p>
<p>(Update 2) And it still does not work. Pressing the &#8220;Change country&#8221; button throws an error of &#8220;iTunes store is busy&#8221;. You can press other buttons and things work, but once you press &#8220;Change country&#8221;, all the buttons stop working. Well, I guess Apple does not want any more of my money. And perhaps, just perhaps, I will not bother with iPad 3, if this does not get solved at all. It is funny how the little things may have big influences.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.outerthoughts.com/2011/07/apples-catch-22-of-moving-countries/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Bulk processing Lotus Notes database</title>
		<link>http://blog.outerthoughts.com/2011/01/bulk-processing-lotus-notes-database/</link>
		<comments>http://blog.outerthoughts.com/2011/01/bulk-processing-lotus-notes-database/#comments</comments>
		<pubDate>Fri, 21 Jan 2011 01:07:01 +0000</pubDate>
		<dc:creator>arafalov</dc:creator>
				<category><![CDATA[Problems and Solutions]]></category>

		<guid isPermaLink="false">http://blog.outerthoughts.com/?p=396</guid>
		<description><![CDATA[<p>This article is for a niche audience even smaller than my usual readers. There are not that many Lotus Notes developers; even smaller is the number of Lotus Notes coders who have bulk integration/migration needs. But some use cases do exist.</p> <p>I have started (and probably finished) a small GitHub project that demonstrates how to <span style="color:#777"> . . . &#8594; Read More: <a href="http://blog.outerthoughts.com/2011/01/bulk-processing-lotus-notes-database/">Bulk processing Lotus Notes database</a></span>]]></description>
			<content:encoded><![CDATA[<p>This article is for a niche audience even smaller than my usual readers. <img src='http://blog.outerthoughts.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' />  There are not that many Lotus Notes developers; even smaller is the number of Lotus Notes coders who have bulk integration/migration needs. But some use cases <a href="http://www.notesonproductivity.com/ICA/NOP.nsf/dx/moving-to-wordpress-from-domino-advice">do exist</a>.</p>
<p>I have started (and probably finished) <a title="Project demonstrating how to export and process Lotus Notes databases" href="https://github.com/arafalov/Lotus-Notes-Exporter">a small GitHub project</a> that demonstrates how to export Lotus Notes into an XML format and then &#8211; as an example app &#8211; how to extract external links from it for link checking or other purposes.</p>
<p>There are two specific pieces of advice in there that were learned the hard way:</p>
<ol>
<li>If you have access to a Lotus Notes database, you can export its content: text, embedded multimedia, history, permissions and all. And it does not take long: I had a two-gigabyte database exporting in 20 minutes or so.</li>
<li>Big XML files are hard to process. Stream-oriented processing is the usual way to deal with them, but Lotus Notes XML is too ugly to build a streaming state machine around. XPath is easy to use, but streaming parsers normally do not support it, and a full in-memory DOM is too large. Fortunately, <a title="Java XML processing library" href="http://www.xom.nu/">XOM</a> is a Java library that gives you the perfect combination. You create a custom NodeFactory subclass, and it gets called for each element. Two important points: you can throw the element away, which effectively gives you a streaming mode, and when you get the element, you can run XPath queries against it. I found Lotus Notes&#8217;s <em>document</em> element to be perfect to run my processing on. Each <em>document</em> is self-contained, so I process it and throw it away.</li>
</ol>
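<p>The same throw-it-away pattern can be sketched outside of Java. The snippet below is a minimal Python illustration that swaps XOM for the standard library&#8217;s <em>xml.etree.ElementTree.iterparse</em>; it is not the code from the GitHub project. The element names (<em>database</em>, <em>document</em>, <em>title</em>, <em>body</em>) and the sample content are assumptions made up for the example, and a real Lotus Notes export is namespaced and far messier.</p>
<pre>
```python
import io
import re
import xml.etree.ElementTree as ET

URL_PATTERN = re.compile(r"https?://\S+")


def build_sample_export():
    """Build a tiny stand-in for an exported database (hypothetical layout)."""
    root = ET.Element("database")
    samples = [
        ("First", "See http://example.com/a for details"),
        ("Second", "No links here"),
        ("Third", "Two: http://example.com/b and https://example.org/c"),
    ]
    for title, body in samples:
        doc = ET.SubElement(root, "document")
        ET.SubElement(doc, "title").text = title
        ET.SubElement(doc, "body").text = body
    return ET.tostring(root)


def extract_links(stream):
    """Yield (title, url) pairs while keeping one full document in memory."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag != "document":
            continue  # wait until a whole document has been read
        title = elem.findtext("title", default="")
        for text in elem.itertext():
            for url in URL_PATTERN.findall(text):
                yield title, url
        # Throw the processed document away; only an empty shell
        # stays attached to the root element.
        elem.clear()


links = list(extract_links(io.BytesIO(build_sample_export())))
for title, url in links:
    print(title, url)
```
</pre>
<p>One design difference worth noting: ElementTree supports only a limited XPath subset, so anything beyond simple path lookups would need a different library.</p>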
<p>I have used this approach for a number of small projects. I have done analytics (such as dead document cross-references), management reports (based on update history) and format conversion. I have also migrated a Lotus Notes contact database into external mailing lists.</p>
<p>I did not enjoy Lotus Notes programming, but I did enjoy this particular toolkit and approach. It felt like more than a hammer looking for a nail; it felt like a whole chainsaw willing &#8211; not just able &#8211; to rip into the meaty innards of a Lotus Notes database and carve it into useful pieces.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.outerthoughts.com/2011/01/bulk-processing-lotus-notes-database/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Injecting BidiChecker to test Arabic web pages</title>
		<link>http://blog.outerthoughts.com/2010/11/injecting-bidichecker-to-test-arabic-web-pages/</link>
		<comments>http://blog.outerthoughts.com/2010/11/injecting-bidichecker-to-test-arabic-web-pages/#comments</comments>
		<pubDate>Thu, 04 Nov 2010 14:12:11 +0000</pubDate>
		<dc:creator>arafalov</dc:creator>
				<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[jQuery]]></category>
		<category><![CDATA[Problems and Solutions]]></category>

		<guid isPermaLink="false">http://blog.outerthoughts.com/?p=376</guid>
		<description><![CDATA[<p>Google has just announced the release of BidiChecker &#8211; an open source tool to automatically test Arabic web pages for issues related to bidirectional support. This is great news, as bidirectional support is always a huge problem and requires both deep Arabic language understanding and deep technical HTML/CSS understanding, preferably at the same time. <span style="color:#777"> . . . &#8594; Read More: <a href="http://blog.outerthoughts.com/2010/11/injecting-bidichecker-to-test-arabic-web-pages/">Injecting BidiChecker to test Arabic web pages</a></span>]]></description>
			<content:encoded><![CDATA[<p>Google has just <a href="http://google-opensource.blogspot.com/2010/11/test-your-app-from-right-to-left.html">announced</a> the release of <a href="http://code.google.com/p/bidichecker/">BidiChecker</a> &#8211; an open source tool to automatically test Arabic web pages for issues related to bidirectional support. This is great news, as bidirectional support is always a huge problem and requires both deep Arabic language understanding and deep technical HTML/CSS understanding, preferably at the same time. Any level of automation would be useful.</p>
<p>However, all the tool usage descriptions are geared towards using it with an automated JavaScript testing library. I just wanted to test the tool on a couple of public web pages, both ones we maintain and others.</p>
<p>Here is the sequence of steps to inject and trigger BidiChecker into any Arabic page using <a href="http://jquery.com/">jQuery</a> and Firefox+<a href="http://getfirebug.com/">Firebug</a>:</p>
<ol>
<li>Load the page in Firefox with the Firebug console enabled.</li>
<li>Inject jQuery if you don&#8217;t have it on the page already. I use the <a href="http://www.learningjquery.com/2009/04/better-stronger-safer-jquerify-bookmarklet">jQuerify</a> bookmarklet.</li>
<li>In Firebug&#8217;s console paste the following code:
<pre>
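jQuery.getScript(
     "http://bidichecker.googlecode.com/svn/trunk/lib/bidichecker_packaged.js",
     function () {
         // Check the whole page body, expecting a right-to-left page.
         var bidiErrors = bidichecker.checkPage(true, top.document.body);
         // Open the error navigation window for whatever was found.
         bidichecker.runGui(bidiErrors);
     });</pre>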
</li>
<li>If there are any BiDi issues, BidiChecker&#8217;s error navigation window will pop up and the offending text will be shown in red in the page itself. If you used the jQuerify bookmarklet, you may get an error reported for its &#8220;jQuery injected&#8221; message.</li>
</ol>
<p>Loading the BidiChecker library directly from Google Code&#8217;s SVN is probably not a particularly polite way of doing it, but it is great for a quick test and introduction to the tool.</p>
<p>It is also possible to use <a href="http://chris.improbable.org/2010/11/4/google-bidichecker-bookmarklet/">a standalone BidiChecker bookmarklet</a>, but I prefer the Firebug approach as it then lets me explore the page further using Code Inspector and other tools.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.outerthoughts.com/2010/11/injecting-bidichecker-to-test-arabic-web-pages/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Arabic numerals&#8217; non-WYSIWYG</title>
		<link>http://blog.outerthoughts.com/2010/08/arabic-numerals-non-wysiwyg/</link>
		<comments>http://blog.outerthoughts.com/2010/08/arabic-numerals-non-wysiwyg/#comments</comments>
		<pubDate>Fri, 20 Aug 2010 05:12:23 +0000</pubDate>
		<dc:creator>arafalov</dc:creator>
				<category><![CDATA[Problems and Solutions]]></category>
		<category><![CDATA[Weird Stuff]]></category>

		<guid isPermaLink="false">http://blog.outerthoughts.com/?p=359</guid>
		<description><![CDATA[<p>For my other project, I needed to process some Arabic text that was in an HTML file derived from an MSWord document.</p> <p>Everything was going reasonably well, except my regular expressions were not picking up section name/number sequences in all cases, which was causing a problem with the 6-language alignment algorithm.</p> <p>Normally, <span style="color:#777"> . . . &#8594; Read More: <a href="http://blog.outerthoughts.com/2010/08/arabic-numerals-non-wysiwyg/">Arabic numerals&#8217; non-WYSIWYG</a></span>]]></description>
			<content:encoded><![CDATA[<div class="zemanta-img" style="margin: 1em; display: block;">
<div>
<dl class="wp-caption alignright" style="width: 310px;">
<dt class="wp-caption-dt"><a href="http://commons.wikipedia.org/wiki/File:EgyptphoneKeypad.jpg"><img title="I made this photo myself. Its now in the Publi..." src="http://upload.wikimedia.org/wikipedia/commons/thumb/d/d8/EgyptphoneKeypad.jpg/300px-EgyptphoneKeypad.jpg" alt="I made this photo myself. Its now in the Publi..." width="300" height="298" /></a></dt>
<dd class="wp-caption-dd zemanta-img-attribution" style="font-size: 0.8em;">Image via <a href="http://commons.wikipedia.org/wiki/File:EgyptphoneKeypad.jpg">Wikipedia</a></dd>
</dl>
</div>
</div>
<p>For <a title="UN Corpora project website" href="http://www.uncorpora.org/">my other project</a>, I needed to process some Arabic text that was in an HTML file derived from an MSWord document.</p>
<p>Everything was going reasonably well, except my regular expressions were not picking up section name/number sequences in all cases, which was causing a problem with the 6-language alignment algorithm.</p>
<p>Normally, I just examine the text visually, determine a new regular expression pattern and that particular problem is solved. This time it was not to be.</p>
<p>When I looked at the text, what I saw was the phrase &#8220;<big><strong>Section 1٣</strong></big>&#8221; with the word Section written in Arabic (right-to-left of course). The problem here is <big><strong>1٣</strong></big>, which means 13, but with the first digit 1 coming from the <a href="http://en.wikipedia.org/wiki/Arabic_numerals">Arabic Numerals</a> set (which is what we use in the English language) and the second digit <big><strong>٣ </strong></big>(3) coming from the <a href="http://en.wikipedia.org/wiki/Eastern_Arabic_numerals">Arabic-Indic Numerals</a> set (which is what at least some Arab countries use). Confusing, I know. We use their numbers and they already use somebody else&#8217;s. What do they know that we haven&#8217;t yet figured out?</p>
<p>Of course this juxtaposition makes no sense. Why would somebody mix the two digit sets, especially in an official document? I contacted the authoring departments and &#8211; unbelievably to me &#8211; they looked at the document and it looked correct to them.</p>
<p>I had nothing to go on, so I left that puzzle unsolved for a couple of weeks. That is, until it hit me &#8211; they were looking at it in MSWord, while I was looking at it on the codepoint character level. They had <a class="zem_slink" title="WYSIWYG" rel="wikipedia" href="http://en.wikipedia.org/wiki/WYSIWYG">WYSIWYG</a> on and I did not. So that was the difference.</p>
<p>I went looking around the MSWord interface with Arabic enabled and sure enough there was <a title="Microsoft's documentation on Arabic support in MSWord" href="http://www.microsoft.com/middleeast/arabicdev/office/officeXP/wPapers/Word.aspx#_Toc15640940">a whole collection of options for Arabic fonts, numbers and other settings</a>. And one of them was to display all numbers as Arabic-Indic. So, when that mode is enabled, MSWord will display any digits as Arabic-Indic ones. That answered half of the puzzle: why the original authors could not see the difference. But how did that happen in the first place?</p>
<p>My guess is that the original section was copied from somewhere else in the document. The person who worked on that original had the keyboard (not the MSWord display) configured to use Arabic numbers and was actually entering the all-too-familiar 1,2,3, which MSWord displayed as <big><strong>١,٢,٣</strong></big>. Then the person who copied the section title had a keyboard configured to use Arabic-Indic characters and replaced or added to the section number using that keyboard. It still displayed consistently, but now contained digits from two different numeral systems.</p>
<p>Of course since the documents were designed for printing nobody noticed and really had no reason to care. This issue only becomes important when those documents are used as <em>input</em> for bitext alignment or some other computational processing. Then, and only then, it bites the person trying to make sense out of it.</p>
<p>The lesson here is this: WYSIWYG might be good if all you are doing is looking or printing. But if your documents serve as input to other processes as well, WYSIWYG can cause some very non-obvious issues.</p>
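<p>For anyone hitting the same wall, here is a minimal Python sketch of how the mixed digits could be detected and normalized before the regular expressions run. It is not the code from the project; the function names and the sample string are made up for illustration, and it assumes that mapping every decimal digit to its ASCII form is acceptable for the processing step.</p>
<pre>
```python
import re
import unicodedata


def normalize_digits(text):
    """Replace every Unicode decimal digit with its ASCII equivalent."""
    out = []
    for ch in text:
        value = unicodedata.decimal(ch, None)
        out.append(ch if value is None else str(value))
    return "".join(out)


def has_mixed_digits(text):
    """Report whether the text draws digits from more than one digit set."""
    digit_sets = set()
    for ch in text:
        if unicodedata.decimal(ch, None) is None:
            continue
        if ch.isascii():
            digit_sets.add("ASCII")
        else:
            # e.g. "ARABIC-INDIC DIGIT THREE" gives "ARABIC-INDIC"
            digit_sets.add(unicodedata.name(ch).split(" DIGIT ")[0])
    return len(digit_sets) > 1


# "1" is an ASCII digit, "\u0663" is ARABIC-INDIC DIGIT THREE.
sample = "Section 1\u0663"
ascii_only = re.compile(r"Section [0-9]+")

print(has_mixed_digits(sample))
print(ascii_only.fullmatch(sample))
print(ascii_only.fullmatch(normalize_digits(sample)))
```
</pre>
<p>The pattern deliberately uses <em>[0-9]</em>: in Python 3 string patterns, <em>\d</em> matches any Unicode decimal digit, so it would accept the mixed form silently, which is a different trade-off.</p>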
]]></content:encoded>
			<wfw:commentRss>http://blog.outerthoughts.com/2010/08/arabic-numerals-non-wysiwyg/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Making up with ANTLR</title>
		<link>http://blog.outerthoughts.com/2009/05/making-up-with-antlr/</link>
		<comments>http://blog.outerthoughts.com/2009/05/making-up-with-antlr/#comments</comments>
		<pubDate>Fri, 29 May 2009 02:25:16 +0000</pubDate>
		<dc:creator>arafalov</dc:creator>
				<category><![CDATA[My PhD research]]></category>
		<category><![CDATA[Problems and Solutions]]></category>

		<guid isPermaLink="false">http://blog.outerthoughts.com/?p=275</guid>
		<description><![CDATA[<p>I like ANTLR! It is a specialized tool that can really be applied to many difficult tasks when regular expressions get all Dust Puppy like. And I have used it in the past with great success.</p> <p>But, every time I put this particular tool aside, I know that picking it back up will be like <span style="color:#777"> . . . &#8594; Read More: <a href="http://blog.outerthoughts.com/2009/05/making-up-with-antlr/">Making up with ANTLR</a></span>]]></description>
			<content:encoded><![CDATA[<p>I like <a title="ANTLR's home page" href="http://antlr.org/">ANTLR</a>! It is a specialized tool that can really be applied to many difficult tasks when regular expressions get all <a title="Explanation of Dust Puppy" href="http://www.userfriendly.org/cartoons/dustpuppy/">Dust Puppy</a> like. And I have used it in the past with great success.</p>
<p>But, every time I put this particular tool aside, I know that picking it back up will be like making up after a bad breakup. Things feel familiar, but you are still so uncomfortable you cannot get anything working. Only knowing how great the tool is underneath makes me go through the effort of re-familiarization.</p>
<p>I just downloaded ANTLR 3.1.2 bundled with its own GUI ANTLRWorks that offers visual diagrams, debugger and templates. You would think that would make for an easy out-of-box experience. You would be wrong.</p>
<p>You start the GUI and end up facing a blank screen. Lots of options and tabs for sure, but the only easy start one seems to be &#8216;Insert rule from template&#8217;.</p>
<p>Ok, so here are a couple of rules from templates, trying to parse the &#8220;Hello World!&#8221; string:</p>
<blockquote><p>ID    :    LETTER (LETTER | DIGIT)*<br />
;<br />
LETTER<br />
:    'a'..'z' | 'A'..'Z'<br />
;</p>
<p>DIGIT    :    '0'..'9'<br />
;</p>
<p>WS    :    (' ' | '\t' | '\n' | '\r') { $setType(Token.SKIP); }<br />
;</p></blockquote>
<p>Not good. We are missing a start state apparently. Ok, let&#8217;s add one:</p>
<blockquote><p>hello    :    ID ID '!'<br />
;</p></blockquote>
<p>Still no good. Start looking at examples, trying to see what bits are compulsory. Ok, the word grammar is missing at the top of the file. Of course, I have both grammar and lexer elements now in one file (ANTLR 3 feature, I believe), but let&#8217;s not worry about deep meaning here.</p>
<blockquote><p>grammar test;</p></blockquote>
<p>Now, suddenly, syntax diagram starts showing up. Let&#8217;s try saving (as test.g) and compiling. No good:</p>
<blockquote><p>The following token definitions can never be matched because prior tokens match the same input: LETTER</p></blockquote>
<p>So much for following a template. More digging in examples. Memory really starts to bring back the <a title="Seminal book on Compiler technologies" href="http://dragonbook.stanford.edu/">Dragon Book</a>&#8217;s lessons. What&#8217;s the problem with LETTER, and which is the <em>prior token</em> here? ID comes first and already matches any single letter, so a standalone LETTER token can never win. Ah, we don&#8217;t want the lexer to return LETTER (or DIGIT), only ID. So, LETTER and DIGIT are both token fragments, not tokens. Add <em>fragment</em> in front of both definitions. All good?</p>
<p>Nope! Now we have a problem with:</p>
<blockquote><p>attribute is not a token, parameter, or return value: setType</p></blockquote>
<p>But I did not write <em>setType</em>, the template provided it! Back to the examples! Apparently, somewhere along the way Skip tokens have gone away and we now have hidden channels instead. Swap that bit with one from an example and try again.</p>
<p>SUCCESS. Switch to the interpreter, enter &#8220;Hello World!&#8221; in the input box and run the <em>hello</em> rule. Beauty, we have a parse diagram.</p>
<p>The final running grammar example is here:</p>
<blockquote><p>grammar test;</p>
<p>hello    :    ID ID '!'<br />
;</p>
<p>ID    :    LETTER (LETTER | DIGIT)*<br />
;<br />
fragment LETTER<br />
:    'a'..'z' | 'A'..'Z'<br />
;</p>
<p>fragment DIGIT    :    '0'..'9'<br />
;</p>
<p>WS    :    (' ' | '\t' | '\n' | '\r') {  $channel = HIDDEN;  }<br />
;</p></blockquote>
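<p>For readers without ANTLR at hand, the split between fragments, tokens and the start rule can be mimicked in a few lines of Python. This is only an analogy under my own assumptions, not what ANTLR generates: LETTER and DIGIT are plain building blocks that never become tokens, whitespace is matched but dropped (standing in for the hidden channel), and BANG is a name I made up for the '!' literal.</p>
<pre>
```python
import re

# Fragments: reusable pieces that are never returned as tokens themselves.
LETTER = "[a-zA-Z]"
DIGIT = "[0-9]"

# Token rules, tried in order at each position.
TOKEN_RULES = [
    ("ID", re.compile(LETTER + "(?:" + LETTER + "|" + DIGIT + ")*")),
    ("WS", re.compile(r"[ \t\r\n]")),
    ("BANG", re.compile("!")),
]


def tokenize(text):
    """Split text into (name, value) tokens, silently dropping whitespace."""
    tokens = []
    pos = 0
    while pos != len(text):
        for name, pattern in TOKEN_RULES:
            match = pattern.match(text, pos)
            if match:
                if name != "WS":
                    tokens.append((name, match.group()))
                pos = match.end()
                break
        else:
            raise ValueError("unexpected character at position %d" % pos)
    return tokens


def parse_hello(text):
    """Mirror the 'hello' start rule: two IDs followed by '!'."""
    return [name for name, _value in tokenize(text)] == ["ID", "ID", "BANG"]


print(tokenize("Hello World!"))
print(parse_hello("Hello World!"))
```
</pre>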
<p>Hello World! Now, on to the real grammar and (if things really, really work) GATE integration&#8230;..</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.outerthoughts.com/2009/05/making-up-with-antlr/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CiteULike Exhibit visualization</title>
		<link>http://blog.outerthoughts.com/2009/01/citeulike-exhibit-visualization/</link>
		<comments>http://blog.outerthoughts.com/2009/01/citeulike-exhibit-visualization/#comments</comments>
		<pubDate>Wed, 28 Jan 2009 00:57:22 +0000</pubDate>
		<dc:creator>arafalov</dc:creator>
				<category><![CDATA[Problems and Solutions]]></category>
		<category><![CDATA[bibliography]]></category>
		<category><![CDATA[CiteULike]]></category>
		<category><![CDATA[visualization]]></category>

		<guid isPermaLink="false">http://blog.outerthoughts.com/?p=270</guid>
		<description><![CDATA[<p>Homegrown visualization is not the only way to quickly navigate CiteULike references. There are other tools that display bibliographies in interesting ways.</p> <p>One such tool is Exhibit, one of the graduates of the SIMILE project. It lets you build a very interactive webpage driven by just HTML+JavaScript, with no server-side component required. I really like SIMILE&#8217;s <span style="color:#777"> . . . &#8594; Read More: <a href="http://blog.outerthoughts.com/2009/01/citeulike-exhibit-visualization/">CiteULike Exhibit visualization</a></span>]]></description>
			<content:encoded><![CDATA[<p><a title="Previous article on visualizing CiteULike's bibliographies" href="http://blog.outerthoughts.com/2009/01/visualizing-citeulike-collections/">Homegrown visualization</a> is not the only way to quickly navigate CiteULike references. There are other tools that display bibliographies in interesting ways.</p>
<p>One such tool is <a title="Exhibit and other ex-SIMILE tools" href="http://code.google.com/p/simile-widgets/">Exhibit</a>, one of the graduates of the <a title="SIMILE project's homepage" href="http://simile.mit.edu/">SIMILE</a> project. It lets you build a very interactive webpage driven by just HTML+JavaScript, with no server-side component required. I really like SIMILE&#8217;s tools, even though it feels like development has slowed somewhat recently.</p>
<p>There is <a href="http://simile.mit.edu/wiki/Exhibit/How_to_make_a_publications_exhibit">an example of how to import and display BibTeX within Exhibit</a>. It is not difficult, just a couple of steps. It must have been a popular section, as there is now a dedicated new tool for it.</p>
<p><a title="Citeline Exhibit Builder" href="http://citeline.mit.edu/">Citeline Exhibit Builder</a> lets you load in BibTeX and presents an editing interface to customize Exhibit&#8217;s presentation of the publications. It looks great and seems to work well. A nice aspect is that it lets you choose which BibTeX fields to expose as filter facets. With the original tutorial, that would require HTML editing and understanding the Exhibit mindset. Citeline nicely hides that from the user.</p>
<p>There were a couple of small problems. Apparently, there is a way to log in and &#8216;claim&#8217; your presentation. I couldn&#8217;t test that as OpenID authentication failed (something about a nonce). Also, there is a jsMath library but, once the generated Exhibit is downloaded, it fails with cross-server issues. Finally, as with most end-to-end solutions, it does not do data preprocessing/normalization to allow me, for example, to combine author/editor fields for sorting purposes.</p>
<p>Citeline is a very promising tool and I am certainly going to keep it in mind for publishing my bibliographies.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.outerthoughts.com/2009/01/citeulike-exhibit-visualization/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Visualizing CiteULike collections</title>
		<link>http://blog.outerthoughts.com/2009/01/visualizing-citeulike-collections/</link>
		<comments>http://blog.outerthoughts.com/2009/01/visualizing-citeulike-collections/#comments</comments>
		<pubDate>Sun, 25 Jan 2009 07:10:20 +0000</pubDate>
		<dc:creator>arafalov</dc:creator>
				<category><![CDATA[Computational Linguistics]]></category>
		<category><![CDATA[My PhD research]]></category>
		<category><![CDATA[Problems and Solutions]]></category>
		<category><![CDATA[CiteULike]]></category>
		<category><![CDATA[Graphviz]]></category>
		<category><![CDATA[visualization]]></category>

		<guid isPermaLink="false">http://blog.outerthoughts.com/?p=266</guid>
		<description><![CDATA[<p>I am collecting my reading and reference material in CiteULike. I like the service because it can capture details from multiple sources. It also lets me discover what other interesting people have collected, through tags, people and bookmarks graph navigation.</p> <p>Nice as CiteULike is, it is fairly difficult to get an overall picture of <span style="color:#777"> . . . &#8594; Read More: <a href="http://blog.outerthoughts.com/2009/01/visualizing-citeulike-collections/">Visualizing CiteULike collections</a></span>]]></description>
			<content:encoded><![CDATA[<p>I am collecting my reading and reference material in <a title="My library in CiteULike" href="http://www.citeulike.org/user/arafalov">CiteULike</a>. I like the service because it can capture details from multiple sources. It also lets me discover what other interesting people have collected, through tags, people and bookmarks graph navigation.</p>
<p>Nice as CiteULike is, it is fairly difficult to get an overall picture of one&#8217;s own collection. It is especially difficult to see quickly if there are people who serve as hubs by collaborating with multiple different groups. The information is there, but it requires a lot of clicks to find it out.</p>
<p>My usual solution is to export information out, massage it into <a title="Home page of Graphviz" href="http://www.graphviz.org/">Graphviz</a> format and use graph segmentation and layout algorithms to get a better overview. I <a title="Search for my articles mentioning Graphviz" href="http://blog.outerthoughts.com/?s=graphviz">have talked about Graphviz</a> a number of times on this blog before. This is yet another time it proved useful.</p>
<p>I started by exporting CiteULike&#8217;s content of my library. I found the Endnote export format to be more structured and therefore easier to parse. I then ran it through <a title="My converter" href="http://www.outerthoughts.com/files/paperviz/v1/convert.py">a custom Python program</a> that basically spat out a graph with titles pointing at authors. That produced a <strong>very large</strong> graph and was not particularly useful.</p>
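<p>To give a feel for the title-pointing-at-author output, here is a minimal Python sketch that writes such a graph in Graphviz DOT syntax. It is not the linked convert.py: the record layout (a title plus a list of author names) and the sample records are assumptions for illustration, and the Endnote parsing step is left out entirely.</p>
<pre>
```python
def dot_quote(label):
    """Quote a label for DOT, escaping backslashes and double quotes."""
    return '"' + label.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_dot(records):
    """Emit one 'title -> author' edge per author of each record."""
    lines = ["digraph library {"]
    for title, authors in records:
        for author in authors:
            lines.append("    " + dot_quote(title) + " -> " + dot_quote(author) + ";")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical sample records, standing in for a parsed library export.
records = [
    ("Paper A", ["Smith, J.", "Lee, K."]),
    ("Paper B", ["Smith, J."]),
]
dot_text = to_dot(records)
print(dot_text)
```
</pre>
<p>Saving that text to a file such as output.dot gives the Graphviz tools something to work on.</p>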
<p>The next step was to discover disjoint clusters of titles/authors. I used <em>ccomps</em> with -v and -x flags (e.g. <em>ccomps.exe -v -x -o comp.dot output.dot</em>).</p>
<p><em>ccomps</em> gave me partitioned graphs as well as statistics on the number of nodes/edges in each graph. I could then choose a graph with a large number of nodes/edges (eventually, all of them) and run it through <em>neato</em> with overlap=scale and splines=true (e.g. <em>neato.exe -Tgif -o neato_1.gif -Goverlap=scale -Gsplines=true comp_1.dot</em>).</p>
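<p>The partition-and-pick-the-biggest step can also be sketched in plain Python, which is handy for a quick look before Graphviz gets involved. This is my own illustration of splitting a graph into connected components, not a description of how <em>ccomps</em> works internally; the edge list is made-up sample data.</p>
<pre>
```python
from collections import defaultdict


def connected_components(edges):
    """Group nodes into connected components, largest component first."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    components = []
    for start in neighbours:
        if start in seen:
            continue
        # Depth-first walk over everything reachable from this node.
        stack = [start]
        seen.add(start)
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(component)
    return sorted(components, key=len, reverse=True)


# Hypothetical title/author edges; direction is ignored for clustering.
edges = [
    ("Paper A", "Smith, J."),
    ("Paper B", "Smith, J."),
    ("Paper C", "Lee, K."),
]
clusters = connected_components(edges)
for cluster in clusters:
    print(len(cluster), sorted(cluster))
```
</pre>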
<p>The resulting graph was still not perfect, but it was a good start. I also tried <em>fdp</em> instead of <em>neato</em>, but that seemed to produce giraffe versions of the graph with graph edges being overly long.</p>
<p>You can see <a title="Output image of one of the clusters" href="http://www.outerthoughts.com/files/paperviz/v1/neato_1.gif">an example</a> of <em>neato</em> output for one of my clusters. Warning: if it causes problems due to its size, try it with <a title="Graphics viewing freeware" href="http://www.irfanview.com/">IrfanView</a>; that program can display even improbably large graphs (e.g. unpartitioned ones).</p>
<p>I have run into some problems as well that would either cause partitions to combine together or produce duplicate nodes and edges.</p>
<p>The first problem was that sometimes a person was an author and sometimes an editor. I was interested in both, so collapsed those fields together. That caused some non-people to then show up on the graph and connect clusters in unexpected ways. For my library the specific value was &#8216;European&#8217;, so I filtered it out in the code.</p>
<p>The second problem had to do with CiteULike&#8217;s parsing. Sometimes, it would split a first+last name into separate names, probably due to incorrect manual entry at some point. I had to fix those at the source by editing the corresponding CiteULike entries. Probably a good thing to do anyway.</p>
<p>The other problem is right out of the co-reference resolution domain. Sometimes names would include full first names, sometimes only a first name initial. I have worked around that by normalizing all first names to the initials. Obviously, this could collapse entries belonging to multiple real people into one.</p>
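<p>The initials workaround is simple enough to show. Below is a minimal Python sketch, not the code from my converter; it assumes names arrive as &#8220;Last, First Middle&#8221;, and the sample names are invented. It also demonstrates the collapse risk mentioned above.</p>
<pre>
```python
def normalize_author(name):
    """Reduce given names to initials: 'Smith, John A.' becomes 'Smith, J. A.'."""
    last, _sep, given = name.partition(",")
    parts = given.replace(".", " ").split()
    if not parts:
        return last.strip()  # no given names to shorten
    initials = " ".join(part[0].upper() + "." for part in parts)
    return last.strip() + ", " + initials


# Hypothetical entries: the first two are the same person spelt two ways,
# the third is a different person who ends up merged with them.
names = ["Smith, John", "Smith, J.", "Smith, Jane", "Lee, Kim"]
normalized = sorted({normalize_author(name) for name in names})
print(normalized)
```
</pre>
<p>Four input spellings come out as two graph nodes: the duplicate is merged as intended, but so is the unrelated Jane.</p>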
<p>Further on name problems, in cases of non-English names (e.g. Spanish names with multiple surnames), CiteULike would get confused about which part is which and not display or export the name correctly. Additionally, sometimes characters such as <strong>ñ</strong> would be entered as plain <strong>n</strong>. Those also needed to be corrected manually.</p>
<p>The project only took a couple of hours including writing code and cleanup. It is already useful to me, as I found a new person who was in an unexpectedly large number of papers and also found a chain of connections that might be interesting to follow more closely.</p>
<p>There is of course a lot more that could be done. Automatic co-reference of misspelt names, layout hints based on number of times authors appeared together, color coding of tags &#8211; these are just some of the easy ideas.</p>
<p>There might even be a small project/paper in doing co-reference resolution and cleaning up CiteULike data. After all, similar projects were done for Wikipedia. I don&#8217;t think CiteULike currently makes a full export available, but they do have <a title="CiteULike's datasets available for research" href="http://www.citeulike.org/faq/data.adp">some</a>, so they might be amenable to exporting a special set for research purposes.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.outerthoughts.com/2009/01/visualizing-citeulike-collections/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Vista repeatedly dropping wireless connection &#8211; solution</title>
		<link>http://blog.outerthoughts.com/2008/06/vista-repeatedly-dropping-wireless-connection-solution/</link>
		<comments>http://blog.outerthoughts.com/2008/06/vista-repeatedly-dropping-wireless-connection-solution/#comments</comments>
		<pubDate>Sat, 14 Jun 2008 12:44:15 +0000</pubDate>
		<dc:creator>arafalov</dc:creator>
				<category><![CDATA[Problems and Solutions]]></category>

		<guid isPermaLink="false">http://blog.outerthoughts.com/2008/06/vista-repeatedly-dropping-wireless-connection-solution/</guid>
		<description><![CDATA[<p>I am visiting my parents and connect to their network via wireless router. My laptop, which is (still!) running Vista kept dropping wireless connection every couple of minutes and reconnecting again. Interestingly, the other computers connected to the same router had no problems.</p> <p>I could not figure out where to even start troubleshooting this issue, <span style="color:#777"> . . . &#8594; Read More: <a href="http://blog.outerthoughts.com/2008/06/vista-repeatedly-dropping-wireless-connection-solution/">Vista repeatedly dropping wireless connection &#8211; solution</a></span>]]></description>
			<content:encoded><![CDATA[<p>I am visiting my parents and connect to their network via a wireless router. My laptop, which is (still!) running Vista, kept dropping the wireless connection every couple of minutes and reconnecting again. Interestingly, the other computers connected to the same router had no problems.</p>
<p>I could not figure out where to even start troubleshooting this issue, until I noticed that the problem only happens while I am running on battery and not when I am connected to the mains. Once I noticed that, the solution was simple &#8211; the power management module must have been too eager, turning off wireless after 30 seconds of inactivity. Given that I was trying to read emails or webpages, that would occur fairly regularly.</p>
<p>The fix is to go to the power-management control panel and adjust on-battery behaviour to match the full-power one. I am putting this out because an hour of searching for this problem online did not bring any result. I hope the next person to be flummoxed by this repeated connection loss will find my blog entry fast.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.outerthoughts.com/2008/06/vista-repeatedly-dropping-wireless-connection-solution/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

