Checking examples in “Solr Indexing” with Solr 4.7 under Windows – part 1

It’s been 9 months since my introductory Solr book came out. It was written for version 4.3. In the meantime, Solr kept marching on and is now at version 4.7. There have been quite a number of changes and new features, so I really wanted to recheck that the examples in the book still make sense. I also wanted to run the tests on Windows to see whether the *nix-centered instructions in the book caused any issues.

This first part covers the issues and supplementary material based on the review of the first five chapters. Later parts will be covered in other blog posts. So far, it seems that the examples survived without any serious issues.

Overall comments

  • Do not copy examples from the book PDF. Instead, use the examples published on GitHub. This avoids problems with content broken over multiple lines and with having to delete page-end material introduced in the PDF.

  • Solr logs look somewhat funny in DOS console, especially with some weird û characters, but mostly they are ok. It may be worth increasing screen buffer size to see long exception traces.

  • Even though the book PDF has clickable links enabled on URLs (that took FOREVER), some - though not all - line breaks cause URLs to stop prematurely as well. So, if your browser results are very different from what the book suggests, recheck that you have the full URLs. Sometimes that will require some surgery if Solr erroneously redirects within the Admin interface.

  • Command line examples often mix URLs and local directories. On *nix/Mac, these all use forward slashes (/), but on Windows the URLs still use forward slashes while local paths use backslashes (\). For example:

    java -Dauto -Durl=http://localhost:8983/solr/collection1/update -jar post.jar collection1\input1.csv
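The slash rule above can also be sketched in Python. This is only an illustration, not something from the book: the helper name build_post_command is made up, and the URL and file names are the ones from the command above. The URL is passed through untouched, while PureWindowsPath joins the local path pieces with backslashes.

```python
from pathlib import PureWindowsPath


def build_post_command(solr_url, local_parts):
    """Return the post.jar argument list with a Windows-style local path.

    build_post_command is a hypothetical helper written for this post.
    """
    # URLs keep forward slashes on every operating system.
    # PureWindowsPath joins the local path pieces with backslashes.
    local_path = str(PureWindowsPath(*local_parts))
    return ["java", "-Dauto", f"-Durl={solr_url}", "-jar", "post.jar", local_path]


command = build_post_command(
    "http://localhost:8983/solr/collection1/update",
    ["collection1", "input1.csv"],
)
print(" ".join(command))
```

Printing the joined list gives back the same command line as shown above, with the backslash only in the local path.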

Chapter 1: Creating your first collection

  • In solrconfig.xml, we have the Lucene version set to LUCENE_43. This still works, though the latest version is LUCENE_47, which is what I am testing with. I also noticed that it now seems possible to use the version number directly (e.g. 4.7).

  • Solr WebAdmin screens have rather different collection-specific sub-screens. The ones the book uses are all still there, but there has been quite a lot of change and progress overall, some of which is related to now being able to edit schema and configuration via the web UI.

  • Scripting Solr startup turned out to be slightly more complicated on Windows. Here is an example that works:

    SETLOCAL

    CD /D D:\SOLRPATH\solr-4.7.0\example

    java -Dsolr.solr.home=D:\SOLR-INDEXING -jar start.jar

    Here, SETLOCAL ensures that the prompt returns to the directory the script was started from, instead of switching to Solr’s example directory and staying there.
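If you would rather script the startup in Python, a rough equivalent might look like the sketch below. It is an assumption-laden illustration rather than anything the book provides: the function names are invented, the paths are the same placeholders used in the batch example, and java is assumed to be on the PATH. The cwd argument applies only to the child process, so the calling shell never changes directory, which is the same effect SETLOCAL gives the batch script.

```python
import subprocess


def solr_start_command(solr_home):
    """Build the java command line used to start the Solr example server."""
    return ["java", f"-Dsolr.solr.home={solr_home}", "-jar", "start.jar"]


def start_solr(example_dir, solr_home):
    """Run start.jar from example_dir without changing the caller's directory.

    Both function names are hypothetical; the call blocks until Solr exits.
    """
    # cwd= only affects the child process, so no directory juggling is needed.
    return subprocess.run(solr_start_command(solr_home), cwd=example_dir)


# Example call (placeholder paths from the batch script above):
# start_solr(r"D:\SOLRPATH\solr-4.7.0\example", r"D:\SOLR-INDEXING")
```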

Chapter 2: Running several collections at once

As mentioned in the book, Solr now has core autodiscovery with radically different semantics. And as predicted, it took a couple of versions for it to mature. I think it works by now, but the book examples still work fine with the legacy solr.xml format.

Chapter 3: Importing multivalued fields

Similar to scripting in chapter 1, both copying and deleting directories turned out to be slightly non-trivial in DOS. The relevant example commands are:

    xcopy /SI collection1 multivalued

    rmdir /s multivalued\data
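For anyone who prefers one script that behaves the same on Windows and *nix, the two commands can be approximated with Python's standard shutil module. This is a minimal sketch, not part of the book: clone_collection is a made-up name, and the behaviour is only roughly equivalent (copytree also copies empty directories, which xcopy /S skips, and rmtree does not ask for confirmation).

```python
import shutil
from pathlib import Path


def clone_collection(source, target):
    """Copy a collection directory, then drop the copied index data.

    clone_collection is a hypothetical helper; target must not exist yet.
    """
    source, target = Path(source), Path(target)
    # Rough equivalent of: xcopy /SI collection1 multivalued
    shutil.copytree(source, target)
    # Rough equivalent of: rmdir /s multivalued\data (no confirmation prompt)
    shutil.rmtree(target / "data", ignore_errors=True)
    return target


# Example call, run from the directory that contains collection1:
# clone_collection("collection1", "multivalued")
```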

Chapter 4: Using Solr’s XML format

If you practice the delete commands at the end of the chapter, you need to rerun the import before moving to the next chapter. The fully populated index is reused later.

Chapter 5: Indexing text

  • Be careful with step 1, as it combines several instructions (trying to save space). Do not delete the data directory, and make sure to restart Solr after modifying solr.xml.

  • Some of the examples have ‘?’, ‘;’, and ‘.’ characters stuck at the end of URLs. They don’t actually change the results (no match), but could be quite confusing, especially since the URLs also contain URL-encoded values.

  • Be careful if you are following this chapter by copying config files from the GitHub repository. The same few files are changed multiple times during the chapter, and the GitHub repository represents only the final state. It might be better to copy selectively or even do this chapter mostly by typing.
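If you paste URLs from the PDF into a script, the stray trailing characters mentioned above can be trimmed with a one-line helper. This is just a sketch: strip_trailing_punctuation is a made-up name, the sample URL is a hypothetical query rather than one taken from the book, and the helper would also trim a URL that legitimately ends in one of these characters.

```python
def strip_trailing_punctuation(url, stray="?;."):
    """Remove sentence punctuation that got glued onto the end of a URL."""
    # rstrip drops any run of the listed characters from the right end only,
    # so the '?' that starts the query string stays in place.
    return url.rstrip(stray)


# Hypothetical example URL with a sentence-ending period attached.
sample = "http://localhost:8983/solr/collection1/select?q=text%3Asolr."
cleaned = strip_trailing_punctuation(sample)
```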

Summary

It seems that the book has survived so far, at least for the basic examples, and it works on Windows without too many changes. So, if you are a beginner or early intermediate Solr user, it is still good value.

