My book on Solr is now published

It is a great moment. After many months of work, my book is finally published and is available from multiple sources. It is called Instant Apache Solr for Indexing Data How-to and it has been published by Packt Publishing.

There are a number of books published on Solr, but I feel that mine is different. Most of them try to cover as much of Solr as possible and take a reference-style approach to explaining what the different Solr components do. This is useful but, because Solr is so large, it is easy to get over-saturated with information and still have no idea how to put a good Solr setup together.

My book focuses specifically on beginners and early intermediates and picks up where Solr’s official tutorial leaves off. The tutorial is actually quite good. The distribution comes with several collections, and the tutorial shows learners how to get a collection running very quickly and how to get some documents indexed. It then walks through many of Solr’s useful features. So, naturally, when beginners do a proof of concept, they take the provided example distribution and modify it until something works. The fact that there are many dangling pieces and unused field definitions is not too important for the proof of concept. Sometimes, this configuration even makes it into production.

However, the problems start when that example-based schema needs to be cleaned up or a new production-ready schema needs to be written. Between schema.xml, solrconfig.xml and solr.xml, things suddenly start looking complicated. So, people go looking for the simplest possible set of configuration files, which are not too hard to find. What is hard is figuring out how to get from those files to a clean version of something that worked in the proof of concept.

This was my story, and it is the story I keep seeing on the Solr Users mailing list and in many of the Solr questions on Stack Overflow. So, my book is designed to fix this problem.

The book focuses on indexing. It does not go into the details of facets or eDismax or clustering or production setup or many, many other things. It mentions and uses those aspects of Solr only where they are relevant to indexing and to testing that the indexing worked. The main reasoning behind this is that no searching is possible until the data is in Solr. Therefore, indexing is the first piece of the puzzle that people need to understand, for real use, before they can proceed to the other aspects of Solr. And, later, when they discover new search requirements, they will need to go back and change the way indexing is done to address those requirements. So, understanding indexing is important then as well.

The book is written as a sequence of examples which, for the most part, build on top of each other. So, it really should be read in a linear fashion. The first example introduces a basic Solr schema and a corresponding minimal solrconfig.xml and solr.xml. Some data (pretend emails) gets indexed. Then, other examples introduce additional requirements and show how Solr can be configured to fulfill them. In this way, it is not just features that are being described, but also the larger picture of how those features can be put together. Fairly quickly, the book builds up to the latest features present in Solr 4.3, such as DataImportHandler, near-real-time indexing and multilingual content handling (English and Russian) with UpdateRequestProcessors and field aliasing.

By the end of the book, the reader still does not know everything that there is to know about indexing. Even with such a narrow focus, Solr is too deep, especially if one starts looking at Solr as part of ecosystems such as Drupal, Cloudera or Blacklight. Rather, the reader should now be able to understand what they still need to learn and be much more effective in discovering that knowledge on their own.

So, if you are a Solr beginner, have progressed beyond ‘collection1’, or even just want to quickly understand what can be indexed with Solr, buy this book. And if you are still not fully convinced, look at the configurations for the examples that come with the book. Or read Mark Bennett’s review of it. You can even glance at the start of each chapter on Safari online.

Update: The first chapter of the book is now also available as an article.

Update (August 3rd, 2013): Mauricio Scheffer, author of SolrNet (a Solr client for .NET), has published a review of the book as well.

Update (March 2014): Florian Hopf has published a short review of the book as well.


If you enjoyed this article, you may also benefit from other information resources available at solr-start.com.