Oh, solr home – where art thou

Ever started Solr (5.x) with an example, stopped it, and then could not figure out where that example actually lives? I certainly have.

This was not a problem in Solr 4.x, because you started the instance manually and were forced to know where your Solr home directory was (via the solr.solr.home property). And the homes were all in the example directory out of the box anyway. But with Solr 5, we now have startup scripts, examples, and startup configurations that make it easier to get going, but may also introduce some confusion down the road.

So, this is a compilation of all the examples Solr 5.3 ships with, what configs they are using and where the startup scripts create homes for them. Plus a couple of weird related things.

Let’s start with the starter configurations. Every Solr collection has one, and several examples use the same configuration.

Solr ships with three configurations; you can see the list if you run bin/solr create_core -help. Those configurations are:

  • data_driven_schema_configs
  • sample_techproducts_configs
  • basic_configs

Of those, data_driven_schema_configs is using the new managed schema approach and – on top of that – has the rest of the Schemaless mode configured. The other two configurations use the classic approach with manually-editable schema.xml.

For better or worse, data_driven_schema_configs is the default one when you create a core (as discussed later). It also demonstrates fancy new parameter overrides in the params.json file.

All of the example configurations live in the server/solr/configsets directory. So, if you want to check them out in their virgin form, that’s where they hide.

To run an example, which may do a combination of creating Solr home directories, starting Solr instances, creating collections/cores, and even indexing some data, you run the bin/solr start -e example_name command, where the example name is NOT one of those configuration names. The example names include:

    • schemaless – which uses the data_driven_schema_configs
    • techproducts – which uses sample_techproducts_configs
    • cloud – which gives you a choice of any of the three configurations
    • dih – which does not use any at all as we will discuss below

Both schemaless and techproducts examples create a subdirectory with the corresponding name under the example directory. The solr home is actually in the solr directory within that. So, if you created your example with bin/solr start -e techproducts and want to run that example again later, your command will be bin/solr start -s example/techproducts/solr/ 
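Here is a rough sketch of that restart step. The helper name restart_example_cmd is my own invention, not part of Solr. It assumes that the example's home contains a solr.xml file and that you pass the Solr installation directory as the second argument (defaulting to the current directory). It only prints the command rather than running it.

```shell
# Hypothetical helper, not part of Solr: print the restart command for a
# previously created schemaless or techproducts example.
# $1 = example name, $2 = Solr install directory (assumed default: .)
restart_example_cmd() {
  home="${2:-.}/example/$1/solr"
  # Assumption: a standalone Solr home is recognizable by its solr.xml file.
  if [ -f "$home/solr.xml" ]; then
    echo "bin/solr start -s example/$1/solr"
  else
    echo "no Solr home at $home; was the example ever started?" >&2
    return 1
  fi
}
```

Calling restart_example_cmd techproducts from the installation directory prints the command above if the home is still there, and complains otherwise.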

The cloud example will actually create several Solr instances with their own homes, depending on the configuration you provided. They are all created under example/cloud and are called node1, node2, etc. They, obviously, all run on different ports (plus Zookeeper), so recreating that running configuration after the initial run requires a lot of attention to the ports chosen or defaulted. Fortunately, the script prints them out on the screen as it goes along. So, the restart command for the second node under the default configuration would look like: bin/solr start -cloud -p 7574 -s "example/cloud/node2/solr" -z localhost:9983
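To keep the per-node details straight, a small sketch like the one below can print the restart command for each node. The function name cloud_node_cmd is hypothetical. The port 8983 for the first node is my assumption of the default; check it against what the script printed on your screen. The first node is given no -z here on the assumption that it runs the embedded Zookeeper itself.

```shell
# Hypothetical helper, not part of Solr: print (not run) the restart
# command for one node of the cloud example.
# $1 = node number, $2 = port, $3 = Zookeeper address (optional)
cloud_node_cmd() {
  cmd="bin/solr start -cloud -p $2 -s example/cloud/node$1/solr"
  # Nodes that do not host Zookeeper themselves need to be told where it is.
  if [ -n "${3:-}" ]; then
    cmd="$cmd -z $3"
  fi
  echo "$cmd"
}
```

For the default two-node setup, that would be cloud_node_cmd 1 8983 followed by cloud_node_cmd 2 7574 localhost:9983, the second of which prints the same command as above, minus the quotes.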

For all three of the examples above, the sub-directories created under the example directory can be safely deleted when you are done playing with those examples or want to rebuild them from scratch.

Not so for the dih example. This one still follows the old layout and running the example just starts the instance with the example/example-DIH/solr directory. Which has 5 (FIVE!) cores pre-configured. One of which is – of course – called solr. Since this example is not bootstrapped from any configuration, any changes you make to that instance are permanent. And if you delete the directory, you’ll have to go back to the downloaded archive to get a fresh copy.

So, that’s it for examples! But wait, there is more. Examples, of course, are not the only way to create a running Solr instance. You can just say bin/solr start and you will be running in no time! But running from where?

Well, it turns out that apart from the configsets, the server/solr directory also has a solr.xml file, which makes it a bona fide Solr home. So, that’s the home Solr has when it does not have an – explicit – home.
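Since a solr.xml file is what marks a directory as a home, one quick way to answer the "where art thou" question is to search for it. This is a minimal sketch, with find_solr_homes being my own hypothetical name. It treats every directory that contains a solr.xml as a candidate home, which may occasionally be a false positive.

```shell
# Hypothetical helper: list every directory under $1 (default: current
# directory) that contains a solr.xml file, i.e. every candidate Solr home.
find_solr_homes() {
  find "${1:-.}" -type f -name solr.xml | while IFS= read -r f; do
    dirname "$f"
  done | sort
}
```

Run from the Solr installation directory after playing with a few examples, this should list server/solr alongside whatever homes the examples left under the example directory.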

Which means you can now create collections there using the create_core command. Which is handy if you want to check out basic_configs in a non-cloud setting. So, as an example, bin/solr start followed by bin/solr create_core -d basic_configs -c basic_core will connect to the running server and create a core in the server/solr/basic_core/ directory (notice, no extra solr in the path).

But wait, there is still more.

There is a hidden films example in the example/films directory that builds on the default (data_driven_schema_configs) configuration and then modifies it using the REST API to be able to actually load the files in. See the README.txt in the directory for the hairy, multi-line curl details.

There is also another hidden example in the example/files directory that comes with its own configuration directory, but without a solr.xml to make it a full-blown home. Follow the README.txt in that directory to see how to use the provided configuration directory to bootstrap a core with schemaless mode configuration and custom velocity templates to browse indexed rich-text documents.

And now for the weird parts:

Those of you with an eagle eye may have noticed that the example configurations are stored in the directory called configsets but that I do not call them configsets. That’s because both example and core creation commands clone those directories. And THAT is because the true configsets are shared between all the instances that use them. So, modifying them in one core will affect all the other sibling cores. Which is not what you want for a bunch of schemaless examples.

If you want to see a true configset in use, execute the following command against the default server: curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=configsettest&instanceDir=configsetTestDir&configSet=basic_configs". Now check out the content of the server/solr/configsetTestDir directory. Notice it does not even have a conf subdirectory, just the data one. It refers instead to its parent configset in the core.properties file: configSet=basic_configs.
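To check which of your cores point at a shared configset rather than carrying their own cloned conf directory, you can pull that line out of core.properties. This is a minimal sketch, with core_configset being a hypothetical name. It assumes the property is written exactly as configSet=name with no spaces around the equals sign.

```shell
# Hypothetical helper: print the configSet named in a core's
# core.properties file, or nothing if no configSet line is present.
# $1 = core instance directory (e.g. server/solr/configsetTestDir)
core_configset() {
  # Assumption: the line looks like "configSet=basic_configs".
  sed -n 's/^configSet=//p' "$1/core.properties"
}
```

For the core created above, core_configset server/solr/configsetTestDir should print basic_configs, while a core made with create_core should print nothing.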

Similarly, those of you who are used to the well-commented schema.xml may find themselves slightly lost looking at the managed-schema (née schema.xml) of the default schemaless example which looks like something forcefully adopted by the Borg. Actually, it will look just like normal schema.xml, comments and all, until the first moment you actually manage it by either REST API or by submitting content that causes new fields to be added to the schema. Then, it will be assimilated, normalized and spat back out in the new form.

Finally, when you are done with the core, you can get rid of it in three different ways:

  1. You can delete the whole directory manually, while Solr is not looking. Which works for standalone instances, but not so well for the cloud ones.
  2. You can run the bin/solr delete -c corename command, which will do the same kind of permanent deletion from the running instance.
  3. Or you can unload the core via the Core Admin screen of the Admin Web UI; this – just to be consistent – does not actually delete the core directory, it just renames the core.properties to core.properties.unloaded and you need to manually rename it back if you change your mind. Or, run an Add Core command which will create a new core.properties file, completely ignoring the old renamed one and all the properties you may have had in it (such as configset name).
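For the third option, the manual rename is easy to script. The sketch below uses two hypothetical helpers of my own naming. It assumes Solr is stopped, or will be restarted afterwards, so that it notices the restored core.properties. I have not checked what a running instance does with it.

```shell
# Hypothetical helper: list cores under a Solr home ($1, default: .) that
# were unloaded, judging by the renamed core.properties.unloaded file.
list_unloaded_cores() {
  find "${1:-.}" -type f -name core.properties.unloaded | while IFS= read -r f; do
    dirname "$f"
  done | sort
}

# Hypothetical helper: rename the file back for one core directory ($1).
# Assumption: Solr picks the core up again on its next start.
restore_core() {
  mv "$1/core.properties.unloaded" "$1/core.properties"
}
```

So list_unloaded_cores server/solr shows what is lying around, and restore_core on one of the printed directories brings its old properties (configset name included) back into play.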

Hopefully, this comprehensive guide will make it easier to understand what’s going on behind the scenes with Solr 5.x and make managing your collections/cores easier.

If you have other confusions and/or comments on the article, feel free to ping me on Twitter at @arafalov.