Installing Solr on Ubuntu Linux
Following are instructions for installing the Solr search server on Ubuntu Linux. There are several manual steps in setting up Solr, and most of the other documents I came across on the internet were inadequate in some (or many) ways, so I enlisted the help of colleagues and documented the steps start-to-finish here.
I found Solr not to my liking. I encountered significant scaling issues when indexing beyond 4-5 million small documents, so I've abandoned this application in favor of more standard, robust solutions with a far larger community (e.g. MySQL) and more ubiquitous technology with a long evolutionary history (RDBMS) behind it.
The problem of indexing XML documents is best solved by avoidance. Digitally born data should exist in normalized and relational
states from the get-go.
These instructions have been tested with Ubuntu 8.04 (Hardy Heron) and will likely work with other recent versions of Ubuntu and Debian-based distros with little or no modification.
Before You Start
Solr can be set up several ways; these instructions lead up to a Solr environment deployed in Tomcat, with separate development and production areas. Once you've done this a couple of times (or carefully read this document a few times), you could set up three environments, just one, or whatever layout suits your needs. Be aware that there are hardcoded path dependencies: the Tomcat context files and Solr home directories below refer to absolute paths, so if you choose different locations you must change them everywhere they appear.
You'll want to get the latest Java JDK from Sun (http://java.sun.com/javase/downloads/index.jsp) and install it first. At the time these instructions were written, I had installed Sun's jdk1.6.0_10. I'm unsure if it's required, but I also made sure that "ant" was installed on my Ubuntu box (for ant, I simply used Ubuntu's handy package installer, Synaptic).
I downloaded the Sun JDK to my user home directory and made the .bin installer executable with chmod +x. I sudo'd to root and executed the file; it made me scroll through the license agreement and then decompressed itself. I then mv'd the resulting directory to /opt/jdk1.6.0_10.
Java needs at least two environment settings in order to be useful. You'll eventually need to set up CLASSPATH as well, but that's not essential for the instructions in this document. I made the following .bashrc additions to both my ordinary user account (/home/{username}/.bashrc) and the root account (/root/.bashrc).
Go into each .bashrc file and add the following (adjust the paths if you chose a different location or have a different version of the JDK):
export PATH=/opt/jdk1.6.0_10/bin:$PATH
export JAVA_HOME=/opt/jdk1.6.0_10
Whenever you make changes to .bashrc you should issue "source ~/.bashrc" to make the current shell re-read the file (otherwise you'd have to log out and log back in). You should now be able to type "which java" and see something like /opt/jdk1.6.0_10/bin/java, depending on the version you downloaded.
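If you would rather script the .bashrc change than edit each file by hand, something like the following sketch appends the two export lines only when they are not already present, so running it twice does no harm. The function name add_java_env is hypothetical, and it assumes the /opt/jdk1.6.0_10 location used above unless you pass another JDK path.

```shell
# Hypothetical helper (not part of the original steps): append the two
# JDK exports to an rc file, skipping any line that is already there.
# Assumes the JDK lives in /opt/jdk1.6.0_10 unless a path is given as $2.
add_java_env() {
    local rcfile="$1"
    local jdk="${2:-/opt/jdk1.6.0_10}"
    local line
    touch "$rcfile" || return 1
    # \$PATH stays literal so the rc file expands it at login time
    for line in "export PATH=$jdk/bin:\$PATH" "export JAVA_HOME=$jdk"; do
        # -q quiet, -x match the whole line, -F treat the pattern as plain text
        if ! grep -qxF "$line" "$rcfile"; then
            printf '%s\n' "$line" >> "$rcfile"
        fi
    done
}
```

Run add_java_env ~/.bashrc as your ordinary user and add_java_env /root/.bashrc as root, then source the file and check "which java" as described above.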
Rather than lean on the Tomcat 5.5 package in the Ubuntu repositories at the time of this writing, I downloaded the latest Tomcat from http://tomcat.apache.org. I brought it down to my user directory and decompressed it with gunzip and "tar xvf". This creates a Tomcat directory populated with everything it needs.
As you use Tomcat over the lifespan of your project you may want a more succinct name than something like "apache-tomcat-6.0.16", so I renamed (mv) this directory to simply "tomcat6". The instructions that follow use this abbreviated "tomcat6" name.
I then did this:
sudo su
mv tomcat6 /usr/local/
You can move it somewhere else -- I picked this location because a colleague who led me through most of these steps put it
in that location on his box and I decided to remain consistent with his setup. Maybe you want it in /usr/share/ or somewhere else.
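The unpack, rename, and move steps above can be rolled into one small shell function. This is a sketch, not the author's procedure: the function name install_tarball_as is hypothetical, and the tarball file name in the usage line (apache-tomcat-6.0.16.tar.gz) is an assumption about what the download is called. It uses tar's z option, which does the gunzip step on the fly, and refuses to overwrite an existing destination.

```shell
# Hypothetical helper (not from the original text): unpack a .tar.gz,
# then move the directory it creates to DESTDIR/NEWNAME.
install_tarball_as() {
    local tarball="$1"   # e.g. apache-tomcat-6.0.16.tar.gz (assumed file name)
    local srcdir="$2"    # directory the tarball unpacks to, e.g. apache-tomcat-6.0.16
    local newname="$3"   # e.g. tomcat6
    local destdir="$4"   # e.g. /usr/local
    if [ -e "$destdir/$newname" ]; then
        echo "refusing to overwrite $destdir/$newname" >&2
        return 1
    fi
    # "z" runs the archive through gunzip, replacing the separate gunzip step
    tar xzf "$tarball" || return 1
    mv "$srcdir" "$destdir/$newname"
}
```

As root, from the directory holding the download: install_tarball_as apache-tomcat-6.0.16.tar.gz apache-tomcat-6.0.16 tomcat6 /usr/local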
Before going further, you should test Tomcat. At this stage, I'm still sudo'd as root.
cd /usr/local/tomcat6/bin
./startup.sh
You should see a message like this:
Using CATALINA_BASE: /usr/local/tomcat6
Using CATALINA_HOME: /usr/local/tomcat6
Using CATALINA_TMPDIR: /usr/local/tomcat6/temp
Using JRE_HOME: /opt/jdk1.6.0_10
(Note that JRE_HOME is the location of the Sun JDK installed in an earlier step. You really need this: if Tomcat points at a JRE you don't want, or can't find one at all, you can't go any further.) Eventually you'll probably want to create a Tomcat-specific user with appropriate, minimal rights, instead of running Tomcat as root.
Go to your browser and type this:
http://localhost:8080/
Follow the Tomcat servlet examples link and click a couple of them, then a couple of the JSP examples as well. They should execute without complaint. At this stage we've installed the latest JDK and the latest Tomcat, and they are talking to one another. If you're getting something wildly different, you can't go any further here; everything should be "all systems go" at this point before you continue with this document.
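Tomcat's startup.sh returns as soon as it has launched Tomcat in the background, so the server may need a few seconds before http://localhost:8080/ answers. A minimal sketch for checking this from the command line instead of the browser, assuming curl is installed (wait_for_http is a hypothetical name, not part of Tomcat), polls the URL once a second until it gets a successful HTTP response or runs out of attempts.

```shell
# Hypothetical helper: poll a URL until it answers with HTTP success,
# or give up after a number of attempts. Requires curl.
wait_for_http() {
    local url="$1"
    local tries="${2:-30}"   # number of attempts, one second apart
    local i=0
    while [ "$i" -lt "$tries" ]; do
        # -s silent, -f fail on HTTP errors (4xx/5xx), -o discard the body,
        # --max-time keeps a single attempt from hanging
        if curl -sf -o /dev/null --max-time 5 "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    echo "no response from $url" >&2
    return 1
}
```

Usage: ./startup.sh && wait_for_http http://localhost:8080/ && echo "Tomcat is up"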
Before going further, you should shut Tomcat back down:
cd /usr/local/tomcat6/bin
./shutdown.sh
I downloaded the latest Solr from http://www.apache.org/dyn/closer.cgi/lucene/solr/. As with Tomcat, I used gunzip and "tar xvf" to decompress it into my user home directory. This creates a directory called "apache-solr-1.2.0".
We need to manually create some directories within /usr/local/tomcat6. This setup yields two Solr locations within your Tomcat instance: one for development, another for production. There are other ways to set up Solr, but if this is your first attempt you may want to follow this convention. It's unclear why /Catalina and /Catalina/localhost aren't created automatically by a Tomcat install; probably just to keep our salaries up. As you can see below, the /data/solr directory has an identical structure beneath it for dev and prod, each with its own /conf and /data subdirectories.
Make these directories:
/usr/local/tomcat6/conf/Catalina
/usr/local/tomcat6/conf/Catalina/localhost
/usr/local/tomcat6/data
/usr/local/tomcat6/data/solr
/usr/local/tomcat6/data/solr/dev
/usr/local/tomcat6/data/solr/dev/conf
/usr/local/tomcat6/data/solr/dev/data
/usr/local/tomcat6/data/solr/prod
/usr/local/tomcat6/data/solr/prod/conf
/usr/local/tomcat6/data/solr/prod/data
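Because mkdir -p creates any missing parent directories (and does not complain about ones that already exist), the ten directories above can be made with a few calls. The sketch below wraps that in a hypothetical function, make_solr_dirs, which takes the Tomcat root as an argument and defaults to /usr/local/tomcat6.

```shell
# Hypothetical helper: create the Catalina and dev/prod Solr directories
# listed above under a Tomcat root (default /usr/local/tomcat6).
make_solr_dirs() {
    local tomcat="${1:-/usr/local/tomcat6}"
    local env
    # -p also creates conf/Catalina, data, data/solr and data/solr/$env
    mkdir -p "$tomcat/conf/Catalina/localhost" || return 1
    for env in dev prod; do
        mkdir -p "$tomcat/data/solr/$env/conf" "$tomcat/data/solr/$env/data" || return 1
    done
}
```

As root: make_solr_dirs /usr/local/tomcat6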
Now copy the Solr "war" file into position for deployment. Go to the directory where you decompressed Solr in an earlier step, and then into its dist subdirectory (for instance, apache-solr-1.2.0/dist):
cp apache-solr-1.2.0.war /usr/local/tomcat6/data/solr
Now, in /usr/local/tomcat6/conf/Catalina/localhost, we need to create two files that Tomcat will read the next time it starts and use to (hopefully) deploy Solr properly. Use a text editor of your choice to create these two files in the Catalina/localhost subdirectory:
cd /usr/local/tomcat6/conf/Catalina/localhost
solrdev.xml
<Context docBase="/usr/local/tomcat6/data/solr/apache-solr-1.2.0.war" debug="0" crossContext="true">
<Environment name="solr/home" type="java.lang.String" value="/usr/local/tomcat6/data/solr/dev" override="true" />
</Context>
solrprod.xml
<Context docBase="/usr/local/tomcat6/data/solr/apache-solr-1.2.0.war" debug="0" crossContext="true">
<Environment name="solr/home" type="java.lang.String" value="/usr/local/tomcat6/data/solr/prod" override="true" />
</Context>
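Tomcat uses the name of each file in conf/Catalina/localhost as the context path, which is why solrdev.xml and solrprod.xml end up at /solrdev and /solrprod; the only difference between them is the solr/home value. If you prefer to generate them rather than type them, here is a sketch with a hypothetical function, write_solr_contexts. The war file name is a parameter because it changes with each Solr release; the default assumes apache-solr-1.2.0.war as above.

```shell
# Hypothetical helper: write solrdev.xml and solrprod.xml into
# conf/Catalina/localhost with the same contents shown above.
write_solr_contexts() {
    local tomcat="${1:-/usr/local/tomcat6}"
    local war="${2:-apache-solr-1.2.0.war}"
    local env
    for env in dev prod; do
        # The file name (solrdev.xml, solrprod.xml) becomes the context path
        cat > "$tomcat/conf/Catalina/localhost/solr$env.xml" <<EOF || return 1
<Context docBase="$tomcat/data/solr/$war" debug="0" crossContext="true">
<Environment name="solr/home" type="java.lang.String" value="$tomcat/data/solr/$env" override="true" />
</Context>
EOF
    done
}
```

As root: write_solr_contexts /usr/local/tomcat6 apache-solr-1.2.0.war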
There are some sample configuration files that come with the Solr distribution you downloaded. Let's copy those into their proper position. Go to the directory where you unpacked Solr and into its example/solr/conf subdirectory (apache-solr-1.2.0/example/solr/conf). You should see something like this:
admin-extra.html schema.xml solrconfig.xml synonyms.txt
protwords.txt scripts.conf stopwords.txt xslt
Copy everything here to your development solr configuration directory:
cp -R * /usr/local/tomcat6/data/solr/dev/conf
Do the same for your production location also:
cp -R * /usr/local/tomcat6/data/solr/prod/conf
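The two copies above can also be done in one pass from anywhere, without first changing into the conf directory. In this sketch (copy_solr_conf is a hypothetical name), the source argument is the unpacked Solr directory, e.g. ~/apache-solr-1.2.0. It copies "conf/." rather than "*", which also picks up any hidden files that a bare * glob would skip.

```shell
# Hypothetical helper: copy the example Solr configuration into both the
# dev and prod conf directories. $1 is the unpacked Solr directory,
# $2 the Tomcat root (default /usr/local/tomcat6).
copy_solr_conf() {
    local solr="$1"
    local tomcat="${2:-/usr/local/tomcat6}"
    local src="$solr/example/solr/conf"
    local env
    if [ ! -d "$src" ]; then
        echo "missing $src" >&2
        return 1
    fi
    for env in dev prod; do
        # "$src/." copies the directory's contents, dotfiles included
        cp -R "$src/." "$tomcat/data/solr/$env/conf/" || return 1
    done
}
```

As root: copy_solr_conf ~yourname/apache-solr-1.2.0 /usr/local/tomcat6 (substitute your own user name).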
Time to test. Everything should now be in place. Sacrifice a chicken and restart Tomcat:
cd /usr/local/tomcat6/bin
./startup.sh
Go to your browser and type this:
http://localhost:8080/solrprod
and also:
http://localhost:8080/solrdev
At this point you should see a "Welcome to Solr!" message with a "Solr Admin" link. If you can click the link and see an example search interface, you've probably installed Solr successfully.
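To repeat the browser check from a terminal (handy after each Tomcat restart), the following sketch fetches both context front pages and looks for the "Welcome to Solr" text. It assumes curl is installed; check_solr is a hypothetical name. The URLs end in a slash because Tomcat answers a bare /solrdev with a redirect to /solrdev/, which curl does not follow by default.

```shell
# Hypothetical helper: fetch each Solr context's front page and look for
# the "Welcome to Solr" text. Requires curl. Returns non-zero if either fails.
check_solr() {
    local base="${1:-http://localhost:8080}"
    local status=0
    local ctx
    for ctx in solrdev solrprod; do
        if curl -s --max-time 10 "$base/$ctx/" | grep -q "Welcome to Solr"; then
            echo "$ctx: OK"
        else
            echo "$ctx: no welcome page at $base/$ctx/" >&2
            status=1
        fi
    done
    return "$status"
}
```

Usage: check_solr http://localhost:8080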