Apache Solr

Solr is one of the most widely used search engine solutions. It offers full-text search, is proven at handling high traffic, and is highly scalable and fault tolerant.

Once you know the features, the next questions that come to mind are: where do I use this, and why do I need it? There are many scenarios where Solr meets the requirement; here is a common one.

  1. Generally, in the beginning, all the data is stored in a database such as MySQL or MSSQL. The application queries the database for everything and presents the results to you. Everyone is happy because the functionality works. But when the site starts getting traffic and the number of such queries grows, the database comes under immense load and becomes a bottleneck. You opt for database replication with read and write segregation, but you still don’t get the performance you need. Then you look for another solution that is fast, robust and customisable to your needs. This is the point where you opt for Apache Solr.
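
As a rough illustration of what moving the read path to Solr can look like, a search becomes an HTTP request to Solr's `/select` handler instead of a SQL query. The sketch below only builds the request URL. It assumes Solr is listening on its default port 8983 on localhost; the index name `products` and the field `title` are hypothetical and not taken from this article.

```python
from urllib.parse import urlencode

# Assumptions (not from the article): Solr runs on localhost on its
# default port 8983, and "products" is a hypothetical index name.
SOLR_BASE = "http://localhost:8983/solr"


def build_search_url(index_name, text, rows=10):
    """Return the URL for a full-text search against Solr's /select handler."""
    params = {
        "q": text,     # the search query
        "rows": rows,  # maximum number of results to return
        "wt": "json",  # ask for a JSON response
    }
    return f"{SOLR_BASE}/{index_name}/select?{urlencode(params)}"


# "title" is a hypothetical field name used only for illustration.
url = build_search_url("products", "title:laptop")
print(url)
```

The application would fetch this URL with any HTTP client and read the matching records from the JSON response, leaving the database to handle writes.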

Solr terminology

When you start reading about Solr, the most difficult part is the terminology. Words like core and collection are thrown at you, and you are left thinking, okay…

So let’s start with the terminology:

  • Core: Think of this as a database, bundled with all the configuration files relevant to that particular core only.
  • Collection: A complete index, which may be distributed across n nodes (in the case of SolrCloud).
  • Schema: Defines which fields will be indexed and how they will be stored, i.e. their data types.
  • Fields: The individual named values in a document; each field's definition says whether it is indexed and what its data type is.
  • Document: A set of data that describes something; the actual indexed record that you can query.
  • Indexing: Adding or updating documents in the index so that they can be searched.
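
To make schema, fields and documents a little more concrete, here is a toy model in plain Python. It is not Solr's own validation logic: the `SCHEMA` mapping, the field names and the `validate` helper are all hypothetical, and Python types stand in for Solr field types.

```python
import json

# Hypothetical schema: field name -> Python type standing in for a Solr field type.
SCHEMA = {
    "id": str,
    "title": str,
    "price": float,
    "in_stock": bool,
}


def validate(document, schema=SCHEMA):
    """Return a list of problems found when checking a document against the schema."""
    problems = []
    for name, value in document.items():
        if name not in schema:
            problems.append(f"unknown field: {name}")
        elif not isinstance(value, schema[name]):
            problems.append(f"{name}: expected {schema[name].__name__}")
    return problems


# A document is one record made of field values.
doc = {"id": "sku-1", "title": "Laptop", "price": 799.0, "in_stock": True}
bad_doc = {"id": "sku-2", "price": "cheap", "color": "red"}

print(validate(doc))      # empty list: every field matches the schema
print(validate(bad_doc))  # one wrong type and one unknown field

# Indexing means sending documents like this to Solr. Its JSON update
# handler accepts a list of documents as the request body.
payload = json.dumps([doc])
print(payload)
```

The point of the sketch is the relationship between the terms: the schema describes the fields, each document supplies values for those fields, and indexing is the act of handing such documents to Solr.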

The list goes on, but for starters this is quite enough.

Installation

Solr can be installed and run in 3 modes, namely:

Standalone

  • Single node
  • No HA
  • No failover
  • Easiest to install

Replication

  • 1+n nodes (1 being the master)
  • HA
  • No automatic failover
  • Needs a little planning and takes time

SolrCloud

  • n+2 nodes (recommended)
  • HA
  • Automatic failover
  • Needs ZooKeeper to sync config to all the nodes
  • Needs more time for installation and configuration

For part 1, we will only install and configure Solr in standalone mode.

 

 
