Solr Elasticsearch

SOLR is an open source enterprise search platform built on Lucene. It provides features like indexing and querying over REST, as well as an administrative UI. SOLR uses request handlers, search components, and request writers to process queries. It allows adding documents and defining schemas. Elasticsearch is a distributed, RESTful search and analytics engine based on Lucene. It represents data as documents containing fields, and uses inverted indexes for fast searching. Elasticsearch has a dynamic schema and supports complex queries through its Query DSL.

SOLR AND ELASTICSEARCH

I. SOLR
SOLR is an open-source enterprise search server, or web application,
built on top of the Lucene API library. SOLR provides the features
that Lucene offers, plus many other tools, which is a big advantage
over using Lucene directly. SOLR exposes Lucene's Java APIs as
RESTful services. Putting documents into it is called indexing, and
it can be done via XML, JSON, CSV, or binary over HTTP. Users can
query it via HTTP GET and receive XML, JSON, CSV, or binary
results.
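To make the REST-style interaction concrete, here is a minimal sketch that builds such a query URL; the host, port, core name (gettingstarted), and the title field are assumptions for the example, not part of the original notes.

```python
from urllib.parse import urlencode

# Hypothetical local Solr core named "gettingstarted"; q is the query
# string, wt picks the response format (json, xml, csv), rows limits hits.
params = {"q": "title:lucene", "wt": "json", "rows": 5}
url = "http://localhost:8983/solr/gettingstarted/select?" + urlencode(params)
print(url)
```

Fetching this URL with a plain HTTP GET would return the results in the requested format.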

SOLR: KEY FEATURES

Advanced full-text search capabilities
Optimized for high-volume web traffic
Standards-based open interfaces: XML, JSON, and HTTP
Comprehensive HTML administration interface
Server statistics exposed over JMX for monitoring
Near real-time indexing, adaptable with XML configuration
Linearly scalable, with automatic index replication and an
extensible plugin architecture

SOLR: ARCHITECTURE
Request handlers deal with incoming requests over HTTP or other
protocols and interpret them; many request handlers are registered
by default. Beyond those, users can also register their own
request handlers to intercept their own requests. Incoming queries
are routed to different kinds of request handlers according to the
application's needs. Here are the functions of some request
handlers:
o /admin is the request handler that handles all the
administrative tasks
o /select handles standard search requests under different URL
contexts
o /spell handles requests where spell-checking is needed
Search components start working after request handlers finish
their tasks. There are many search components that can be
configured and used by the request handlers. Once a component
handles an incoming request, it goes to the distributed search,
which fetches the matching records and gives them back.
Response writers are completely different from request handlers:
request handlers read and process the request, while response
writers format the results that are sent back. Response writers
come in different forms, such as XML, JSON, and binary.
Update handlers handle the indexing process.

SOLR: ADMIN UI

The admin UI provides complete management of the Solr server; it
can be used to configure and manage features such as configuration
files, cores, and indexes.
SOLR: SCHEMA HIERARCHY

A Solr instance is a web application that can be deployed on
multiple servers, with each server running one instance.
Every server or instance can have multiple databases, also called
indexes or cores.
Each core or index can have documents, and each document can have
multiple fields.

SOLR: CORE
A Solr core, also referred to as just a core, is a running
instance of a Lucene index along with all the Solr configuration
(solrconfig.xml, schema.xml, etc.). schema.xml defines the
structure of the core, while solrconfig.xml defines the
architecture of the core or instance.
A single Solr application can contain zero or more cores. Cores
run largely in isolation but can communicate with each other if
necessary via the CoreContainer. Initially, Solr supported only
one index, and the SolrCore class was a singleton that coordinated
the low-level functionality at the core of Solr.

SOLR: DOCUMENTS AND FIELDS


Solr's basic unit of information is a document, which is a set of
data that describes something. Documents are composed of fields,
which are more specific pieces of information. Fields can contain
different kinds of data, such as a name field, and so on. The
field type tells Solr how to interpret the field and how it can be
queried.
SOLR: INDEXING DATA
A Solr index can accept data from many different sources,
including XML files, comma-separated value (CSV) files, data
extracted from tables in a database, and files in common formats
such as Microsoft Word or PDF. Here are the most common ways of
loading data into a Solr index:
Uploading XML files by sending HTTP requests to the Solr server
Using index handlers to import from databases
Using the Solr Cell framework
Writing a custom Java application to ingest data through Solr's
Java client

SOLR: ANALYSIS
There are three main concepts in analysis: analyzers, tokenizers,
and filters.
Analyzers are used both at index time, when a document is indexed,
and at query time.
- The same analysis process need not be used for both operations
- An analyzer examines the text of fields and generates a token stream
- Analyzers may be a single class, or they may be composed of a
series of tokenizer and filter classes
Tokenizers break field data into lexical units, or tokens. Filters
examine a stream of tokens and keep them, transform or discard
them, or create new ones.
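The tokenizer-plus-filters chain can be modelled with a few plain functions. This is only an illustrative sketch of the idea, not Solr's implementation, and the stop-word list is invented.

```python
# Illustrative analyzer chain: whitespace tokenizer -> lowercase filter
# -> stop-word filter, producing the final token stream.
STOP_WORDS = {"the", "a", "of"}  # invented stop-word list

def tokenize(text):
    return text.split()  # break field data into lexical units (tokens)

def lowercase_filter(tokens):
    return [t.lower() for t in tokens]  # transform each token

def stop_filter(tokens):
    return [t for t in tokens if t not in STOP_WORDS]  # discard tokens

def analyze(text):
    return stop_filter(lowercase_filter(tokenize(text)))

print(analyze("The Art of Search"))  # -> ['art', 'search']
```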
SOLR: SEARCH PROCESS

SOLR: FEATURES

Faceting
Pagination
Highlighting
Grouping and clustering
Spell checking
Spatial search
Query re-ranking components
Transforming
Real-time get and update
Suggesters
More like this

SOLR: INSTALLATION
Download the Apache Solr package

hduser@prast-virtual-machine:~$ cd ~
hduser@prast-virtual-machine:~$ wget http://apache.mirror1.spango.com/lucene/solr/5.5.1/solr-5.5.1.tgz

Create a directory named solr in /usr/local

hduser@prast-virtual-machine:~$ cd /usr/local
hduser@prast-virtual-machine:/usr/local$ sudo mkdir solr

Move the file solr-5.5.1.tgz to that directory and extract the
install script from it.

hduser@prast-virtual-machine:~$ sudo mv solr-5.5.1.tgz /usr/local/solr
hduser@prast-virtual-machine:~$ cd /usr/local/solr
hduser@prast-virtual-machine:/usr/local/solr$ tar xzf solr-5.5.1.tgz solr-5.5.1/bin/install_solr_service.sh --strip-components=2

With the bash command, we can now install Apache Solr:

hduser@prast-virtual-machine:/usr/local/solr$ sudo bash ./install_solr_service.sh solr-5.5.1.tgz

Check the status of the Apache Solr service we just installed:

hduser@prast-virtual-machine:/usr/local/solr$ sudo service solr status


If the installation was successful, it will show the status of the
service that was just installed:

Found 1 Solr nodes:


Solr process 2750 running on port 8983
.....


By default, Apache Solr runs on port 8983. Solr gives us the
freedom to create cores from within the installation folder. To
create a core to work with, use this command:

hduser@prast-virtual-machine:/usr/local/solr$ sudo su - solr -c "/opt/solr/bin/solr create -c gettingstarted -n data_driven_schema_configs"

II. ELASTICSEARCH

Elasticsearch is an open-source, distributed, RESTful search
engine that is based on Lucene. Elasticsearch is able to achieve
fast search responses because, instead of searching the text
directly, it searches an index. This type of index is called an
inverted index because it inverts a page-centric data structure
(page -> words) into a keyword-centric data structure (word -> pages).
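The inversion can be sketched in a few lines; the two sample documents are made up for the example.

```python
# Build an inverted index: word -> set of document ids containing it,
# inverting the page-centric layout (page -> words).
docs = {
    1: "elasticsearch is a search engine",
    2: "lucene powers the search index",
}

inverted = {}
for doc_id, text in docs.items():
    for word in text.split():
        inverted.setdefault(word, set()).add(doc_id)

print(sorted(inverted["search"]))  # -> [1, 2]: both documents match
```

Looking a word up in this structure is a single dictionary access, which is why searching the index is much faster than scanning every document's text.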


HOW ELASTICSEARCH REPRESENTS DATA

A document is the unit of search and index. An index can contain
one or more documents, and a document can contain one or more
fields. In database terminology, a document corresponds to a table
row, and a field corresponds to a table column.


SCHEMA

Unlike Solr, Elasticsearch is schema-free. However, it is
necessary to add mapping declarations if anything but the most
basic fields and operations is required.
The schema declares:
What fields there are
Which field should be used as the unique/primary key
Which fields are required
How to index and search each field

In Elasticsearch, an index may store documents of different
mapping types, and multiple mapping definitions can be associated
with each mapping type. A mapping type is a way of separating the
documents in an index into logical groups.
To create a mapping, use the Put Mapping API; the other way is to
add multiple mappings when an index is created.
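As a sketch, the body of such a mapping might look like the following; the blog index, post type, and field names are assumptions, and the string/date/integer field types match the Elasticsearch 2.x era these notes describe.

```python
import json

# Hypothetical mapping for a "post" type: each property declares how
# the field should be indexed and searched.
mapping = {
    "post": {
        "properties": {
            "title":     {"type": "string"},
            "published": {"type": "date"},
            "views":     {"type": "integer"},
        }
    }
}
print(json.dumps(mapping, indent=2))
```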

QUERY DSL

The Query DSL is Elasticsearch's way of making Lucene's query
syntax accessible to users, allowing complex queries to be
composed using a JSON syntax.
As in Lucene, there are basic queries, such as term or prefix
queries, and also compound queries, like the bool query.

Below is the rough main structure of a query:

curl -X POST "http://localhost:9200/blog/_search?pretty=true" -d '
{
  "from": 0,
  "size": 10,
  "query": QUERY_JSON,
  FILTER_JSON,
  FACET_JSON,
  SORT_JSON
}'
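Filling the QUERY_JSON slot in, a compound bool query could be composed as below; the author and title fields are hypothetical.

```python
import json

# A bool query combining a term query (must match) with a prefix
# query (optional, improves the score when it matches).
body = {
    "from": 0,
    "size": 10,
    "query": {
        "bool": {
            "must":   [{"term":   {"author": "john"}}],
            "should": [{"prefix": {"title": "elastic"}}],
        }
    },
}
print(json.dumps(body))
```

Sending this body with the curl command above would run the query against the blog index.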

ELASTICSEARCH INSTALLATION

Download Elasticsearch from elastic.co/downloads/elasticsearch and
extract the archive file.

Once the archive file has been extracted, Elasticsearch is ready
to run. To start it up in the foreground:

cd elasticsearch-<version>
./bin/elasticsearch

Test it out by opening another terminal window and running the
following:

curl 'http://localhost:9200/?pretty'

If the OS is Windows, cURL can be downloaded from
http://curl.haxx.se/download.html

A response such as this will be shown:

{
  "name" : "Tom Foster",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "2.1.0",
    "build_hash" : "72cd1f1a3eee09505e036106146dc1949dc5dc87",
    "build_timestamp" : "2015-11-18T22:40:03Z",
    "build_snapshot" : false,
    "lucene_version" : "5.3.1"
  },
  "tagline" : "You Know, for Search"
}

This means that an Elasticsearch node is up and running and is
ready to be experimented with. A node is a running instance of
Elasticsearch. A cluster is a group of nodes with the same
cluster.name that work together to share data and to provide
failover and scale. The cluster.name can be changed in the
elasticsearch.yml configuration file that is loaded when a node
starts.

When Elasticsearch is running in the foreground, it can be
stopped by pressing Ctrl+C.

SENSE

Sense is a Kibana app that provides an interactive console for
submitting requests to Elasticsearch directly from the browser.

SENSE INSTALLATION
Run the following command in the Kibana directory to download
and install the Sense app:

./bin/kibana plugin --install elastic/sense



Start Kibana

./bin/kibana

Open Sense in the web browser by going to
http://localhost:5601/app/sense

