What is ElasticSearch? Pros, Cons and Features List

Sep 18, 2022

In this tutorial, we are going to discuss what Elasticsearch is and look at its features, pros, and cons. It also compares Elasticsearch with Apache Solr.

What is ElasticSearch?

Elasticsearch is a RESTful, open source search engine built on Apache Lucene and originally released under the Apache License. Written in Java, it is used to index and search documents in various formats. Compared with other search engines, it offers notable features such as scalable, near real-time search, multi-tenancy, and JSON-based indexing. It helps you examine and maintain large volumes of data in near real time.
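
To make the "RESTful" and "JSON format indexing" points concrete, here is a minimal sketch in Python that only builds (and does not send) the two HTTP requests a client would typically issue: one to index a JSON document and one to run a full-text `match` query. The host `http://localhost:9200`, the index name `articles`, and the helper function names are assumptions chosen for illustration.

```python
import json

BASE_URL = "http://localhost:9200"  # assumption: a local, unsecured node


def build_index_request(index, doc_id, document):
    """Return (method, url, body) for storing one JSON document."""
    url = f"{BASE_URL}/{index}/_doc/{doc_id}"
    return "PUT", url, json.dumps(document)


def build_search_request(index, field, text):
    """Return (method, url, body) for a full-text match query."""
    url = f"{BASE_URL}/{index}/_search"
    body = {"query": {"match": {field: text}}}
    return "POST", url, json.dumps(body)


method, url, body = build_index_request(
    "articles", 1, {"title": "What is Elasticsearch?", "views": 100}
)
print(method, url)
print(body)

method, url, body = build_search_request("articles", "title", "elasticsearch")
print(method, url)
print(body)
```

Any HTTP client can send these requests, as long as it sets a `Content-Type: application/json` header.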

Features of ElasticSearch

The key general features of Elasticsearch are:

  • It scales easily to handle petabytes of data in both structured and unstructured formats.
  • It can be used as a replacement for document stores such as MongoDB and RavenDB.
  • It offers improved search performance.
  • It can handle all types of data, including textual, numerical, geospatial, structured, and unstructured.
  • Developer-friendly license: Apache License 2.0 (partially open source).

The core concepts you will come across are:

  • Node: A single running instance of the Elasticsearch server.
  • Cluster: A collection of one or more nodes.
  • Index: A collection of different documents and their attributes.
  • Document: A collection of attributes (fields) in JSON format.
  • Shard: A partition of an index that holds a subset of its documents.
  • Replicas: Copies of an index's shards, used for data recovery in case of failure.
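
The terms above fit together as a small data model. The sketch below is illustrative only: the class and function names are hypothetical, and `zlib.crc32` stands in for the hash function Elasticsearch uses internally (a Murmur3 variant). It shows, in simplified form, how a document ID is routed to one primary shard with `hash(routing) % number_of_primary_shards`, and how a replica is kept on a different node from its primary.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Shard:
    index: str
    number: int
    primary: bool
    node: str                      # name of the node hosting this copy
    docs: dict = field(default_factory=dict)


def route(doc_id, num_primary_shards):
    """Pick the primary shard for a document ID (crc32 is a stand-in hash)."""
    return zlib.crc32(str(doc_id).encode("utf-8")) % num_primary_shards


def allocate(index, num_primary_shards, nodes):
    """Place primaries round-robin; put each replica on the next node."""
    shards = []
    for n in range(num_primary_shards):
        primary_node = nodes[n % len(nodes)]
        replica_node = nodes[(n + 1) % len(nodes)]
        shards.append(Shard(index, n, True, primary_node))
        if replica_node != primary_node:   # a replica never shares its primary's node
            shards.append(Shard(index, n, False, replica_node))
    return shards


nodes = ["node-1", "node-2", "node-3"]          # a three-node cluster
shards = allocate("articles", 2, nodes)

doc = {"title": "What is Elasticsearch?"}       # a document is a JSON object
target = route("doc-42", 2)
for shard in shards:
    if shard.number == target:                  # write to the primary and its replica
        shard.docs["doc-42"] = doc

for shard in shards:
    kind = "primary" if shard.primary else "replica"
    print(shard.index, shard.number, kind, shard.node, list(shard.docs))
```

Because the shard number depends on the number of primary shards, that number is normally fixed when an index is created, while the number of replicas can be changed later.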

Pros and Cons of Elasticsearch

Pros of Elasticsearch

  • Full-text search: Elasticsearch's standout feature is its effective full-text search. Built on top of Lucene, it analyzes text in a language-aware way and returns the documents that match the search condition, ranked by relevance. Relevance was classically computed with the TF/IDF algorithm; since version 5.0 the default similarity is BM25, a refinement of the same idea.
  • Parallel processing: Although Elasticsearch can process data on a single node, it is designed to work across several nodes. It achieves high throughput by allocating primary and replica shards across all available nodes. When processing a query, it retrieves information from every shard that holds data relevant to that query. This is how it puts many nodes to work at once and uses memory effectively.
  • Built-in parallelization: The best part of Elasticsearch's parallel processing is that it is all built in. You don't need to lift a finger to route queries among shards; the defaults make it easy to get started.
  • Architecture: Unlike relational databases, Elasticsearch has a distributed architecture built from a few key parts: cluster, node, index, document, shard, and replica shard. The most important of these is the shard, a partition of the data that lives on a node; a replica shard is a copy of a primary shard that lives on a different node from its primary.
  • Node handling: Working at large scale raises the problem of availability. To ensure high availability, Elasticsearch manages nodes and shards through an elected master node. The master tracks cluster-level changes such as nodes joining or leaving. When such a change occurs, the master reallocates the shards across the available nodes.
  • Self-organizing behavior: Elasticsearch stands out for its self-organizing infrastructure. There is no single point of failure, i.e., data processing is not funneled through the master node. No single node processes all the data, so the system does not depend on any one node. If the master or any other node fails, the remaining nodes automatically take over its role. This is how it keeps working at large scale.
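
The full-text scoring and parallel query execution described above can be sketched in a few lines. This is a toy model, not Lucene's implementation: it uses a textbook TF-IDF formula computed per shard, a thread pool to stand in for separate nodes, and hypothetical names throughout (`SHARDS`, `search_shard`, `search`).

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Each "shard" holds a subset of the index: doc id -> text.
SHARDS = [
    {"d1": "elasticsearch is a search engine", "d2": "lucene powers search"},
    {"d3": "solr is built on lucene", "d4": "search search search"},
]


def tokenize(text):
    return text.lower().split()


def search_shard(shard, query):
    """Score this shard's documents with a textbook TF-IDF."""
    docs = {doc_id: tokenize(text) for doc_id, text in shard.items()}
    hits = []
    for doc_id, tokens in docs.items():
        score = 0.0
        for term in tokenize(query):
            tf = tokens.count(term) / len(tokens)
            df = sum(1 for t in docs.values() if term in t)
            if df:
                score += tf * (1.0 + math.log(len(docs) / df))
        if score > 0:
            hits.append((doc_id, score))
    return hits


def search(query, top_k=3):
    """Scatter the query to every shard in parallel, then merge the hits."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partial = pool.map(lambda shard: search_shard(shard, query), SHARDS)
        merged = [hit for hits in partial for hit in hits]
    return sorted(merged, key=lambda hit: hit[1], reverse=True)[:top_k]


for doc_id, score in search("lucene search"):
    print(doc_id, round(score, 3))
```

Note that each shard in this sketch computes document frequencies from its own documents only, so scores from different shards are only approximately comparable; the caller never has to decide which shard to ask, which is the "built-in" part of the parallelism.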

Cons of Elasticsearch:

  • Format constraint: Elasticsearch does not support multiple data formats for requests and responses. It supports only JSON, whereas Apache Solr supports CSV and XML as well as JSON.
  • Needs careful organization: To run queries correctly, you need to keep track of a hierarchy of indexes, types, and IDs. You also need to make sure the cluster status is 'green' rather than 'yellow'. With little data you can organize the cluster manually, but at large scale you need to organize data and infrastructure deliberately.
  • Hardware requirements: Elasticsearch works most efficiently on a group of servers with around 64GB of RAM each. Too many small servers create management overhead, while a few very powerful servers leave the cluster more exposed when one of them fails. Queries also run faster when data is stored on SSDs rather than rotating disks, but SSDs are more expensive, which drives up infrastructure cost.
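
The 'green' and 'yellow' states mentioned above are reported by the cluster health API (`GET /_cluster/health`). The rule behind them can be sketched as follows: green when every primary and replica shard is assigned, yellow when all primaries are assigned but at least one replica is not, and red when any primary is unassigned. The function name and the input format below are hypothetical.

```python
def cluster_status(shards):
    """shards: list of dicts with 'primary' (bool) and 'assigned' (bool)."""
    if any(s["primary"] and not s["assigned"] for s in shards):
        return "red"       # some data is unavailable
    if any(not s["primary"] and not s["assigned"] for s in shards):
        return "yellow"    # data is available but not fully replicated
    return "green"         # every primary and replica is assigned


healthy = [
    {"primary": True, "assigned": True},
    {"primary": False, "assigned": True},
]
degraded = [
    {"primary": True, "assigned": True},
    {"primary": False, "assigned": False},   # e.g. no second node to host the replica
]
print(cluster_status(healthy))
print(cluster_status(degraded))
```

A yellow cluster still answers queries; it simply has less protection against the loss of a node.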

Difference Between Elasticsearch & Apache Solr

| Elasticsearch | Solr |
| --- | --- |
| Although its precursor, Compass, was started in 2004 by founder Shay Banon, Elasticsearch itself was first released in 2010. | Solr has a longer history: it was created in 2004 by Yonik Seeley at CNET Networks. |
| It supports only the JSON format. | It supports XML, CSV, and JSON formats. |
| Elasticsearch can be called schema-less. | In Solr, you need the managed-schema file to define your index structure. |
| Elasticsearch uses its own built-in discovery implementation, historically called Zen. | Solr uses Apache ZooKeeper for discovery and leader election. |
| Caches are per segment, so when a single segment changes only a small portion of the cached data needs to be refreshed. | Caches are global, so they are invalidated whenever any segment changes. |
| Shard placement is dynamic: shards can be moved on demand depending on the cluster state. | Shard placement is static and requires manual work to migrate shards, although Solr 7 and later allow some dynamic actions. |
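
The caching difference in the comparison is easier to see with a toy model. The sketch below is not taken from either project's code: both classes and all names are hypothetical, and they only illustrate why a per-segment cache loses fewer entries than a global cache when one segment changes.

```python
class SegmentCache:
    """Cache keyed by (segment, query); one segment change evicts only its entries."""

    def __init__(self):
        self.entries = {}

    def put(self, segment, query, result):
        self.entries[(segment, query)] = result

    def on_segment_changed(self, segment):
        self.entries = {k: v for k, v in self.entries.items() if k[0] != segment}


class GlobalCache:
    """Cache keyed by query over the whole index; any change evicts everything."""

    def __init__(self):
        self.entries = {}

    def put(self, query, result):
        self.entries[query] = result

    def on_segment_changed(self, segment):
        self.entries.clear()


per_segment, global_cache = SegmentCache(), GlobalCache()
for seg in ("seg-0", "seg-1", "seg-2"):
    per_segment.put(seg, "status:active", {"hits": 10})
global_cache.put("status:active", {"hits": 30})

# One segment is rewritten, for example after new documents are indexed.
per_segment.on_segment_changed("seg-1")
global_cache.on_segment_changed("seg-1")
print(len(per_segment.entries), len(global_cache.entries))
```

In this model the per-segment cache keeps the entries for the two untouched segments, while the global cache has to be rebuilt from scratch.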