Introduction to Elasticsearch usage

When using Elasticsearch (Amazon OpenSearch), you need a basic understanding of how it stores and processes data.


Overview

An Elasticsearch cluster is essentially a distributed document store and search engine consisting of one or more nodes. These nodes provide the hardware to store, manage, and search the data that resides in the cluster.
The “AWS OpenSearch instances” listed on the following page can be used as nodes:
https://aws.amazon.com/de/opensearch-service/pricing/

Resources and Heap

The instance types differ mainly in CPU, memory, and the attached volume space for persistent storage of data and cluster snapshots.

The specified memory is the instance RAM. 50% of this RAM is available to Elasticsearch as heap, in which the index shards are managed and cached.
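As a rough illustration of the 50% rule above, the heap available on a node can be estimated from the instance RAM. This is only a sketch: the function name `heap_gb` and the example instance size are assumptions made for illustration and are not taken from any AWS tooling.

```python
def heap_gb(instance_ram_gb: float, heap_fraction: float = 0.5) -> float:
    """Estimate the heap (in GB) of a node from its instance RAM.

    heap_fraction defaults to the 50% share described in the text.
    """
    if instance_ram_gb <= 0:
        raise ValueError("instance RAM must be positive")
    return instance_ram_gb * heap_fraction


# Hypothetical example: an instance with 16 GB of RAM.
print(heap_gb(16))  # 8.0
```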

Indices and shards

Data stored in the cluster is always assigned to an index (primary index), which manages this data across the cluster. An index always consists of one or more shards, which are distributed evenly across the individual nodes in the cluster so that search queries can be processed more quickly.

The number of shards created automatically per index is defined in the index configuration. For example, it makes no sense to use more than one shard per index in a single-node cluster, since the shards cannot be searched by more than one node at the same time (a single shard also avoids the risk of oversharding).
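The shard count is part of the index settings that are supplied when an index is created. The sketch below only builds such a settings body for a single-node cluster. `number_of_shards` is the index setting for the shard count, while the helper name `index_settings` is an assumption made for illustration.

```python
import json


def index_settings(primary_shards: int) -> dict:
    """Build an index settings body with the given number of primary shards."""
    if primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    return {"settings": {"index": {"number_of_shards": primary_shards}}}


# Single-node cluster: one shard per index is sufficient.
body = index_settings(primary_shards=1)
print(json.dumps(body, indent=2))
```

A body like this could be sent with the request that creates the index. The shard count of an existing index cannot simply be edited afterwards, so it is worth deciding it up front.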

Replica indices and shards

To increase the availability of the data, the number of copies of the indices in the cluster (replica indices) can be defined in the cluster configuration. Every replica index is automatically stored on a different node than its primary index.

For example, it makes no sense to create replica shards in a single-node cluster, since they cannot be assigned to any node other than the one holding the primary shard and therefore remain unassigned (unassigned shards).

The replica shards belonging to a replica index are distributed evenly across all nodes and take up space in the cluster heap (increasing the risk of oversharding).
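The effect of replicas on the shard count can be sketched with a little arithmetic. The example assumes that each copy of a shard (primary or replica) must live on a different node, as described above. The function name `shard_counts` is hypothetical.

```python
def shard_counts(primary_shards: int, replicas: int, nodes: int) -> dict:
    """Count total, assignable and unassigned shards for one index.

    Each shard exists (1 + replicas) times, and every copy needs its own node.
    """
    copies_wanted = 1 + replicas
    copies_assignable = min(copies_wanted, nodes)
    total = primary_shards * copies_wanted
    unassigned = primary_shards * (copies_wanted - copies_assignable)
    return {"total": total, "assigned": total - unassigned, "unassigned": unassigned}


# Single-node cluster with one replica: the replica stays unassigned.
print(shard_counts(primary_shards=1, replicas=1, nodes=1))
# {'total': 2, 'assigned': 1, 'unassigned': 1}

# Three nodes, three primary shards, one replica: everything can be assigned.
print(shard_counts(primary_shards=3, replicas=1, nodes=3))
# {'total': 6, 'assigned': 6, 'unassigned': 0}
```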

Shards per GB Heap

Since the management overhead in the cluster increases with the number of shards, the recommendation is not to use more than 20 shards per GB of heap (ShardsPerGBHeap). However, a single shard can contain up to 30 GB of data (pay attention to the attached volume size).
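Both limits can be turned into a quick check. The sketch below uses the two figures from the text (20 shards per GB of heap, 30 GB per shard) as constants. The function names and the example numbers are assumptions made for illustration.

```python
import math

MAX_SHARDS_PER_GB_HEAP = 20  # recommendation from the text
MAX_SHARD_SIZE_GB = 30       # upper bound per shard from the text


def max_shards_for_heap(heap_gb: float) -> int:
    """Largest recommended number of shards for the given heap size."""
    return math.floor(heap_gb * MAX_SHARDS_PER_GB_HEAP)


def min_primary_shards_for_data(index_size_gb: float) -> int:
    """Smallest number of primary shards that keeps each shard within the size limit."""
    return max(1, math.ceil(index_size_gb / MAX_SHARD_SIZE_GB))


print(max_shards_for_heap(8))            # 160
print(min_primary_shards_for_data(100))  # 4
```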

Elasticsearch used as a search engine for customer applications

When planning and configuring an Elasticsearch cluster, the following parameters are important:

  • Number of planned indices (search categories).

  • Number of nodes across which the shards can be distributed (search performance).

    • The number of nodes in the cluster must always be odd to avoid management problems such as split brain.

  • Number of replica indices (reliability).

    • The required cluster heap follows from the maximum number of indices and their desired shards.

  • The instance size to choose follows from the planned cluster heap and the desired number of nodes in the cluster.

  • For clusters with many small shards and few search operations, we can raise our alert threshold “ShardsPerGBHeap” at your own risk. In that case, you should keep an eye on the cluster metric “Java Memory Pressure”.
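The planning parameters above can be combined into a rough sizing estimate. This is a minimal sketch, assuming the figures used elsewhere in this article (20 shards per GB of heap, 50% of instance RAM as heap) and an even distribution of shards across nodes. The function `plan_cluster` and the example numbers are hypothetical, and the result is only a starting point, not a sizing guarantee.

```python
def plan_cluster(indices: int, primary_shards_per_index: int, replicas: int,
                 nodes: int, shards_per_gb_heap: int = 20,
                 heap_fraction: float = 0.5) -> dict:
    """Estimate the heap and RAM needed for a planned cluster."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    # Every primary shard exists once plus once per replica.
    total_shards = indices * primary_shards_per_index * (1 + replicas)
    cluster_heap_gb = total_shards / shards_per_gb_heap
    heap_per_node_gb = cluster_heap_gb / nodes
    # Only a fraction of the instance RAM is available as heap.
    ram_per_node_gb = heap_per_node_gb / heap_fraction
    return {
        "total_shards": total_shards,
        "cluster_heap_gb": cluster_heap_gb,
        "heap_per_node_gb": heap_per_node_gb,
        "ram_per_node_gb": ram_per_node_gb,
    }


# Hypothetical plan: 30 indices, 3 primary shards each, 1 replica, 3 nodes.
plan = plan_cluster(indices=30, primary_shards_per_index=3, replicas=1, nodes=3)
print(plan)
# {'total_shards': 180, 'cluster_heap_gb': 9.0, 'heap_per_node_gb': 3.0, 'ram_per_node_gb': 6.0}
```

In this example an instance type with at least 6 GB of RAM per node would be needed, so in practice the next larger available instance size would be chosen.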

See also OpenSearch documentation: https://opensearch.org/docs/latest/opensearch/index/

Elasticsearch used in Advanced Logging

The Elasticsearch cluster of an Advanced Logging setup (Logging-ES) behaves like an Elasticsearch cluster for customer applications, with the difference that the data stored and indexed is log data shipped into the cluster by the corresponding customer applications.

The Logging-ES serves as storage for the project logs, for which the system creates a new index every day. Logs from the previous day are processed in the Logging-ES (parts of any IP address are removed for data protection compliance, and the logs are labeled with a timestamp and compressed) and retained for a preconfigured time.
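Because one index is created per day, the retention time directly determines how many log indices exist at once. The sketch below illustrates this. The index name pattern `logs-YYYY.MM.DD`, the function names and the retention value are assumptions made for illustration, not the names actually used by the Logging-ES.

```python
from datetime import date, timedelta


def daily_index_name(day: date, prefix: str = "logs") -> str:
    """Build a hypothetical daily index name such as 'logs-2024.03.01'."""
    return f"{prefix}-{day:%Y.%m.%d}"


def retained_indices(today: date, retention_days: int) -> list:
    """List the daily indices kept for the given retention, oldest first."""
    if retention_days < 1:
        raise ValueError("retention must be at least one day")
    first_day = today - timedelta(days=retention_days - 1)
    return [daily_index_name(first_day + timedelta(days=offset))
            for offset in range(retention_days)]


print(retained_indices(date(2024, 3, 3), retention_days=3))
# ['logs-2024.03.01', 'logs-2024.03.02', 'logs-2024.03.03']
```

With a longer retention, the number of daily indices (and therefore shards) grows accordingly, which matters for the shards-per-heap recommendation.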

Filebeat is responsible for transferring and converting the logs from the source application to the Logging-ES. For example, it defines fields in the logs by which the logs can then be filtered in a user-friendly manner in the Kibana instance of the Logging-ES. Only Filebeat creates indices in the Logging-ES.

