Advanced Elasticsearch Configuration

22/07/2022 Rodrigo Alonso Aviles

Elasticsearch is a search and analytics engine that stores documents, structured or not, and indexes all of their fields in near real time. It is distributed and easy to scale, and is aimed mainly at business and scientific use. It is accessed through an extensive API and allows extremely fast searches that support data discovery applications.

Its main use is the monitoring of distributed logs, forming part of the EFG stack (Elasticsearch, Fluentd and Grafana).

The main objects that can be defined and manipulated in Elasticsearch are indexes. These indexes are optimized collections of JSON documents, each of which is a collection of key-value fields that contain the data.

Indexes contain documents: JSON objects stored in Elasticsearch inside an index, each with a specific id and type. A document can contain from 0 to n key-value fields.
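
As an illustration of how a document relates to its index, the following minimal sketch builds (without sending) the HTTP request that would store one JSON document through the document API. It assumes Elasticsearch 7 or later, where `_doc` takes the place of the type in the URL. The host `http://localhost:9200`, the index name `app-logs`, the id `1` and the field names are all hypothetical.

```python
import json
from urllib.request import Request


def build_index_request(base_url, index, doc_id, document):
    """Build (but do not send) a PUT request that stores `document` in `index`."""
    body = json.dumps(document).encode("utf-8")
    return Request(
        url=f"{base_url}/{index}/_doc/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical log entry: a document is just a set of key-value fields.
log_entry = {
    "timestamp": "2022-07-22T10:15:00Z",
    "level": "ERROR",
    "service": "checkout",
    "message": "payment gateway timeout",
}

# Hypothetical host, index name and document id.
request = build_index_request("http://localhost:9200", "app-logs", "1", log_entry)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would actually send it, which requires a reachable Elasticsearch instance and, where security is enabled, credentials.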

To manage both the indexes and the documents of the Elasticsearch instance, we can use the Elastic Index Management image (API).

Elasticsearch configuration

The Elasticsearch configuration lives under the path «/usr/share/elasticsearch/config», which contains the following files:

elasticsearch.keystore
elasticsearch.yml
jvm.options
jvm.options.d
log4j2.properties
role_mapping.yml
roles.yml
users
users_roles

Configuration in OpenShift

In OpenShift, for changes to these files to persist across later modifications and pod restarts, we need to create a PVC:

elastic-config.yaml 
 
kind: PersistentVolumeClaim 
apiVersion: v1 
metadata: 
  name: elastic-config 
  namespace: <namespace> 
spec: 
  accessModes: 
    - ReadWriteOnce 
  resources: 
    requests: 
      storage: 100Mi

And within the Elastic deployment, assign the following fields:

elasticsearch-deployment.yaml  

...
spec:
  template:
    spec:
      volumes:
        - name: elasticdb-config
          persistentVolumeClaim:
            claimName: elastic-config
      containers:
        - resources:
          volumeMounts:
            - name: elasticdb-config
              mountPath: /usr/share/elasticsearch/config

Modifying the resources assigned to the JVM

Depending on the needs of the project, the load that Elasticsearch receives may be greater than what its default settings are sized for. In these cases, one possible solution is to increase the memory allocated to the Java virtual machine on which Elasticsearch runs.

To do this, we will access the «jvm.options» file mentioned above and modify the following parameters:

/usr/share/elasticsearch/config/jvm.options

-Xms3g

-Xmx3g

-Xms → Sets the minimum (initial) size of the JVM heap.
-Xmx → Sets the maximum size of the JVM heap.

By default, both parameters have the value 1g.
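
The edit described above can also be scripted. The following is a minimal sketch, not an official tool: it takes the text of a jvm.options file and returns it with both flags set to the same value. The function name `set_heap_size` and the sample file content are hypothetical.

```python
import re


def set_heap_size(jvm_options_text, heap):
    """Return the jvm.options content with -Xms and -Xmx both set to `heap`."""
    if not re.fullmatch(r"\d+[kKmMgG]", heap):
        raise ValueError(f"unexpected heap size: {heap!r}")
    lines = []
    seen = set()
    for line in jvm_options_text.splitlines():
        # Only active flags match; commented-out lines start with '#'.
        match = re.fullmatch(r"\s*(-Xm[sx])\S*\s*", line)
        if match:
            flag = match.group(1)
            line = f"{flag}{heap}"
            seen.add(flag)
        lines.append(line)
    # Append any flag that was missing or only present as a comment.
    for flag in ("-Xms", "-Xmx"):
        if flag not in seen:
            lines.append(f"{flag}{heap}")
    return "\n".join(lines) + "\n"


# Hypothetical file content using the default value of 1g.
sample = "## JVM heap\n-Xms1g\n-Xmx1g\n-XX:+UseG1GC\n"
print(set_heap_size(sample, "3g"))
```

Writing the result back to /usr/share/elasticsearch/config/jvm.options and restarting the pod would then apply the new values.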

As a recommendation, the value of both variables should be the same and never greater than 50% of your RAM. For more information, check the official documentation.
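
The rule of thumb above can be written as a small calculation. This is a sketch under stated assumptions: `total_ram_mb` stands for the memory available to the Elasticsearch process (in a container, presumably the pod's memory limit), and the function names are hypothetical.

```python
def recommended_heap_mb(total_ram_mb, max_fraction=0.5):
    """Largest heap, in whole megabytes, that stays within `max_fraction` of RAM."""
    if total_ram_mb <= 0:
        raise ValueError("total_ram_mb must be positive")
    return int(total_ram_mb * max_fraction)


def heap_flags(total_ram_mb):
    """Render identical -Xms and -Xmx flags, as the recommendation asks."""
    heap = recommended_heap_mb(total_ram_mb)
    return [f"-Xms{heap}m", f"-Xmx{heap}m"]


# Hypothetical pod with 6 GiB of memory: the ceiling is 3072m, i.e. 3g.
print(heap_flags(6 * 1024))
```

The result is an upper bound rather than a target; a smaller heap that still handles the load leaves more memory for the operating system's file cache.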

Modification of the buckets

In some situations, a request to Elasticsearch has to return a very large amount of aggregated data (e.g. a Grafana query over the last two weeks, during which a large number of documents were inserted into the index). For these cases, the elasticsearch.yml file has a setting that controls the number of aggregation buckets allowed:

/usr/share/elasticsearch/config/elasticsearch.yml