Starting with the ELK stack

Posted by Binor on 02/01/2017

ELK stands for Elasticsearch, Logstash and Kibana. It provides an open source data analytics platform covering searching/analysing, transforming/enriching and visualising data.

The main components are:

  • Elasticsearch: distributed full-text search engine based on Lucene.
  • Logstash: data processing pipeline that collects data, transforms/parses/enriches it and sends it to a data store.
  • Kibana: user-friendly interface to search, analyse and visualise your data.

In addition to these three components, the ELK stack also includes Beats, a family of lightweight data shippers.

In this series of posts, we give an introduction to how you can start using the ELK stack to centrally store and manage your data and logs. Our showcase scenario is how to centrally index and store the logs generated by the Bro IDS. We will deploy a two-node Elasticsearch cluster. These nodes will ingest data coming from a single-node processing pipeline running Logstash and a queueing system (Kafka/Redis). For shipping logs from the source, we will use Filebeat.

As mentioned above, Elasticsearch is a distributed full-text search engine. It is also a document-oriented (NoSQL) data store. In Elasticsearch terminology, a document is a JSON object containing a list of fields. In relational database terminology, a document is similar to a row in a SQL table, and an index, which is a collection of documents, is similar to a table.

An index is split into shards, and each shard can have replicas. What you need to know about shards is that each one is itself a self-contained Lucene index; splitting an index into shards is what lets Elasticsearch spread it across the nodes of a cluster. In practice, you rarely need to deal with them directly. In contrast, replicas are more relevant from a user perspective. A replica is a copy of a primary shard, always kept on a different node than that primary. It serves two purposes: failover when a node goes down, and higher search throughput, since search requests can be served by replicas as well. In Elasticsearch 5.x, a new index gets five primary shards with one replica each by default. The number of replicas can be changed at any time, while the number of primary shards is fixed once the index is created.

Every Elasticsearch index includes a mapping defining some properties of the index and of the fields of the documents it contains. It also defines how these fields are analysed and indexed by Lucene. In relational database terminology, a mapping is the schema definition of the index/table. By default, Elasticsearch tries to automatically guess the data type of every new document field it sees. In Elasticsearch terminology, this is called dynamic mapping. Elasticsearch also offers a REST API that lets you define your own mappings and interact with the data.
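To make this terminology concrete, here is a minimal sketch that talks to the REST API with curl. It assumes an Elasticsearch 5.x node reachable at localhost:9200 (override with ES_URL); the index name test-docs, the type name conn and the helpers sample_doc, index_doc and show_mapping are hypothetical names chosen for illustration. It stores one document (one "row") and then asks Elasticsearch which mapping (schema) it guessed through dynamic mapping.

```shell
#!/usr/bin/env bash
# Sketch only: index "test-docs", type "conn" and the helper names are hypothetical.
# Assumes an Elasticsearch 5.x node reachable at ES_URL (default localhost:9200).
ES_URL="${ES_URL:-localhost:9200}"

# Build one document: a JSON object holding a list of fields (like a SQL row).
sample_doc() {
  printf '{"src_ip":"%s","dst_ip":"%s","conn_duration":%s}\n' "$1" "$2" "$3"
}

# Store the document with id 1; Elasticsearch 5.x URLs are /<index>/<type>/<id>.
# The index is created on the fly if it does not exist yet.
index_doc() {
  curl -s -XPUT "$ES_URL/test-docs/conn/1?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(sample_doc "$1" "$2" "$3")"
}

# Print the mapping that dynamic mapping inferred for the index.
show_mapping() {
  curl -s -XGET "$ES_URL/test-docs/_mapping?pretty"
}
```

For example, run `index_doc 192.168.1.10 10.0.0.5 1.25` followed by `show_mapping`. Without an explicit mapping, you will most likely see the two address fields mapped as plain strings (text with a keyword sub-field) rather than as IP addresses, which is exactly the kind of guess the index template defined later in this post is meant to correct.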

Enough theoretical talk. Installing Elasticsearch is straightforward. Thanks to the folks at Elastic, there is a binary package for all major Linux distributions. In our case, we are installing Elasticsearch on an Ubuntu server. The following commands do the job. We need to run them on all the nodes (here, two) that will be part of our Elasticsearch cluster.

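# Add the repository signing key and the Elasticsearch 5.x APT repository
wget -qO - | sudo apt-key add -
echo "deb stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-5.x.list
# Install Java 8 (required by Elasticsearch 5.x) and Elasticsearch itself
sudo apt-get update
sudo apt-get install openjdk-8-jdk elasticsearch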

After running these commands, Elasticsearch is installed. The next step is to configure your Elasticsearch cluster nodes. The main configuration file is /etc/elasticsearch/elasticsearch.yml. The following are the main configuration lines that need to be added or updated.

# Cluster name, should be the same on all the nodes
cluster.name: binor-elk
# Node name, should be unique per node
node.name: "node-01"
# The network interfaces the Elasticsearch service will listen on:
# _local_ is the loopback address, _site_ any site-local (private) address of the host.
network.host: [_local_, _site_]
# List of the IPs of the nodes in this cluster (used for node discovery).
discovery.zen.ping.unicast.hosts: [ "", "" ]

The above are the minimal configuration changes required to start your cluster. Depending on the hardware of the hosts running your Elasticsearch cluster, you can also change the JVM settings: you can increase the heap memory available to the Elasticsearch process, and you can configure Elasticsearch so that its memory is never swapped out. More details can be found at 1 and 2.
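In Elasticsearch 5.x, the heap size is read from /etc/elasticsearch/jvm.options (the -Xms and -Xmx lines), and memory locking is switched on with bootstrap.memory_lock: true in elasticsearch.yml. Elastic's guidance is to give -Xms and -Xmx the same value and to use no more than half of the machine's RAM. The helper below, tune_es, is a hypothetical sketch rather than an official tool, and it assumes the deb package layout used in this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Elasticsearch): tune_es HEAP [CONF_DIR]
# Assumes the deb package layout, so CONF_DIR defaults to /etc/elasticsearch.
tune_es() {
  local heap="$1" conf_dir="${2:-/etc/elasticsearch}"

  if [ ! -f "$conf_dir/jvm.options" ]; then
    echo "missing $conf_dir/jvm.options" >&2
    return 1
  fi

  # Set the initial (-Xms) and maximum (-Xmx) heap to the same size.
  sed -i -e "s/^-Xms.*/-Xms${heap}/" -e "s/^-Xmx.*/-Xmx${heap}/" \
    "$conf_dir/jvm.options"

  # Lock the process memory so the heap cannot be swapped out;
  # the setting is only appended if it is not already active.
  if ! grep -q '^bootstrap.memory_lock:' "$conf_dir/elasticsearch.yml"; then
    echo 'bootstrap.memory_lock: true' >> "$conf_dir/elasticsearch.yml"
  fi
}
```

Run it as root on each node (for example `tune_es 4g`) before starting the service. When Elasticsearch is started by systemd, memory locking also needs LimitMEMLOCK=infinity in the service unit (for instance through `systemctl edit elasticsearch`); otherwise the lock cannot be acquired and, since our nodes bind to a non-loopback address, Elasticsearch's bootstrap checks will most likely refuse to start the node.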

After the above configuration, your Elasticsearch cluster is ready. You can start it by running sudo systemctl start elasticsearch.service on both nodes of the cluster. To verify that everything is working as expected, send the following HTTP request:

curl -XGET 'localhost:9200/?pretty'

You should then see a response similar to the following:

{
  "name" : "node-01",
  "cluster_name" : "binor-elk",
  "cluster_uuid" : "o6KIJ5o6TNq0sbO3QiDo4A",
  "version" : {
    "number" : "5.1.1",
    "build_hash" : "5395e21",
    "build_date" : "2016-12-06T21:36:15.409Z",
    "build_snapshot" : false,
    "lucene_version" : "6.3.0"
  },
  "tagline" : "You Know, for Search"
}

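The root endpoint only shows that the node you queried is answering. To check that both nodes joined the same cluster, the _cluster/health endpoint is more informative: green means all primary and replica shards are allocated, while yellow usually means some replicas could not be placed, for example because only one node is up. The sketch below uses the wait_for_nodes and timeout parameters of that endpoint; health_status and check_cluster are hypothetical helper names, and ES_URL defaults to localhost:9200.

```shell
#!/usr/bin/env bash
# Sketch: hypothetical helpers around the _cluster/health API.
ES_URL="${ES_URL:-localhost:9200}"

# Read a _cluster/health JSON body on stdin and print its status (green/yellow/red).
health_status() {
  grep -o '"status" *: *"[a-z]*"' | sed 's/.*"\([a-z]*\)"$/\1/'
}

# Wait up to 30s for 2 nodes to join, then print the cluster status.
check_cluster() {
  curl -s "$ES_URL/_cluster/health?wait_for_nodes=2&timeout=30s" | health_status
}
```

Once both nodes are up, `check_cluster` should print green. If the wait times out, the status alone can be misleading (a single node holding no indices is still green), so look at number_of_nodes and timed_out in the full response, or list the nodes with `curl -XGET 'localhost:9200/_cat/nodes?v'`. Extracting the field with grep and sed keeps the sketch free of extra dependencies; a JSON tool such as jq would be more robust.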
Now our Elasticsearch cluster is ready to receive data. But before that, we will define an index template, which carries the mapping to apply to new indices. In particular, we want to make sure that fields holding IP addresses and floating-point (double) values are mapped with the right data types by the Elasticsearch indexer.

curl -XPUT -d '
{
  "order" : 0,
  "template" : "logstash-*",
  "settings" : {
    "index.number_of_shards" : 5,
    "index.number_of_replicas" : 1,
    "index.query.default_field" : "message"
  },
  "mappings" : {
    "_default_" : {
      "properties" : {
        "src_ip" : {"type" : "ip"},
        "dst_ip" : {"type" : "ip"},
        "conn_duration" : {"type" : "double"}
      }
    }
  },
  "aliases" : { }
}'

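In the above definition, we are telling Elasticsearch to always treat the fields src_ip and dst_ip as type ip, and conn_duration as type double. The ip type is a built-in Elasticsearch field data type. It provides IP-specific search capabilities such as IP range queries or subnet filtering. Note that the template is only applied to indices whose names match logstash-* and that are created after the template has been stored; existing indices are not changed. The mapping we define here is a basic one. We will update it later.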
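Because src_ip and dst_ip are mapped as ip, a term query on them can take a whole subnet in CIDR notation. The sketch below assumes that the Logstash pipeline (covered in the next posts) writes the Bro logs into indices matching logstash-* with these field names, and that Elasticsearch listens on localhost:9200; subnet_query and search_subnet are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: subnet search on an "ip" field (hypothetical helper names).
ES_URL="${ES_URL:-localhost:9200}"

# Build a term query; on an ip field the value may be a CIDR block.
subnet_query() {
  printf '{"query":{"term":{"%s":"%s"}}}\n' "$1" "$2"
}

# Run the query against all logstash-* indices.
search_subnet() {
  curl -s -XGET "$ES_URL/logstash-*/_search?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(subnet_query "$1" "$2")"
}
```

For example, `search_subnet src_ip 192.168.1.0/24` would return the connection events whose source address lies in 192.168.1.0/24. A range query with gte/lte bounds on the same fields works as well.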

This concludes the setup of our Elasticsearch cluster. In the next blog posts, we will talk about the data processing pipeline and how to configure Logstash to parse and ship logs to Elasticsearch.
