# elasticsearch overview

Official site: www.elastic.co

# ELK overview

Elasticsearch: Store, Search, and Analyze

Logstash: Collect, Enrich, and Transport

Kibana: Explore, Visualize, and Share

# example

Setting up a centralized log management platform with Elastic (D. Pilato, E. Demey)

The code is available at Gillespie59/devoxx-universite-elastic on GitHub.

The stack is an Angular front end and a Node.js back end, all Dockerized. The talk shows how to get logs and stats with an ELK stack.

# install

# elastic

Installing Elasticsearch - all packages formats

Installing Elasticsearch - with apt

# Concepts

What is an index in Elasticsearch

Comparison with a relational database (Elasticsearch 5.x and earlier only):

  • MySQL => Databases => Tables => Columns/Rows
  • Elasticsearch => Indices => Types => Documents with Properties

An Elasticsearch cluster can contain multiple indices (databases), which in turn contain multiple types (tables). These types hold multiple documents (rows), and each document has properties (columns).

Searching and querying takes the format of: http://localhost:9200/[index]/[type]/[operation]
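
As a minimal sketch of that URL format, the helper below joins the parts into a request URL. The function name `build_url` is hypothetical, and `customer`, `external` and `_search` are used only as sample values.

```python
from urllib.parse import quote


def build_url(index, doc_type=None, operation=None, host="http://localhost:9200"):
    """Join the non-empty parts into host/index/type/operation."""
    parts = [index, doc_type, operation]
    # Percent-encode each segment so that a stray "/" cannot add a path level.
    path = "/".join(quote(part, safe="") for part in parts if part)
    return f"{host}/{path}"


print(build_url("customer", "external", "_search"))
# prints: http://localhost:9200/customer/external/_search
```

Leaving `doc_type` out gives the index-wide form, for example `build_url("customer", operation="_search")`.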

!!! Breaking change in Elasticsearch 6+ !!!

Mapping types are being phased out: from 6.0 an index can hold only a single type, types are deprecated in the 7.x APIs, and they are removed in 8.0.

Remove support for types? - the GitHub issue in elastic/elasticsearch repo

Here is why : Removal of mapping types - www.elastic.co/guide

In an Elasticsearch index, fields that have the same name in different mapping types are backed by the same Lucene field internally.

In other words, in an index that holds a user type and a tweet type, the user_name field in the user type is stored in exactly the same field as the user_name field in the tweet type, and both user_name fields must have the same mapping (definition) in both types.

This can lead to frustration when, for example, you want deleted to be a date field in one type and a boolean field in another type in the same index.
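
The constraint can be illustrated with a toy model, which is not how Elasticsearch is implemented: all types of an index are flattened into one field map, and a field name defined two different ways is rejected. The function name `merge_type_mappings` and the field types chosen for the sample are assumptions for illustration.

```python
def merge_type_mappings(type_mappings):
    """Flatten per-type field definitions into one index-wide field map.

    Raises ValueError when two types define the same field name differently,
    mimicking the shared-field constraint in a simplified way.
    """
    index_fields = {}
    for type_name, fields in type_mappings.items():
        for field_name, field_type in fields.items():
            # The first definition wins; later ones must agree with it.
            existing = index_fields.setdefault(field_name, field_type)
            if existing != field_type:
                raise ValueError(
                    f"field '{field_name}' is already {existing}, "
                    f"cannot be {field_type} in type '{type_name}'"
                )
    return index_fields


compatible = {
    "user": {"user_name": "keyword"},
    "tweet": {"user_name": "keyword", "content": "text"},
}
print(merge_type_mappings(compatible))
# prints: {'user_name': 'keyword', 'content': 'text'}
```

Passing two types that declare `deleted` as `date` and as `boolean` raises the `ValueError`, which is the frustrating case mentioned above.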

# basic concepts

Getting Started @latest

Basic Concepts - www.elastic.co - v5.6

Near Real Time, Cluster, Node, Index, Type, Document, Shards & Replicas

Basic Concepts - www.elastic.co - v6.2

# exploring the cluster

# cluster health

GET /_cat/health?v

epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1475247709 17:01:49  elasticsearch green           1         1      0   0    0    0        0             0                  -                100.0%
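
The tabular output is easy to consume from a script. The sketch below assumes the `?v` header line is present and that no cell is empty, so splitting on whitespace is enough; `parse_cat_table` is a hypothetical helper name.

```python
def parse_cat_table(text):
    """Turn _cat output requested with ?v into a list of dicts keyed by column name.

    Assumes no column is empty: splitting on whitespace would misalign
    a row that has blank cells.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


sample = (
    "epoch      timestamp cluster       status node.total node.data shards pri relo init "
    "unassign pending_tasks max_task_wait_time active_shards_percent\n"
    "1475247709 17:01:49  elasticsearch green           1         1      0   0    0    0 "
    "       0             0                  -                100.0%\n"
)
health = parse_cat_table(sample)[0]
print(health["status"], health["active_shards_percent"])
# prints: green 100.0%
```

All values stay strings; convert them explicitly where numbers are needed.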

See the cat APIs.

GET /_cat/nodes?v

ip        heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
127.0.0.1           10           5   5    4.46                        mdi      *      PB2SGZY

# list all indices

GET /_cat/indices?v

health status index          uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   index_boubou_1 OGxhmflXRKy4z_tHvlqf3w   5   0    5436288            0      6.9gb          6.9gb
green  open   index_boubou_2 FL3x1kDhT1y1DA4bXy4kLQ   5   0    5436132       162456      7.1gb          7.1gb
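
The size columns are human-readable strings such as `6.9gb`. To compare or sum them, a small converter can help. This sketch assumes the units are powers of 1024 and that only the suffixes listed in `UNITS` appear; `size_to_bytes` is a hypothetical name, and the result is only as precise as the rounded value that was printed.

```python
import re

# Assumed unit table: binary multiples, lowercase suffixes as printed by _cat.
UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4, "pb": 1024**5}


def size_to_bytes(size):
    """Convert a string like '260b' or '6.9gb' into an approximate byte count."""
    match = re.fullmatch(r"([0-9.]+)(b|kb|mb|gb|tb|pb)", size.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized size: {size!r}")
    number, unit = match.groups()
    return int(float(number) * UNITS[unit])


print(size_to_bytes("260b"))
# prints: 260
print(size_to_bytes("7.1gb") > size_to_bytes("6.9gb"))
# prints: True
```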

# create an index

PUT /customer?pretty
GET /_cat/indices?v

health status index    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   customer 95SQ4TSUT7mWBT7VNHH67A   5   1          0            0       260b           260b
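
Outside the console, the same call is a plain HTTP PUT. The sketch below only builds the request with the Python standard library and does not send it; the host `http://localhost:9200` and the helper name `create_index_request` are assumptions.

```python
import urllib.request


def create_index_request(index, host="http://localhost:9200"):
    """Build (but do not send) the PUT request that creates an index."""
    return urllib.request.Request(f"{host}/{index}?pretty", method="PUT")


req = create_index_request("customer")
print(req.get_method(), req.full_url)
# prints: PUT http://localhost:9200/customer?pretty

# To send it against a running node:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```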

# index a document

To index a document without specifying an ID, use a POST request.

in v5.6

POST /customer/external?pretty
{
  "name": "Jane Doe"
}

in v6.2

POST /customer/_doc?pretty
{
  "name": "Jane Doe"
}

To specify the ID yourself, use a PUT request.

in v5.6

PUT /customer/external/1?pretty
{
  "name": "John Doe"
}

response

{
  "_index" : "customer",
  "_type" : "external",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "created" : true
}
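
A script usually needs only a few fields of this response. The sketch below pulls them out; the helper name and the keys of the returned dict are hypothetical. In the sample, `successful` is lower than `total`, which on a single-node cluster is most likely because the replica copy has no node to live on.

```python
import json


def summarize_index_response(body):
    """Extract the fields that tell whether an index operation went through."""
    data = json.loads(body)
    shards = data["_shards"]
    return {
        "id": data["_id"],
        "created": data["result"] == "created",
        # True only when every shard copy (primary and replicas) acknowledged.
        "all_copies_written": shards["successful"] == shards["total"],
        "failed_shards": shards["failed"],
    }


sample = """
{
  "_index": "customer", "_type": "external", "_id": "1", "_version": 1,
  "result": "created",
  "_shards": {"total": 2, "successful": 1, "failed": 0},
  "created": true
}
"""
print(summarize_index_response(sample))
```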

in v6.2

PUT /customer/_doc/1?pretty
{
  "name": "John Doe"
}

response

{
  "_index" : "customer",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}
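
The four requests in this section follow one rule: POST when Elasticsearch should generate the ID, PUT when the ID is given, with the type segment being `external` in the v5.6 examples and `_doc` in the v6.2 ones. A minimal sketch of that rule, with `index_request` as a hypothetical helper name:

```python
def index_request(index, doc_type, doc_id=None):
    """Return the (verb, path) pair used to index one document."""
    if doc_id is None:
        # No ID supplied: Elasticsearch generates one, so the verb is POST.
        return "POST", f"/{index}/{doc_type}"
    return "PUT", f"/{index}/{doc_type}/{doc_id}"


print(index_request("customer", "external"))
# prints: ('POST', '/customer/external')
print(index_request("customer", "_doc", 1))
# prints: ('PUT', '/customer/_doc/1')
```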

# query a document

in v5.6

GET /customer/external/1?pretty

{
  "_index" : "customer",
  "_type" : "external",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : { "name": "John Doe" }
}

in v6.2

GET /customer/_doc/1?pretty

{
  "_index" : "customer",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : { "name": "John Doe" }
}
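
The document itself sits under `_source`, and `found` says whether the ID exists. The sketch below reads a response body accordingly; `extract_source` is a hypothetical helper, and the shape of the not-found body used in the second call is an assumption based on the fields shown above.

```python
import json


def extract_source(body):
    """Return the stored document, or None when the ID was not found."""
    data = json.loads(body)
    return data["_source"] if data.get("found") else None


hit = """
{
  "_index": "customer", "_type": "_doc", "_id": "1", "_version": 1,
  "found": true,
  "_source": {"name": "John Doe"}
}
"""
print(extract_source(hit))
# prints: {'name': 'John Doe'}
print(extract_source('{"_index": "customer", "_type": "_doc", "_id": "2", "found": false}'))
# prints: None
```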

# delete an index

in v5.6 and in v6.2

DELETE /customer?pretty
GET /_cat/indices?v

health status index uuid pri rep docs.count docs.deleted store.size pri.store.size

# summary

In v5.6 we executed:

PUT /customer
PUT /customer/external/1
{
  "name": "John Doe"
}
GET /customer/external/1
DELETE /customer

and in v6.2

PUT /customer
PUT /customer/_doc/1
{
  "name": "John Doe"
}
GET /customer/_doc/1
DELETE /customer

The pattern for accessing data in Elasticsearch is: <REST Verb> /<Index>/<Type>/<ID>

In v6.2, _doc is the conventional <Type> value.
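
To make the pattern concrete, the sketch below splits a console line into its verb, index, type and ID. `parse_console_line` is a hypothetical helper; it handles only document-style paths, so an API path such as `/_cat/indices` would be wrongly reported as an index named `_cat`.

```python
from urllib.parse import urlsplit


def parse_console_line(line):
    """Split '<REST Verb> /<Index>/<Type>/<ID>' into its named parts."""
    verb, target = line.split(maxsplit=1)
    # Drop the query string (?pretty, ?v, ...) and keep only the path.
    segments = [seg for seg in urlsplit(target).path.split("/") if seg]
    # Pad with None so that shorter paths such as '/customer' still unpack.
    index, doc_type, doc_id = (segments + [None] * 3)[:3]
    return {"verb": verb, "index": index, "type": doc_type, "id": doc_id}


print(parse_console_line("PUT /customer/_doc/1?pretty"))
# prints: {'verb': 'PUT', 'index': 'customer', 'type': '_doc', 'id': '1'}
```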

# JavaScript

elastic/elasticsearch-js - github.com

Usable from Node or the browser

# articles

Elasticsearch & Node.js [Getting Started] - Siddhartha Chowdhury - 20170123

Build a Search Engine with Node.js and Elasticsearch - Behrooz Kamali - 20160927

Self hosted search engine

Ubuntu 16.04 LTS server + Docker + Elasticsearch + Calaca for the UI + Nutch as the website crawler