kok202
Using Elasticsearch (Analyzer, Aggregation)

2019. 4. 13. 16:08 [Study] Reading / Practical Examples: Using the Elasticsearch Search Engine

Elasticsearch : as of 2014

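Autocomplete
- Prefix match : 엘라 -> 엘라스틱 (the query is the beginning of the result)
- Partial match : naver -> www.naver.com (the query appears anywhere inside the result)
- Suffix match : 청바지 -> 남자청바지 (the query is the end of the result)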
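The three match types can be illustrated with plain string checks. This is only a minimal sketch of the matching semantics, not how Elasticsearch implements them; the function names are made up for this example.

```python
def prefix_match(query: str, candidate: str) -> bool:
    """Prefix match: the candidate starts with the query."""
    return candidate.startswith(query)


def partial_match(query: str, candidate: str) -> bool:
    """Partial match: the query appears anywhere in the candidate."""
    return query in candidate


def suffix_match(query: str, candidate: str) -> bool:
    """Suffix match: the candidate ends with the query."""
    return candidate.endswith(query)


if __name__ == "__main__":
    print(prefix_match("엘라", "엘라스틱"))
    print(partial_match("naver", "www.naver.com"))
    print(suffix_match("청바지", "남자청바지"))
```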

Ways to implement autocomplete
1. Prefix query
2. Suggester
3. ngram Analyzer
4. edge ngram Analyzer
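To see why the ngram and edge ngram approaches suit different match types, it may help to look at the tokens they would produce. The sketch below is a simplified pure-Python imitation, assuming the whole input is treated as one word. The real tokenizers have more options (for example `token_chars`), and the function names here are hypothetical.

```python
def ngrams(text: str, min_gram: int, max_gram: int) -> list[str]:
    """All substrings of length min_gram..max_gram, at every start position."""
    tokens = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(text):
                tokens.append(text[start:start + size])
    return tokens


def edge_ngrams(text: str, min_gram: int, max_gram: int) -> list[str]:
    """Only the substrings anchored at the beginning of the text."""
    longest = min(max_gram, len(text))
    return [text[:size] for size in range(min_gram, longest + 1)]


if __name__ == "__main__":
    print(edge_ngrams("elastic", 1, 3))  # tokens usable for prefix matching
    print(ngrams("abc", 1, 2))           # tokens usable for partial matching
```

Because edge ngrams are always anchored at the start of the word, they fit prefix matching; plain ngrams cover every position and so also support partial matching, at the cost of a larger index.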

Analyzer
Choose a Tokenizer that fits the use case.

Tokenizer
Splits plain text into small elements (tokens) so that it can be indexed.
Tokenizer types : https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-tokenizers.html

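Filter
There are two kinds of filters. A character filter processes the plain text before it enters the Tokenizer; a token filter processes the tokens that the Tokenizer produces.
Filter types
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-tokenfilters.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-charfilters.html

Analyzer : plain text -> Character filter -> Tokenizer -> Token filter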
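In Elasticsearch, an analyzer runs its character filters first, then the tokenizer, then its token filters. The sketch below imitates that order in plain Python with simplified stand-ins for the built-in `html_strip` character filter, `whitespace` tokenizer, and `lowercase` token filter. The Python function names and the analyzer name `my_analyzer` are assumptions made for this example.

```python
import re


def html_strip_char_filter(text: str) -> str:
    """Character filter stand-in: remove HTML tags from the raw text."""
    return re.sub(r"<[^>]+>", "", text)


def whitespace_tokenizer(text: str) -> list[str]:
    """Tokenizer stand-in: split the text on whitespace."""
    return text.split()


def lowercase_token_filter(tokens: list[str]) -> list[str]:
    """Token filter stand-in: lowercase every token."""
    return [token.lower() for token in tokens]


def analyze(text: str) -> list[str]:
    """Run the three stages in analyzer order."""
    filtered = html_strip_char_filter(text)
    tokens = whitespace_tokenizer(filtered)
    return lowercase_token_filter(tokens)


# Roughly equivalent index settings, written as a Python dict.
# "my_analyzer" is a hypothetical name chosen for this sketch.
index_settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "type": "custom",
                    "char_filter": ["html_strip"],
                    "tokenizer": "whitespace",
                    "filter": ["lowercase"],
                }
            }
        }
    }
}

if __name__ == "__main__":
    print(analyze("<b>Quick</b> Brown FOX"))
```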

 

 


Aggregation
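Bucket aggregation : groups documents that meet the same criterion into buckets, where bucket = {key, doc_count}.

Global : analyzes all documents, regardless of the search query conditions.
Filter : adds a specific condition as a filter and analyzes only the matching documents.
Missing : returns statistics for the documents in which the specified field is undefined or null.
Nested : analyzes documents of the nested type.
Reverse nested : analyzes the parent documents that contain the nested documents.
Terms : returns statistics on the terms extracted from the documents, such as the number of documents associated with each term.
Significant terms : used to find terms that are unusual in the search result documents.
Range : returns statistics for the documents that fall within given ranges.
Date Range : returns statistics for the documents that fall within given date ranges.
Histogram : divides a numeric field by an interval value and computes statistics per interval.
Date Histogram : divides a date field by an interval value and computes statistics per interval.
Geo distance : computes statistics based on geo_point field data.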
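As a rough illustration of what a bucket looks like, the sketch below imitates the terms and histogram aggregations over a small in-memory list and also shows a request body that asks Elasticsearch for the same two aggregations. The sample documents, the field names `category` and `price`, and the aggregation names are all made up for this example.

```python
from collections import Counter

# Hypothetical sample documents.
docs = [
    {"category": "jeans", "price": 30},
    {"category": "jeans", "price": 45},
    {"category": "jeans", "price": 55},
    {"category": "shirts", "price": 20},
    {"category": "shirts", "price": 25},
    {"category": "shoes", "price": 80},
]


def terms_buckets(documents, field):
    """One bucket per distinct value, ordered by doc_count descending."""
    counts = Counter(doc[field] for doc in documents if field in doc)
    return [{"key": key, "doc_count": count} for key, count in counts.most_common()]


def histogram_buckets(documents, field, interval):
    """One bucket per interval; the key is the lower bound of the interval."""
    counts = Counter(
        (doc[field] // interval) * interval for doc in documents if field in doc
    )
    return [{"key": key, "doc_count": counts[key]} for key in sorted(counts)]


# Request body for the same two aggregations, written as a Python dict.
request_body = {
    "size": 0,  # return only aggregation results, no hits
    "aggs": {
        "by_category": {"terms": {"field": "category"}},
        "price_histogram": {"histogram": {"field": "price", "interval": 50}},
    },
}

if __name__ == "__main__":
    print(terms_buckets(docs, "category"))
    print(histogram_buckets(docs, "price", 50))
```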


https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket.html

 


 

Metric aggregation 
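Computes a result from the numeric values found in the documents, according to the type of metric aggregation.

min : minimum value
max : maximum value
avg : average value
sum : sum of the values
stats : count + min + max + avg + sum
extended stats : count + min + max + avg + sum + sum_of_squares + variance + std_deviation
value count : returns the number of values extracted from the aggregated documents.
percentiles : returns percentile statistics for the extracted values.
cardinality : returns the (approximate) number of distinct values of the specified field.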
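The metric aggregations above can be reproduced on a plain list of numbers to check what each one means. This is only a sketch: the function names are hypothetical, the variance is assumed to be the population variance, and the cardinality here is exact, whereas Elasticsearch returns an approximate count.

```python
import math


def stats(values):
    """count, min, max, avg and sum over the extracted values."""
    total = sum(values)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": total / len(values),
        "sum": total,
    }


def extended_stats(values):
    """stats plus sum_of_squares, variance and std_deviation."""
    result = stats(values)
    avg = result["avg"]
    # Assumption: population variance (divide by the number of values).
    variance = sum((value - avg) ** 2 for value in values) / len(values)
    result["sum_of_squares"] = sum(value * value for value in values)
    result["variance"] = variance
    result["std_deviation"] = math.sqrt(variance)
    return result


def cardinality(values):
    """Exact number of distinct values."""
    return len(set(values))


if __name__ == "__main__":
    prices = [10, 20, 30, 40]
    print(stats(prices))
    print(extended_stats(prices))
    print(cardinality([10, 20, 20, 30]))
```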


https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics.html 

 
