Elasticsearch terms aggregation by strings in an array

Question

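How can I write an Elasticsearch terms aggregation that splits the buckets by the entire term rather than by individual tokens? For example, I would like to aggregate by state, but the following query returns new, york, jersey and california as individual buckets, not New York, New Jersey and California as I expected: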

curl -XPOST "http://localhost:9200/my_index/_search" -d' { "aggs" : { "states" : { "terms" : { "field" : "states", "size": 10 } } } }'
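To see why the buckets come back as lowercased fragments, here is a rough Python simulation (not Elasticsearch code) of a terms aggregation over an analyzed field versus an unanalyzed one. The helpers `standard_analyze` and `terms_agg` are hypothetical; `standard_analyze` is a simplification of the standard analyzer (lowercase, split on non-alphanumeric characters) that only happens to match its output for plain state names like these.

```python
import re
from collections import Counter

# Hypothetical stand-in for the standard analyzer: lowercase, then split on
# anything that is not a letter or digit. Only adequate for simple inputs.
def standard_analyze(value):
    return [t for t in re.split(r"[^0-9a-z]+", value.lower()) if t]

def terms_agg(docs, field, analyzed):
    """Count documents per term, like the doc_count of a terms aggregation."""
    counts = Counter()
    for doc in docs:
        terms = set()
        for value in doc.get(field, []):  # an array field is just several values
            if analyzed:
                terms.update(standard_analyze(value))
            else:
                # not_analyzed: the whole string is indexed as a single term
                terms.add(value)
        counts.update(terms)  # each document counts once per distinct term
    return counts

docs = [{"states": ["New York", "New Jersey", "California"]}]

print(sorted(terms_agg(docs, "states", analyzed=True)))
# → ['california', 'jersey', 'new', 'york']
print(sorted(terms_agg(docs, "states", analyzed=False)))
# → ['California', 'New Jersey', 'New York']
```

The array itself is not the problem: each element is indexed as its own value, so an unanalyzed field yields whole-state buckets even when the field holds an array.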

My use case is similar to the one described here https://www.elastic.co/guide/en/elasticsearch/guide/current/aggregations-and-analysis.html with only one difference: in my case, the city field is an array.

Example object:

{ "states": ["New York", "New Jersey", "California"] }

The proposed solution (mapping the field as not_analyzed) doesn't seem to work for arrays.

My mapping:

{ "properties": { "states": { "type":"object", "fields": { "raw": { "type":"object", "index":"not_analyzed" } } } } }

I tried replacing "object" with "string", but that doesn't work either.

Sloan Ahrens · Answer

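What you're missing is using "states.raw" in your aggregation. Since no analyzer is specified, the "states" field is analyzed with the standard analyzer, which splits "New York" into the tokens "new" and "york"; the sub-field "raw" is "not_analyzed", so it keeps each value whole. Your mapping could stand a look as well, though. When I tried your mapping against ES 2.0 I got some errors, but this worked: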

PUT /test_index
{
  "mappings": {
    "doc": {
      "properties": {
        "states": {
          "type": "string",
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}
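The mapping above is ES 2.x syntax. As a sketch, this hypothetical helper `states_mapping` builds the equivalent create-index body for other major versions, assuming the usual changes: from 5.0 `string` was split into `text` (analyzed) and `keyword` (not analyzed), and from 7.0 mappings no longer take a type-name level such as `doc`.

```python
import json

def states_mapping(es_major_version, doc_type="doc"):
    """Create-index body with an analyzed 'states' field and an unanalyzed
    'states.raw' sub-field (hypothetical helper; version cut-offs as assumed above)."""
    if es_major_version < 5:
        # ES 1.x/2.x: 'string' plus index=not_analyzed, as in the answer
        states = {
            "type": "string",
            "fields": {"raw": {"type": "string", "index": "not_analyzed"}},
        }
    else:
        # ES 5+: 'text' is analyzed, 'keyword' is indexed as a single term
        states = {"type": "text", "fields": {"raw": {"type": "keyword"}}}
    properties = {"properties": {"states": states}}
    if es_major_version >= 7:
        # ES 7+: typeless mappings, no 'doc' level
        return {"mappings": properties}
    return {"mappings": {doc_type: properties}}

print(json.dumps(states_mapping(2), indent=2))
```

In every version the aggregation still targets `states.raw`; only the mapping syntax differs.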

Then I added some documents:

POST /test_index/doc/_bulk
{"index":{"_id":1}}
{"states":["New York","New Jersey","California"]}
{"index":{"_id":2}}
{"states":["New York","North Carolina","North Dakota"]}
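The _bulk body is newline-delimited JSON: an action line, then a source line, each on its own line, with a trailing newline at the end. A minimal sketch that builds the same body for the two documents above (the helper name `bulk_index_body` is hypothetical):

```python
import json

def bulk_index_body(docs):
    """Build a _bulk body: one action line and one source line per document,
    every line newline-terminated (the bulk API requires the final newline)."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"

body = bulk_index_body([
    (1, {"states": ["New York", "New Jersey", "California"]}),
    (2, {"states": ["New York", "North Carolina", "North Dakota"]}),
])
print(body, end="")
```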

And this query seems to do what you want:

POST /test_index/_search
{
  "size": 0,
  "aggs": {
    "states": {
      "terms": {
        "field": "states.raw",
        "size": 10
      }
    }
  }
}
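Outside the Sense console, the same search can be sent from plain Python. This sketch only builds the request with the standard library (the helper name `terms_agg_request` and the `localhost:9200` URL are assumptions); it sets `Content-Type: application/json`, which ES 6.0 and later require for request bodies.

```python
import json
import urllib.request

def terms_agg_request(base_url, index, field, size=10):
    """Build (but do not send) a size=0 search with one terms aggregation."""
    body = {
        "size": 0,  # return only the aggregation, no hits
        "aggs": {"states": {"terms": {"field": field, "size": size}}},
    }
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = terms_agg_request("http://localhost:9200", "test_index", "states.raw")
# Against a running cluster: result = json.load(urllib.request.urlopen(req))
```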

which returns:

{
  "took": 1,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "failed": 0 },
  "hits": { "total": 2, "max_score": 0, "hits": [] },
  "aggregations": {
    "states": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "New York", "doc_count": 2 },
        { "key": "California", "doc_count": 1 },
        { "key": "New Jersey", "doc_count": 1 },
        { "key": "North Carolina", "doc_count": 1 },
        { "key": "North Dakota", "doc_count": 1 }
      ]
    }
  }
}
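To consume the result in code, read the buckets under `aggregations.<name>.buckets`. A small sketch over a trimmed copy of the response above; the helper `bucket_counts` is hypothetical:

```python
import json

def bucket_counts(response, agg_name="states"):
    """Map each bucket key of a terms aggregation to its doc_count, in order."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {b["key"]: b["doc_count"] for b in buckets}

response = json.loads("""
{"aggregations": {"states": {"doc_count_error_upper_bound": 0,
  "sum_other_doc_count": 0,
  "buckets": [{"key": "New York", "doc_count": 2},
              {"key": "California", "doc_count": 1},
              {"key": "New Jersey", "doc_count": 1},
              {"key": "North Carolina", "doc_count": 1},
              {"key": "North Dakota", "doc_count": 1}]}}}
""")
print(bucket_counts(response))
# → {'New York': 2, 'California': 1, 'New Jersey': 1, 'North Carolina': 1, 'North Dakota': 1}
```

A `sum_other_doc_count` of 0 indicates that every distinct value fit within `"size": 10`; a nonzero value would mean some states were left out of the buckets.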

Here is the code I used to test it:

http://sense.qbox.io/Gist/31851c3cfee8c1896eb4b53bc1ddd39ae87b173e