Elasticsearch: filter documents grouped by a field

Question

I have some documents:

{"name": "John", "district": 1}, {"name": "Mary", "district": 2}, {"name": "Nick", "district": 1}, {"name": "Bob", "district": 3}, {"name": "Kenny", "district": 1}

How can I filter/select distinct documents by district, so that I get one document per district?

{"name": "John", "district": 1}, {"name": "Mary", "district": 2}, {"name": "Bob", "district": 3}

In SQL I could use GROUP BY. I tried a terms aggregation, but it only returns the count for each distinct value, not the documents themselves.

"aggs": { "distinct": { "terms": { "field": "district", "size": 0 } } }

Thanks for your help! :-)

ThomasC · Accepted Answer

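If your Elasticsearch version is 1.3 or later, you can use a sub-aggregation of type top_hits. By default it returns the top three matching documents in each bucket, sorted by query score (here every document gets the same score, as with a match_all query).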

You can set the size parameter to return more than 3 documents per bucket.

With the following dataset and query:

POST /test/districts/
{"name": "John", "district": 1}

POST /test/districts/
{"name": "Mary", "district": 2}

POST /test/districts/
{"name": "Nick", "district": 1}

POST /test/districts/
{"name": "Bob", "district": 3}

POST test/districts/_search
{
  "size": 0,
  "aggs": {
    "by_district": {
      "terms": { "field": "district", "size": 0 },
      "aggs": {
        "tops": { "top_hits": { "size": 10 } }
      }
    }
  }
}

the documents are returned grouped the way you want:

{
  "took": 5,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": { "total": 4, "max_score": 0, "hits": [] },
  "aggregations": {
    "by_district": {
      "buckets": [
        {
          "key": 1,
          "key_as_string": "1",
          "doc_count": 2,
          "tops": {
            "hits": {
              "total": 2,
              "max_score": 1,
              "hits": [
                {
                  "_index": "test",
                  "_type": "districts",
                  "_id": "XYHu4I-JQcOfLm3iWjTiOg",
                  "_score": 1,
                  "_source": { "name": "John", "district": 1 }
                },
                {
                  "_index": "test",
                  "_type": "districts",
                  "_id": "5dul2XMTRC2IpV_tKRRltA",
                  "_score": 1,
                  "_source": { "name": "Nick", "district": 1 }
                }
              ]
            }
          }
        },
        {
          "key": 2,
          "key_as_string": "2",
          "doc_count": 1,
          "tops": {
            "hits": {
              "total": 1,
              "max_score": 1,
              "hits": [
                {
                  "_index": "test",
                  "_type": "districts",
                  "_id": "I-9Gd4OYSRuexhP1dCdQ-g",
                  "_score": 1,
                  "_source": { "name": "Mary", "district": 2 }
                }
              ]
            }
          }
        },
        {
          "key": 3,
          "key_as_string": "3",
          "doc_count": 1,
          "tops": {
            "hits": {
              "total": 1,
              "max_score": 1,
              "hits": [
                {
                  "_index": "test",
                  "_type": "districts",
                  "_id": "bti2y-OUT3q2mBNhhI3xeA",
                  "_score": 1,
                  "_source": { "name": "Bob", "district": 3 }
                }
              ]
            }
          }
        }
      ]
    }
  }
}
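The question asks for exactly one document per district, which suggests limiting top_hits to a size of 1 rather than raising it. The following is a minimal sketch of such a request body, built as a plain Java string so that it runs without the Elasticsearch client. The class name OnePerDistrictQuery and the build helper are hypothetical, and the terms-level "size": 0 from the query above is left out on the assumption that the default bucket limit is enough for a handful of districts.

```java
// Sketch only: produces the JSON request body as a string; it does not contact a cluster.
final class OnePerDistrictQuery {

    // Hypothetical helper: buckets documents on `field` and keeps
    // `docsPerBucket` documents per bucket through a top_hits sub-aggregation.
    static String build(String field, int docsPerBucket) {
        return "{\n"
            + "  \"size\": 0,\n"
            + "  \"aggs\": {\n"
            + "    \"by_" + field + "\": {\n"
            + "      \"terms\": { \"field\": \"" + field + "\" },\n"
            + "      \"aggs\": {\n"
            + "        \"tops\": { \"top_hits\": { \"size\": " + docsPerBucket + " } }\n"
            + "      }\n"
            + "    }\n"
            + "  }\n"
            + "}";
    }

    public static void main(String[] args) {
        // One representative document per district, as in the question.
        System.out.println(build("district", 1));
    }
}
```

The printed body can be sent to the same _search endpoint as the query above. Which document represents each district then depends on the sort order of top_hits, which is by score unless a sort is specified.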

Akash Yadav · Answer

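Elasticsearch does not directly return distinct documents, or groups of documents, per unique value. As a workaround, if you are using the Java client (or can translate this into the language you use), you can deduplicate the hits yourself: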

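// `client` is an existing Elasticsearch Java client; a default search returns only the first page of hits.
SearchResponse response = client.prepareSearch().execute().actionGet();
SearchHits hits = response.getHits();
Iterator<SearchHit> iterator = hits.iterator();
// One hit per district value; a later hit with the same district replaces the earlier one.
Map<String, SearchHit> distinctObjects = new HashMap<String, SearchHit>();
while (iterator.hasNext()) {
    SearchHit searchHit = iterator.next();
    Map<String, Object> source = searchHit.getSource();
    if (source.get("district") != null) {
        // Store the hit itself, since the map's value type is SearchHit.
        distinctObjects.put(source.get("district").toString(), searchHit);
    }
}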
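The deduplication step in this workaround can be tried without a cluster. Below is a minimal sketch that uses plain maps as stand-ins for each hit's _source; the class DistinctByField and its helpers are hypothetical names, not part of the Elasticsearch API. It keeps the first document seen per value (putIfAbsent on a LinkedHashMap), on the assumption that the first match is the one wanted, as in the question's expected output.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DistinctByField {

    // Keeps the first document seen for each distinct value of `field`,
    // preserving encounter order. Documents without the field are skipped.
    static List<Map<String, Object>> firstPerValue(List<Map<String, Object>> docs, String field) {
        Map<String, Map<String, Object>> seen = new LinkedHashMap<>();
        for (Map<String, Object> doc : docs) {
            Object value = doc.get(field);
            if (value != null) {
                seen.putIfAbsent(value.toString(), doc);
            }
        }
        return new ArrayList<>(seen.values());
    }

    // Hypothetical helper standing in for a hit's _source map.
    static Map<String, Object> doc(String name, int district) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("name", name);
        m.put("district", district);
        return m;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = new ArrayList<>();
        docs.add(doc("John", 1));
        docs.add(doc("Mary", 2));
        docs.add(doc("Nick", 1));
        docs.add(doc("Bob", 3));
        docs.add(doc("Kenny", 1));
        for (Map<String, Object> d : firstPerValue(docs, "district")) {
            System.out.println(d);
        }
    }
}
```

With the question's five documents this keeps John, Mary, and Bob. Note that any client-side approach only sees the hits actually returned, so the search would need to fetch every matching document for the result to be complete.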