Is an unordered_map really faster than a map in practice?

Question

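Sure, the lookup performance of an unordered_map is constant on average, and the lookup performance of a map is O(log N).

But of course, in order to find an object in an unordered_map, we have to:

Hash the key we want to find.
equality_compare the key with every key in the same bucket.

Whereas in a map, we simply need to less_than compare the sought key with about log2(N) keys, where N is the number of items in the map.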

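I wondered what the real performance difference would be, given that the hash function adds overhead and an equality_compare is no cheaper than a less_than compare.

Rather than bother the community with a question I could answer myself, I wrote a test.

I have shared the results below, in case anyone else finds this interesting or useful.

More answers are of course invited if someone is able and willing to add more information.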

Richard Hodges · Accepted Answer

In response to questions about performance in relation to the number of missed searches, I have refactored the test to parameterise this.

Example results:

searches=1000000 set_size=   0 miss=   100% ordered=   4384 unordered=  12901 flat_map=    681
searches=1000000 set_size=  99 miss= 99.99% ordered=  89127 unordered=  42615 flat_map=  86091
searches=1000000 set_size= 172 miss= 99.98% ordered= 101283 unordered=  53468 flat_map=  96008
searches=1000000 set_size= 303 miss= 99.97% ordered= 112747 unordered=  53211 flat_map= 107343
searches=1000000 set_size= 396 miss= 99.96% ordered= 124179 unordered=  59655 flat_map= 112687
searches=1000000 set_size= 523 miss= 99.95% ordered= 132180 unordered=  51133 flat_map= 121669
searches=1000000 set_size= 599 miss= 99.94% ordered= 135850 unordered=  55078 flat_map= 121072
searches=1000000 set_size= 695 miss= 99.93% ordered= 140204 unordered=  60087 flat_map= 124961
searches=1000000 set_size= 795 miss= 99.92% ordered= 146071 unordered=  64790 flat_map= 127873
searches=1000000 set_size= 916 miss= 99.91% ordered= 154461 unordered=  50944 flat_map= 133194
searches=1000000 set_size= 988 miss=  99.9% ordered= 156327 unordered=  54094 flat_map= 134288

Key:

searches  = number of searches performed against each map
set_size  = how big each map is (and therefore how many of the searches will result in a hit)
miss      = the probability of generating a missed search. Used for generating searches and set_size.
ordered   = the time spent searching the ordered map
unordered = the time spent searching the unordered_map
flat_map  = the time spent searching the flat map

note: time is measured in std::chrono::system_clock::duration ticks.

TL;DR

Result: the unordered_map shows its superiority as soon as there is data in the map. The only time it performs worse than the ordered map is when the maps are empty.

Here is the new code:

#include <iostream>
#include <iomanip>
#include <random>
#include <algorithm>
#include <string>
#include <vector>
#include <map>
#include <unordered_map>
#include <unordered_set>
#include <chrono>
#include <tuple>
#include <future>
#include <stdexcept>
#include <sstream>
#include <memory>
#include <limits>
#include <utility>

using namespace std;

// this sets the length of the string we will be using as a key.
// modify this to test whether key complexity changes the performance ratios
// of the various maps
static const size_t key_length = 20;

// the number of keys we will generate (the size of the test)
const size_t nkeys = 1000000;

// use a virtual method to prevent the optimiser from detecting that
// our sink function actually does nothing. otherwise it might skew the test
struct string_user {
    virtual void sink(const std::string&) = 0;
    virtual ~string_user() = default;
};

struct real_string_user : string_user {
    virtual void sink(const std::string&) override {}
};

struct real_string_user_print : string_user {
    virtual void sink(const std::string& s) override { cout << s << endl; }
};

// generate a sink from a string - this is a runtime operation and therefore
// prevents the optimiser from realising that the sink does nothing
std::unique_ptr<string_user> make_sink(const std::string& name)
{
    if (name == "print") {
        return make_unique<real_string_user_print>();
    }
    if (name == "noprint") {
        return make_unique<real_string_user>();
    }
    throw logic_error(name);
}

// generate a random key, given a random engine and a distribution
auto gen_string = [](auto& engine, auto& dist) {
    std::string result(key_length, ' ');
    generate(begin(result), end(result), [&] { return static_cast<char>(dist(engine)); });
    return result;
};

// comparison predicate for our flat map.
struct pair_less {
    bool operator()(const pair<string, string>& l, const string& r) const { return l.first < r; }
    bool operator()(const string& l, const pair<string, string>& r) const { return l < r.first; }
};

// time one pass of f over every key. keys is taken by reference so that
// each test does not copy the whole key vector
template<class F>
auto time_test(F&& f, const vector<string>& keys)
{
    auto start_time = chrono::system_clock::now();
    for (auto const& key : keys) {
        f(key);
    }
    auto stop_time = chrono::system_clock::now();
    auto diff = stop_time - start_time;
    return diff;
}

// report helper (not used by the test below)
struct report_key {
    size_t nkeys;
    int miss_chance;
};

std::ostream& operator<<(std::ostream& os, const report_key& key)
{
    return os << "miss=" << setw(2) << key.miss_chance << "%";
}

void run_test(string_user& sink, size_t nkeys, double miss_prob)
{
    // the types of map we will test
    unordered_map<string, string> unordered;
    map<string, string> ordered;
    vector<pair<string, string>> flat_map;

    // a vector of all keys, which we can shuffle in order to randomise
    // access order of all our maps consistently
    vector<string> keys;
    unordered_set<string> keys_record;

    // generate keys
    auto eng = std::default_random_engine(std::random_device()());
    // uniform_int_distribution is not specified for char, so draw ints and
    // narrow them to char in gen_string
    auto alpha_dist = std::uniform_int_distribution<int>('A', 'Z');
    auto prob_dist = std::uniform_real_distribution<double>(0, 1.0 - std::numeric_limits<double>::epsilon());

    auto generate_new_key = [&] {
        while (true) {
            // generate a key
            auto key = gen_string(eng, alpha_dist);
            // record it in keys_record;
            // if it has been seen before, force a regeneration
            if (keys_record.insert(key).second) {
                return key;
            }
        }
    };

    for (size_t i = 0; i < nkeys; ++i) {
        auto value = to_string(i);
        auto key = generate_new_key();
        // with probability (1 - miss_prob) store the key in all three maps;
        // otherwise it becomes a key that every lookup will miss
        if (prob_dist(eng) >= miss_prob) {
            unordered.emplace(key, value);
            flat_map.emplace_back(key, value);
            ordered.emplace(key, std::move(value));
        }
        // record the key for later use
        keys.emplace_back(std::move(key));
    }

    // turn our vector 'flat map' into an actual flat map by sorting it by pair.first.
    // This is the key.
    sort(begin(flat_map), end(flat_map),
         [](const auto& l, const auto& r) { return l.first < r.first; });

    // shuffle the keys to randomise access order
    shuffle(begin(keys), end(keys), eng);

    auto unordered_lookup = [&](auto& key) {
        auto i = unordered.find(key);
        if (i != end(unordered)) {
            sink.sink(i->second);
        }
    };

    auto ordered_lookup = [&](auto& key) {
        auto i = ordered.find(key);
        if (i != end(ordered)) {
            sink.sink(i->second);
        }
    };

    auto flat_map_lookup = [&](auto& key) {
        auto i = lower_bound(begin(flat_map), end(flat_map), key, pair_less());
        if (i != end(flat_map) && i->first == key) {
            sink.sink(i->second);
        }
    };

    // spawn a thread to time access to the unordered map
    auto unordered_future = async(launch::async, [&]() { return time_test(unordered_lookup, keys); });

    // spawn a thread to time access to the ordered map
    auto ordered_future = async(launch::async, [&] { return time_test(ordered_lookup, keys); });

    // spawn a thread to time access to the flat map
    auto flat_future = async(launch::async, [&] { return time_test(flat_map_lookup, keys); });

    // synchronise all the threads and get the timings
    auto ordered_time = ordered_future.get();
    auto unordered_time = unordered_future.get();
    auto flat_time = flat_future.get();

    cout << "searches=" << setw(7) << nkeys;
    cout << " set_size=" << setw(7) << unordered.size();
    cout << " miss=" << setw(7) << setprecision(6) << miss_prob * 100.0 << "%";
    cout << " ordered=" << setw(7) << ordered_time.count();
    cout << " unordered=" << setw(7) << unordered_time.count();
    cout << " flat_map=" << setw(7) << flat_time.count() << endl;
}

int main()
{
    // generate the sink, preventing the optimiser from realising what it
    // does.
    stringstream ss;
    ss << "noprint";
    string arg;
    ss >> arg;
    auto puser = make_sink(arg);

    // sweep the miss probability from 100% downwards in steps of 0.01%
    for (double chance = 1.0; chance >= 0.0; chance -= 0.0001) {
        run_test(*puser, nkeys, chance);
    }
    return 0;
}

Richard Hodges · Answer

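In the following test, which I compiled with Apple clang at -O3, I took steps to ensure that the test is fair, such as:

Call a sink function with the result of each search through a vtable, to prevent the optimiser from inlining away entire searches.
Run the tests on 3 different kinds of maps, containing the same data, in the same order and in parallel. This means that if one test starts to "get ahead", it starts entering cache-miss territory for the search set (see code), so no one test gets an unfair advantage from encountering a "hot" cache.
Parameterise the key size (and therefore complexity).
Parameterise the map size.
Test three different kinds of maps (containing the same data): an unordered_map, a map and a sorted vector of key/value pairs.
Check the assembler output to ensure that the optimiser has not been able to optimise away entire chunks of logic through dead code analysis.

Here is the code: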

#include <iostream>
#include <random>
#include <algorithm>
#include <string>
#include <vector>
#include <map>
#include <unordered_map>
#include <chrono>
#include <tuple>
#include <future>
#include <stdexcept>
#include <sstream>
#include <memory>
#include <utility>

using namespace std;

// this sets the length of the string we will be using as a key.
// modify this to test whether key complexity changes the performance ratios
// of the various maps
static const size_t key_length = 20;

// the number of keys we will generate (the size of the test)
const size_t nkeys = 1000000;

// the types of map we will test
unordered_map<string, string> unordered;
map<string, string> ordered;
vector<pair<string, string>> flat_map;

// a vector of all keys, which we can shuffle in order to randomise
// access order of all our maps consistently
vector<string> keys;

// use a virtual method to prevent the optimiser from detecting that
// our sink function actually does nothing. otherwise it might skew the test
struct string_user {
    virtual void sink(const std::string&) = 0;
    virtual ~string_user() = default;
};

struct real_string_user : string_user {
    virtual void sink(const std::string&) override {}
};

struct real_string_user_print : string_user {
    virtual void sink(const std::string& s) override { cout << s << endl; }
};

// generate a sink from a string - this is a runtime operation and therefore
// prevents the optimiser from realising that the sink does nothing
std::unique_ptr<string_user> make_sink(const std::string& name)
{
    if (name == "print") {
        return make_unique<real_string_user_print>();
    }
    if (name == "noprint") {
        return make_unique<real_string_user>();
    }
    throw logic_error(name);
}

// generate a random key, given a random engine and a distribution
auto gen_string = [](auto& engine, auto& dist) {
    std::string result(key_length, ' ');
    generate(begin(result), end(result), [&] { return static_cast<char>(dist(engine)); });
    return result;
};

// comparison predicate for our flat map.
struct pair_less {
    bool operator()(const pair<string, string>& l, const string& r) const { return l.first < r; }
    bool operator()(const string& l, const pair<string, string>& r) const { return l < r.first; }
};

int main()
{
    // generate the sink, preventing the optimiser from realising what it
    // does.
    stringstream ss;
    ss << "noprint";
    string arg;
    ss >> arg;
    auto puser = make_sink(arg);

    // generate keys
    auto eng = std::default_random_engine(std::random_device()());
    // uniform_int_distribution is not specified for char, so draw ints and
    // narrow them to char in gen_string
    auto alpha_dist = std::uniform_int_distribution<int>('A', 'Z');

    for (size_t i = 0; i < nkeys; ++i) {
        bool inserted = false;
        auto value = to_string(i);
        while (!inserted) {
            // generate a key
            auto key = gen_string(eng, alpha_dist);
            // try to store it in the unordered map
            // if it already exists, force a regeneration
            // otherwise also store it in the ordered map and the flat map
            tie(ignore, inserted) = unordered.emplace(key, value);
            if (inserted) {
                flat_map.emplace_back(key, value);
                ordered.emplace(key, std::move(value));
                // record the key for later use
                keys.emplace_back(std::move(key));
            }
        }
    }

    // turn our vector 'flat map' into an actual flat map by sorting it by pair.first.
    // This is the key.
    sort(begin(flat_map), end(flat_map),
         [](const auto& l, const auto& r) { return l.first < r.first; });

    // shuffle the keys to randomise access order
    shuffle(begin(keys), end(keys), eng);

    // spawn a thread to time access to the unordered map
    auto unordered_future = async(launch::async, [&]() {
        auto start_time = chrono::system_clock::now();
        for (auto const& key : keys) {
            puser->sink(unordered.at(key));
        }
        auto stop_time = chrono::system_clock::now();
        auto diff = stop_time - start_time;
        return diff;
    });

    // spawn a thread to time access to the ordered map
    auto ordered_future = async(launch::async, [&] {
        auto start_time = chrono::system_clock::now();
        for (auto const& key : keys) {
            puser->sink(ordered.at(key));
        }
        auto stop_time = chrono::system_clock::now();
        auto diff = stop_time - start_time;
        return diff;
    });

    // spawn a thread to time access to the flat map
    auto flat_future = async(launch::async, [&] {
        auto start_time = chrono::system_clock::now();
        for (auto const& key : keys) {
            auto i = lower_bound(begin(flat_map), end(flat_map), key, pair_less());
            if (i != end(flat_map) && i->first == key)
                puser->sink(i->second);
            else
                throw invalid_argument(key);
        }
        auto stop_time = chrono::system_clock::now();
        auto diff = stop_time - start_time;
        return diff;
    });

    // synchronise all the threads and get the timings
    auto ordered_time = ordered_future.get();
    auto unordered_time = unordered_future.get();
    auto flat_time = flat_future.get();

    // print
    cout << "  ordered time: " << ordered_time.count() << endl;
    cout << "unordered time: " << unordered_time.count() << endl;
    cout << " flat map time: " << flat_time.count() << endl;

    return 0;
}

Results:

  ordered time: 972711
unordered time: 335821
 flat map time: 559768

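As you can see, the unordered_map clearly beats both the map and the sorted vector of pairs. The sorted vector of pairs is in turn roughly 1.7 times as fast as the map (559768 vs 972711 ticks). This is interesting, since lower_bound and map::at have almost equivalent complexity.

TL;DR

In this test, the unordered map is approximately 3 times as fast (for lookups) as the ordered map, and the sorted vector convincingly beats the map.

I was actually shocked at how much faster it is.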