How to calculate the execution time of a code snippet in C++

Question

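I need to compute the execution time of a C++ code snippet in seconds. It must work on both Windows and Unix machines.

I use the following code to do this (the headers are included beforehand):

    clock_t startTime = clock();
    // some code here
    // to compute its execution duration in runtime
    cout << double( clock() - startTime ) / (double)CLOCKS_PER_SEC << " seconds." << endl;

However, for small inputs or short statements such as a = a + 1, I get a result of "0 seconds". I think it should be something like 0.0000001 seconds.

I remember that System.nanoTime() in Java works pretty well in this case. However, I can't get exactly the same functionality from the clock() function in C++.

Do you have a solution?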

Thomas Bonini · Accepted Answer

You can use this function I wrote. You call GetTimeMs64(), and it returns the number of milliseconds elapsed since the Unix epoch according to the system clock, just like time(NULL) except in milliseconds.

It works on both Windows and Linux, and it is thread safe.

Note that the granularity is about 15 ms on Windows. On Linux it is implementation dependent, but it can be around 15 ms as well.

    #ifdef _WIN32
    #include <Windows.h>
    #else
    #include <sys/time.h>
    #include <ctime>
    #endif

    /* Remove if already defined */
    typedef long long int64;
    typedef unsigned long long uint64;

    /* Returns the amount of milliseconds elapsed since the UNIX Epoch. Works on both
     * windows and linux. */
    uint64 GetTimeMs64()
    {
    #ifdef _WIN32
        /* Windows */
        FILETIME ft;
        LARGE_INTEGER li;

        /* Get the amount of 100 nano seconds intervals elapsed since January 1, 1601 (UTC) and copy it
         * to a LARGE_INTEGER structure. */
        GetSystemTimeAsFileTime(&ft);
        li.LowPart = ft.dwLowDateTime;
        li.HighPart = ft.dwHighDateTime;

        uint64 ret = li.QuadPart;
        ret -= 116444736000000000LL; /* Convert from file time to UNIX Epoch time. */
        ret /= 10000; /* From 100 nano seconds (10^-7) to 1 millisecond (10^-3) intervals */

        return ret;
    #else
        /* Linux */
        struct timeval tv;

        gettimeofday(&tv, NULL);

        uint64 ret = tv.tv_usec;
        /* Convert from micro seconds (10^-6) to milliseconds (10^-3) */
        ret /= 1000;

        /* Adds the seconds (10^0) after converting them to milliseconds (10^-3).
         * The cast avoids overflow where time_t is 32 bits wide. */
        ret += ((uint64)tv.tv_sec * 1000);

        return ret;
    #endif
    }
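For comparison, on a C++11 or later compiler a similar value can be obtained without platform-specific headers. The sketch below is not part of the original answer: the names get_time_ms64 and steady_now_ns are made up for illustration, and it assumes std::chrono::system_clock counts from the Unix epoch, which the standard only guarantees since C++20 (though it holds on common implementations).

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical portable counterpart of GetTimeMs64(): milliseconds since the
// system_clock epoch (assumed here to be the Unix epoch).
inline std::uint64_t get_time_ms64() {
    using namespace std::chrono;
    return static_cast<std::uint64_t>(
        duration_cast<milliseconds>(system_clock::now().time_since_epoch()).count());
}

// For measuring intervals, steady_clock is usually the safer choice because it
// never jumps backwards when the wall clock is adjusted.
inline std::int64_t steady_now_ns() {
    using namespace std::chrono;
    return static_cast<std::int64_t>(
        duration_cast<nanoseconds>(steady_clock::now().time_since_epoch()).count());
}
```

Subtracting two steady_now_ns() readings gives an elapsed time in nanoseconds, although the real resolution still depends on the platform.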

arhuaco · Answer

I have another working example that uses microseconds (UNIX, POSIX, etc.).

    #include <sys/time.h>

    typedef unsigned long long timestamp_t;

    static timestamp_t
    get_timestamp ()
    {
      struct timeval now;
      gettimeofday (&now, NULL);
      return  now.tv_usec + (timestamp_t)now.tv_sec * 1000000;
    }

    ...
    timestamp_t t0 = get_timestamp();
    // Process
    timestamp_t t1 = get_timestamp();

    double secs = (t1 - t0) / 1000000.0L;

Here is the file where this was coded:

https://github.com/arhuaco/junkcode/blob/master/emqbit-bench/bench.c

gongzhitaao · Answer

Here is a simple solution in C++11 that gives satisfying resolution.

    #include <iostream>
    #include <chrono>

    class Timer
    {
    public:
        Timer() : beg_(clock_::now()) {}
        void reset() { beg_ = clock_::now(); }
        double elapsed() const {
            return std::chrono::duration_cast<second_>
                (clock_::now() - beg_).count(); }

    private:
        typedef std::chrono::high_resolution_clock clock_;
        typedef std::chrono::duration<double, std::ratio<1> > second_;
        std::chrono::time_point<clock_> beg_;
    };

Or, on *nix, for C++03:

    #include <iostream>
    #include <ctime>

    class Timer
    {
    public:
        Timer() { clock_gettime(CLOCK_REALTIME, &beg_); }

        double elapsed() {
            clock_gettime(CLOCK_REALTIME, &end_);
            return end_.tv_sec - beg_.tv_sec +
                (end_.tv_nsec - beg_.tv_nsec) / 1000000000.;
        }

        void reset() { clock_gettime(CLOCK_REALTIME, &beg_); }

    private:
        timespec beg_, end_;
    };

Here is an example of its usage:

    int main()
    {
        Timer tmr;
        double t = tmr.elapsed();
        std::cout << t << std::endl;

        tmr.reset();
        t = tmr.elapsed();
        std::cout << t << std::endl;

        return 0;
    }

From https://Gist.github.com/gongzhitaao/7062087

Tomas Andrle · Answer

    #include <boost/progress.hpp>

    using namespace boost;

    int main (int argc, const char * argv[])
    {
      progress_timer timer;

      // do stuff, preferably in a 100x loop to make it take longer.

      return 0;
    }

When progress_timer goes out of scope, it prints the time elapsed since its creation.

UPDATE: I made a simple standalone replacement (OSX/iOS, but easy to port): https://github.com/catnapgames/TestTimerScoped

Captain Comic · Answer

Windows provides the QueryPerformanceCounter() function, and Unix has gettimeofday(). Both functions can measure differences of at least 1 microsecond.

kriss · Answer

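In some programs I wrote, I used RDTSC for this purpose. RDTSC does not measure time; it counts the number of cycles since the processor started. You have to calibrate it on your system to get a result in seconds, but it is really handy when you want to evaluate performance. It is even better to use the cycle count directly, without trying to convert it back to seconds.

(The link above goes to the French Wikipedia page, which has C++ code samples; there is also an English version.)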
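A minimal sketch of reading the counter and calibrating it against a clock might look like the following. It is not the code from the linked page. It assumes an x86 compiler that provides the __rdtsc intrinsic (GCC, Clang or MSVC) and falls back to std::chrono elsewhere; the names read_ticks, estimate_ticks_per_second and SKETCH_HAVE_RDTSC are hypothetical.

```cpp
#include <chrono>
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#include <intrin.h>
#define SKETCH_HAVE_RDTSC 1
#elif defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#define SKETCH_HAVE_RDTSC 1
#else
#define SKETCH_HAVE_RDTSC 0
#endif

// Raw tick count: the time-stamp counter on x86, otherwise the steady_clock
// tick count as a fallback so the sketch still compiles.
inline std::uint64_t read_ticks() {
#if SKETCH_HAVE_RDTSC
    return __rdtsc();
#else
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Rough calibration: count ticks during a short busy-wait whose length is
// measured with steady_clock, then divide ticks by seconds.
inline double estimate_ticks_per_second(
    std::chrono::milliseconds window = std::chrono::milliseconds(50)) {
    const auto t0 = std::chrono::steady_clock::now();
    const std::uint64_t c0 = read_ticks();
    while (std::chrono::steady_clock::now() - t0 < window) {
        // busy-wait
    }
    const std::uint64_t c1 = read_ticks();
    const std::chrono::duration<double> dt =
        std::chrono::steady_clock::now() - t0;
    return static_cast<double>(c1 - c0) / dt.count();
}
```

Dividing a tick difference by the estimated rate gives seconds, but as the answer notes, comparing raw tick counts is often more useful.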

user8991265 · Answer

(Windows-specific solution) The current (circa 2017) way to get accurate timings under Windows is to use "QueryPerformanceCounter". This approach has the benefit of giving very accurate results and is recommended by MS. Just paste the code below into a new console app to get a working sample. There is a lengthy discussion here: Acquiring High-Resolution Time Stamps

    #include <iostream>
    #include <tchar.h>
    #include <windows.h>

    int main()
    {
        constexpr int MAX_ITER{ 10000 };
        constexpr __int64 us_per_hour{ 3600000000ull }; // 3.6e+09
        constexpr __int64 us_per_min{ 60000000ull };
        constexpr __int64 us_per_sec{ 1000000ull };
        constexpr __int64 us_per_ms{ 1000ull };

        // easy to work with
        __int64 startTick, endTick, ticksPerSecond, totalTicks = 0ull;

        QueryPerformanceFrequency((LARGE_INTEGER *)&ticksPerSecond);

        for (int iter = 0; iter < MAX_ITER; ++iter) { // start looping
            QueryPerformanceCounter((LARGE_INTEGER *)&startTick); // Get start tick
            // code to be timed
            std::cout << "cur_tick = " << iter << "\n";
            QueryPerformanceCounter((LARGE_INTEGER *)&endTick); // Get end tick
            totalTicks += endTick - startTick; // accumulate time taken
        }

        // convert to elapsed microseconds
        __int64 totalMicroSeconds = (totalTicks * 1000000ull) / ticksPerSecond;

        __int64 hours = totalMicroSeconds / us_per_hour;
        totalMicroSeconds %= us_per_hour;
        __int64 minutes = totalMicroSeconds / us_per_min;
        totalMicroSeconds %= us_per_min;
        __int64 seconds = totalMicroSeconds / us_per_sec;
        totalMicroSeconds %= us_per_sec;
        __int64 milliseconds = totalMicroSeconds / us_per_ms;
        totalMicroSeconds %= us_per_ms;

        std::cout << "Total time: " << hours << "h ";
        std::cout << minutes << "m " << seconds << "s " << milliseconds << "ms ";
        std::cout << totalMicroSeconds << "us\n";

        return 0;
    }

Thomas Matthews · Answer

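I suggest using the standard library functions to obtain time information from the system.

If you want finer resolution, perform more execution iterations. Instead of running the code once and taking a single sample, run it 1000 times or more.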
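Applied to the a = a + 1 statement from the question, the idea can be sketched as follows. The function name seconds_per_increment is hypothetical, and the volatile qualifier is an assumption added here so that the compiler does not collapse the loop into a single addition; it also makes each increment slower than it would be on a plain variable.

```cpp
#include <chrono>
#include <cstdint>

// Times `a = a + 1` by repeating it `iterations` times (which must be > 0)
// and dividing the total elapsed time by the iteration count.
inline double seconds_per_increment(std::uint64_t iterations) {
    volatile std::uint64_t a = 0; // volatile: every read and write must happen
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        a = a + 1;
    }
    const std::chrono::duration<double> total =
        std::chrono::steady_clock::now() - start;
    return total.count() / static_cast<double>(iterations);
}
```

The result includes the loop overhead, so it is an upper bound on the cost of the statement rather than an exact figure.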

Jack Giffin · Answer

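A complete, unfailing solution to thread scheduling, which should yield exactly the same times for each test, is to compile your program to be OS independent and boot your computer so that the program runs in an OS-free environment. However, this is largely impractical and would be difficult at best. A good substitute for going OS-free is to set the affinity of the current thread to one core and its priority to the highest. This alternative should give consistent-enough results.

You should also turn off optimizations that would interfere with debugging, which for g++ or gcc means adding -Og to the command line, to prevent the code being tested from being optimized out. The -O0 flag should not be used, because it introduces extra unneeded overhead that would be included in the timing results and skew the measured speed of the code.

On the other hand, assuming that you use -Ofast (or at the very least -O3) for the final production build, and ignoring the issue of "dead" code elimination, -Og performs very few optimizations compared to -Ofast. Thus -Og can misrepresent the real speed of the code in the final product.

Further, all speed tests (to some extent) lie: in the final production build compiled with -Ofast, each snippet/section/function of code is not isolated. Rather, each snippet flows continuously into the next, which allows the compiler to join, merge, and optimize together pieces of code from all over the place. At the same time, if you are benchmarking a snippet that makes heavy use of realloc, the snippet might run slower in a production product with high enough memory fragmentation. Hence the expression "the whole is more than the sum of its parts" applies here: code in the final production build might run noticeably faster or slower than the individual snippet you are speed testing.

A partial solution that may lessen the discrepancy is to use -Ofast for speed testing, with the addition of asm volatile("" :: "r"(var)) on the variables involved in the test to prevent dead code/loop elimination.

Here is an example of how to benchmark square root functions on a Windows computer.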

    // set USE_ASM_TO_PREVENT_ELIMINATION  to 0 to prevent `asm volatile("" :: "r"(var))`
    // set USE_ASM_TO_PREVENT_ELIMINATION  to 1 to enforce `asm volatile("" :: "r"(var))`
    #define USE_ASM_TO_PREVENT_ELIMINATION 1

    #include <iostream>
    #include <iomanip>
    #include <cstdio>
    #include <chrono>
    #include <cmath>
    #include <windows.h>
    #include <intrin.h>
    #pragma intrinsic(__rdtsc)
    #include <cstdint>

    class Timer {
    public:
        Timer() : beg_(clock_::now()) {}
        void reset() { beg_ = clock_::now(); }
        double elapsed() const {
            return std::chrono::duration_cast<second_>
                (clock_::now() - beg_).count(); }
    private:
        typedef std::chrono::high_resolution_clock clock_;
        typedef std::chrono::duration<double, std::ratio<1> > second_;
        std::chrono::time_point<clock_> beg_;
    };

    unsigned int guess_sqrt32(register unsigned int n) {
        register unsigned int g = 0x8000;
        if(g*g > n) { g ^= 0x8000; }
        g |= 0x4000; if(g*g > n) { g ^= 0x4000; }
        g |= 0x2000; if(g*g > n) { g ^= 0x2000; }
        g |= 0x1000; if(g*g > n) { g ^= 0x1000; }
        g |= 0x0800; if(g*g > n) { g ^= 0x0800; }
        g |= 0x0400; if(g*g > n) { g ^= 0x0400; }
        g |= 0x0200; if(g*g > n) { g ^= 0x0200; }
        g |= 0x0100; if(g*g > n) { g ^= 0x0100; }
        g |= 0x0080; if(g*g > n) { g ^= 0x0080; }
        g |= 0x0040; if(g*g > n) { g ^= 0x0040; }
        g |= 0x0020; if(g*g > n) { g ^= 0x0020; }
        g |= 0x0010; if(g*g > n) { g ^= 0x0010; }
        g |= 0x0008; if(g*g > n) { g ^= 0x0008; }
        g |= 0x0004; if(g*g > n) { g ^= 0x0004; }
        g |= 0x0002; if(g*g > n) { g ^= 0x0002; }
        g |= 0x0001; if(g*g > n) { g ^= 0x0001; }
        return g;
    }

    unsigned int empty_function( unsigned int _input ) {
        return _input;
    }

    unsigned long long empty_ticks=0;
    double empty_seconds=0;
    Timer my_time;

    template<unsigned int benchmark_repetitions>
    void benchmark( const char* function_name, auto (*function_to_do)( auto ) ) {
        register unsigned int i=benchmark_repetitions;
        register unsigned long long start=0;
        my_time.reset();
        start=__rdtsc();
        while ( i-- ) {
            auto result = (*function_to_do)( i << 7 );
            #if USE_ASM_TO_PREVENT_ELIMINATION == 1
                asm volatile("" :: "r"(
                    // There is no data type in C++ that is smaller than a char, so it will
                    //  not throw a segmentation fault error to reinterpret any arbitrary
                    //  data type as a char. Although, the compiler might not like it.
                    result
                ));
            #endif
        }
        if ( function_name == nullptr ) {
            empty_ticks = (__rdtsc()-start);
            empty_seconds = my_time.elapsed();
            std::cout<< "Empty:\n" << empty_ticks
                  << " ticks\n" << benchmark_repetitions << " repetitions\n"
                   << std::setprecision(15) << empty_seconds
                    << " seconds\n\n";
        } else {
            std::cout<< function_name<<":\n" << (__rdtsc()-start-empty_ticks)
                  << " ticks\n" << benchmark_repetitions << " repetitions\n"
                   << std::setprecision(15) << (my_time.elapsed()-empty_seconds)
                    << " seconds\n\n";
        }
    }

    int main( void ) {
        void* Cur_Thread=   GetCurrentThread();
        void* Cur_Process=  GetCurrentProcess();
        unsigned long long  Current_Affinity;
        unsigned long long  System_Affinity;
        unsigned long long furthest_affinity;
        unsigned long long nearest_affinity;

        if( ! SetThreadPriority(Cur_Thread,THREAD_PRIORITY_TIME_CRITICAL) ) {
            SetThreadPriority( Cur_Thread, THREAD_PRIORITY_HIGHEST );
        }
        if( ! SetPriorityClass(Cur_Process,REALTIME_PRIORITY_CLASS) ) {
            SetPriorityClass( Cur_Process, HIGH_PRIORITY_CLASS );
        }
        GetProcessAffinityMask( Cur_Process, &Current_Affinity, &System_Affinity );
        furthest_affinity = 0x8000000000000000ULL>>__builtin_clzll(Current_Affinity);
        nearest_affinity  = 0x0000000000000001ULL<<__builtin_ctzll(Current_Affinity);
        SetProcessAffinityMask( Cur_Process, furthest_affinity );
        SetThreadAffinityMask( Cur_Thread, furthest_affinity );

        const int repetitions=524288;

        // Note: standard_sqrt, original_guess_sqrt32 and new_guess_sqrt32 are not
        // defined in this listing (only guess_sqrt32 is shown above); supply the
        // functions you want to compare.
        benchmark<repetitions>( nullptr, empty_function );
        benchmark<repetitions>( "Standard Square Root", standard_sqrt );
        benchmark<repetitions>( "Original Guess Square Root", original_guess_sqrt32 );
        benchmark<repetitions>( "New Guess Square Root", new_guess_sqrt32 );

        SetThreadPriority( Cur_Thread, THREAD_PRIORITY_IDLE );
        SetPriorityClass( Cur_Process, IDLE_PRIORITY_CLASS );
        SetProcessAffinityMask( Cur_Process, nearest_affinity );
        SetThreadAffinityMask( Cur_Thread, nearest_affinity );
        for (;;) { getchar(); }

        return 0;
    }

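Also, credit to Mike Jarvis for his Timer.

Please note (this is very important) that if you are going to run bigger code snippets, you really must turn down the number of iterations to prevent your computer from freezing up.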
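The dead-code-elimination trick on its own can be sketched in a portable-across-OS form for GCC and Clang. This is an illustration rather than the answer's code: keep_alive, square_plus_one and seconds_per_call are hypothetical names, and the volatile fallback for other compilers is an assumption.

```cpp
#include <chrono>

// Tells the compiler that `value` is used, without emitting any instruction,
// so the computation that produced it cannot be removed as dead code.
inline void keep_alive(unsigned int value) {
#if defined(__GNUC__) || defined(__clang__)
    asm volatile("" : : "r"(value));
#else
    volatile unsigned int sink = value; // assumed fallback for other compilers
    (void)sink;
#endif
}

// Trivial workload standing in for the function under test.
inline unsigned int square_plus_one(unsigned int x) { return x * x + 1u; }

// Mean seconds per call over `repetitions` calls (must be > 0).
inline double seconds_per_call(unsigned int repetitions) {
    const auto start = std::chrono::steady_clock::now();
    for (unsigned int i = 0; i < repetitions; ++i) {
        keep_alive(square_plus_one(i)); // the result is "consumed" every time
    }
    const std::chrono::duration<double> total =
        std::chrono::steady_clock::now() - start;
    return total.count() / static_cast<double>(repetitions);
}
```

Without the keep_alive call, an optimizing build would be free to drop the whole loop, and the measured time would say nothing about the workload.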

Adisak · Answer

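It is better to run the inner loop several times with the performance timing taken only once, and average by dividing by the number of inner-loop repetitions, than to run the whole thing (loop + performance timing) several times and average. This reduces the overhead of the performance timing code relative to the section you are actually profiling.

Wrap your timer calls for the appropriate system. For Windows, QueryPerformanceCounter is pretty fast and "safe" to use.

You can also use "rdtsc" on any modern x86 PC, but there may be issues on some multicore machines (core hopping may change the timer) or if you have speed-step of some sort turned on.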
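The difference between the two averaging styles can be sketched as below. The TimingComparison struct and compare_timing_styles function are hypothetical names, and std::chrono::steady_clock stands in for whichever system timer is wrapped.

```cpp
#include <chrono>
#include <cstddef>

// Mean seconds per call, measured in two different ways.
struct TimingComparison {
    double timed_per_call; // timer started and stopped around every call
    double timed_once;     // one timer reading before the loop, one after
};

// Calls `fn` 2 * iterations times in total; `iterations` must be > 0.
template <typename Fn>
TimingComparison compare_timing_styles(Fn&& fn, std::size_t iterations) {
    typedef std::chrono::steady_clock clock_type;

    // Style 1: the timer overhead is paid, and measured, on every iteration.
    std::chrono::duration<double> sum(0.0);
    for (std::size_t i = 0; i < iterations; ++i) {
        const clock_type::time_point t0 = clock_type::now();
        fn();
        sum += clock_type::now() - t0;
    }

    // Style 2: the timer overhead is paid once and spread over all iterations.
    const clock_type::time_point begin = clock_type::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        fn();
    }
    const std::chrono::duration<double> total = clock_type::now() - begin;

    const double n = static_cast<double>(iterations);
    return TimingComparison{sum.count() / n, total.count() / n};
}
```

For very short workloads the first figure tends to be dominated by the cost of reading the clock, which is the overhead the answer recommends avoiding.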

Mike Jarvis · Answer

If you want to time the same stretch of code each time it gets executed (e.g. for profiling code that you think might be a bottleneck), here is a wrapper around (a slight modification of) Andreas Bonini's function that I find useful:

    #include <iostream>
    #include <string>

    #ifdef _WIN32
    #include <Windows.h>
    #else
    #include <sys/time.h>
    #endif

    /*
     *  A simple timer class to see how long a piece of code takes.
     *  Usage:
     *
     *  {
     *      static Timer timer("name");
     *
     *      ...
     *
     *      timer.start()
     *      [ The code you want timed ]
     *      timer.stop()
     *
     *      ...
     *  }
     *
     *  At the end of execution, you will get output:
     *
     *  Time for name: XXX seconds
     */
    class Timer
    {
    public:
        Timer(std::string name, bool start_running=false) :
            _name(name), _accum(0), _running(false)
        {
            if (start_running) start();
        }

        ~Timer() { stop(); report(); }

        void start() {
            if (!_running) {
                _start_time = GetTimeMicroseconds();
                _running = true;
            }
        }
        void stop() {
            if (_running) {
                unsigned long long stop_time = GetTimeMicroseconds();
                _accum += stop_time - _start_time;
                _running = false;
            }
        }
        void report() {
            std::cout<<"Time for "<<_name<<": " << _accum / 1.e6 << " seconds\n";
        }
    private:
        // cf. http://stackoverflow.com/questions/1861294/how-to-calculate-execution-time-of-a-code-snippet-in-c
        unsigned long long GetTimeMicroseconds()
        {
    #ifdef _WIN32
            /* Windows */
            FILETIME ft;
            LARGE_INTEGER li;

            /* Get the amount of 100 nano seconds intervals elapsed since January 1, 1601 (UTC) and copy it
             * to a LARGE_INTEGER structure. */
            GetSystemTimeAsFileTime(&ft);
            li.LowPart = ft.dwLowDateTime;
            li.HighPart = ft.dwHighDateTime;

            unsigned long long ret = li.QuadPart;
            ret -= 116444736000000000LL; /* Convert from file time to UNIX Epoch time. */
            ret /= 10; /* From 100 nano seconds (10^-7) to 1 microsecond (10^-6) intervals */
    #else
            /* Linux */
            struct timeval tv;

            gettimeofday(&tv, NULL);

            unsigned long long ret = tv.tv_usec;
            /* Adds the seconds (10^0) after converting them to microseconds (10^-6) */
            ret += (tv.tv_sec * 1000000);
    #endif
            return ret;
        }
        std::string _name;
        long long _accum;
        unsigned long long _start_time;
        bool _running;
    };

nullqube · Answer

A simple class that benchmarks a code block:

    #include <chrono>
    #include <cstdio>

    using namespace std::chrono;

    class benchmark {
      public:
      time_point<high_resolution_clock>  t0, t1;
      unsigned int *d;
      benchmark(unsigned int *res) : d(res) {
                     t0 = high_resolution_clock::now();
      }
      ~benchmark() { t1 = high_resolution_clock::now();
                      milliseconds dur = duration_cast<milliseconds>(t1 - t0);
                      *d = dur.count();
      }
    };

    // simple usage
    // unsigned int t;
    // { // put the code in a block
    //  benchmark bench(&t);
    //  // ...
    //  // code to benchmark
    // }
    // HERE the t contains time in milliseconds

    // one way to use it can be :
    // (## pastes the literal token __LINE__ without expanding it, so the
    //  variable is always named __time____LINE__; use BENCH once per scope)
    #define BENCH(TITLE,CODEBLOCK) \
      unsigned int __time__##__LINE__ = 0;  \
      { benchmark bench(&__time__##__LINE__); \
          CODEBLOCK \
      } \
      printf("%s took %u ms\n",(TITLE),__time__##__LINE__);

    int main(void) {
      const int testcount = 1000000; // assumed value; not defined in the original post
      BENCH("TITLE",{
        for(int n = 0; n < testcount; n++ )
          int a = n % 3;
      });
      return 0;
    }

burner · Answer

I created a lambda that calls your function N times and returns the average.

    double c = BENCHMARK_CNT(25, fillVectorDeque(variable));

You can find the C++11 header here.
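The linked header is not reproduced here, so the following is only a guess at how such a helper could be built, not the author's implementation. The names average_seconds and AVERAGE_SECONDS are hypothetical, and returning the mean in seconds is an assumption; the original does not state the unit.

```cpp
#include <chrono>
#include <cstddef>

// Calls `fn` `runs` times and returns the mean wall-clock time per call in
// seconds. Returns 0.0 when `runs` is zero.
template <typename Fn>
double average_seconds(std::size_t runs, Fn&& fn) {
    if (runs == 0) {
        return 0.0;
    }
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < runs; ++i) {
        fn();
    }
    const std::chrono::duration<double> total =
        std::chrono::steady_clock::now() - start;
    return total.count() / static_cast<double>(runs);
}

// Wraps an arbitrary expression in a lambda so it can be written inline,
// in the style of the BENCHMARK_CNT call shown above.
#define AVERAGE_SECONDS(N, EXPR) \
    average_seconds((N), [&]() { (void)(EXPR); })
```

With this sketch, the call from the answer would read `double c = AVERAGE_SECONDS(25, fillVectorDeque(variable));`.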

Neil · Answer

I created a simple utility for measuring the performance of blocks of code, using the chrono library's high_resolution_clock: https://github.com/nfergu/codetimer .

Timings can be recorded against different keys, and an aggregated view of the timings for each key can be displayed.

Usage is as follows:

    #include <chrono>
    #include <iostream>
    #include "codetimer.h"

    int main () {
        auto start = std::chrono::high_resolution_clock::now();
        // some code here
        CodeTimer::record("mykey", start);
        CodeTimer::printStats();
        return 0;
    }

rwp · Answer

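You could also look at cxx-rtimers on GitHub, which provides header-only routines for gathering statistics on the run time of any code block in which you can create a local variable. The timers come in versions that use std::chrono on C++11, timers from the Boost library, or standard POSIX timer functions. They report the average, maximum, and minimum time spent within a function, as well as the number of times it was called. They can be used as simply as this:

    #include <rtimers/cxx11.hpp>

    void expensiveFunction() {
        static rtimers::cxx11::DefaultTimer timer("expensive");
        auto scopedStartStop = timer.scopedStart();
        // Do something costly...
    }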

Brendan Long · Answer

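boost::timer will probably give you as much accuracy as you need. It is nowhere near accurate enough to tell you how long a = a+1; takes, but what reason would you have to time something that takes a couple of nanoseconds?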