Does the restrict keyword provide significant benefits in gcc / g++?

Question

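Has anyone ever seen any numbers or analysis on whether using the C/C++ restrict keyword in gcc/g++ actually provides a significant performance boost in practice (and not just in theory)?

I've read various articles recommending or disparaging its use, but I have never come across real numbers that actually demonstrate either side's arguments.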

EDIT

I know that restrict is not officially part of C++, but it is supported by some compilers, and I have read a paper by Christer Ericson.

Nils Pipenbrinck · Accepted Answer

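The restrict keyword does make a difference.

I've seen improvements of a factor of 2 or more in some situations (image processing). Most of the time, though, the difference is not that large: about 10%.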

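Here is a small example that illustrates the difference. As a test, I wrote a very basic 4x4 vector * matrix transform. Note that I have to force the function not to be inlined. Otherwise GCC detects that there are no aliasing pointers in my benchmark code, and restrict makes no difference because of the inlining.

You could also move the transform function into a separate file instead.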

```c
#include <math.h>

#ifdef USE_RESTRICT
#else
#define __restrict
#endif

void transform (float * __restrict dest, float * __restrict src,
                float * __restrict matrix, int n) __attribute__ ((noinline));

void transform (float * __restrict dest, float * __restrict src,
                float * __restrict matrix, int n)
{
  int i;

  // simple transform loop.

  // written with aliasing in mind. dest, src and matrix
  // are potentially aliasing, so the compiler is forced to reload
  // the values of matrix and src for each iteration.

  for (i=0; i<n; i++)
  {
    dest[0] = src[0] * matrix[0] + src[1] * matrix[1] +
              src[2] * matrix[2] + src[3] * matrix[3];

    dest[1] = src[0] * matrix[4] + src[1] * matrix[5] +
              src[2] * matrix[6] + src[3] * matrix[7];

    dest[2] = src[0] * matrix[8] + src[1] * matrix[9] +
              src[2] * matrix[10] + src[3] * matrix[11];

    dest[3] = src[0] * matrix[12] + src[1] * matrix[13] +
              src[2] * matrix[14] + src[3] * matrix[15];

    src  += 4;
    dest += 4;
  }
}

float srcdata[4*10000];
float dstdata[4*10000];

int main (int argc, char**args)
{
  int i,j;
  float matrix[16];

  // init all source-data, so we don't get NANs
  for (i=0; i<16; i++)      matrix[i] = 1;
  for (i=0; i<4*10000; i++) srcdata[i] = i;

  // do a bunch of tests for benchmarking.
  for (j=0; j<10000; j++)
    transform (dstdata, srcdata, matrix, 10000);
}
```

Results (on my 2 GHz Core Duo):

```
nils@doofnase:~$ gcc -O3 test.c
nils@doofnase:~$ time ./a.out

real    0m2.517s
user    0m2.516s
sys     0m0.004s

nils@doofnase:~$ gcc -O3 -DUSE_RESTRICT test.c
nils@doofnase:~$ time ./a.out

real    0m2.034s
user    0m2.028s
sys     0m0.000s
```

On that system, it runs about 20% faster.

To show how much this depends on the architecture, I ran the same code on a Cortex-A8 embedded CPU (I adjusted the loop count a bit because I didn't want to wait that long):

```
root@beagleboard:~# gcc -O3 -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp test.c
root@beagleboard:~# time ./a.out
real    0m 7.64s
user    0m 7.62s
sys     0m 0.00s

root@beagleboard:~# gcc -O3 -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp -DUSE_RESTRICT test.c
root@beagleboard:~# time ./a.out
real    0m 7.00s
user    0m 6.98s
sys     0m 0.00s
```

Here the difference is only 9% (same compiler, by the way).
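As a quick check of the percentages, here is the arithmetic on the real times shown above. The first column is the fraction of run time saved and the second is the speedup ratio; the two measures differ noticeably at these sizes.

```latex
\begin{aligned}
\text{Core Duo:}\quad  & \frac{2.517 - 2.034}{2.517} \approx 0.192, & \frac{2.517}{2.034} &\approx 1.24 \\
\text{Cortex-A8:}\quad & \frac{7.64 - 7.00}{7.64} \approx 0.084,    & \frac{7.64}{7.00}   &\approx 1.09
\end{aligned}
```

So the Core Duo's "20%" is closest to the time saved (about 19%), while the Cortex-A8's "9%" matches the speedup ratio; measured as time saved, it is about 8%.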

Ciro Santilli 新疆改造中心996ICU六四事件 · Answer

Does the restrict keyword provide significant benefits in gcc/g++?

It can reduce the number of instructions, as shown in the example below, so use it whenever possible.

GCC 4.8 Linux x86-64 example

Input:

```c
void f(int *a, int *b, int *x) {
  *a += *x;
  *b += *x;
}

void fr(int *restrict a, int *restrict b, int *restrict x) {
  *a += *x;
  *b += *x;
}
```

Compile and disassemble:

```
gcc -g -std=c99 -O0 -c main.c
objdump -S main.o
```

With -O0, they are the same.

With -O3:

```
void f(int *a, int *b, int *x) {
    *a += *x;
   0:   8b 02                   mov    (%rdx),%eax
   2:   01 07                   add    %eax,(%rdi)
    *b += *x;
   4:   8b 02                   mov    (%rdx),%eax
   6:   01 06                   add    %eax,(%rsi)

void fr(int *restrict a, int *restrict b, int *restrict x) {
    *a += *x;
  10:   8b 02                   mov    (%rdx),%eax
  12:   01 07                   add    %eax,(%rdi)
    *b += *x;
  14:   01 06                   add    %eax,(%rsi)
```

For the uninitiated, the calling convention is:

rdi = first parameter
rsi = second parameter
rdx = third parameter

Conclusion: 3 instructions instead of 4.

Of course, instructions can have different latencies, but this gives a good idea.

Why was GCC able to optimize that?

The code above was taken from the Wikipedia example, which is very illuminating.

Pseudo-assembly for f:

```
load R1 ← *x    ; Load the value of x pointer
load R2 ← *a    ; Load the value of a pointer
add R2 += R1    ; Perform Addition
set R2 → *a     ; Update the value of a pointer
; Similarly for b, note that x is loaded twice,
; because a may be equal to x.
load R1 ← *x
load R2 ← *b
add R2 += R1
set R2 → *b
```

For fr:

```
load R1 ← *x
load R2 ← *a
add R2 += R1
set R2 → *a
; Note that x is not reloaded,
; because the compiler knows it is unchanged
; load R1 ← *x
load R2 ← *b
add R2 += R1
set R2 → *b
```

Is it really faster?

Ermm... not for this simple test:

```asm
.text
.global _start
_start:
    mov $0x10000000, %rbx
    mov $x, %rdx
    mov $x, %rdi
    mov $x, %rsi
loop:
    # START of interesting block
    mov (%rdx),%eax
    add %eax,(%rdi)
    mov (%rdx),%eax # Comment out this line.
    add %eax,(%rsi)
    # END ------------------------
    dec %rbx
    cmp $0, %rbx
    jnz loop
    mov $60, %rax
    mov $0, %rdi
    syscall
.data
    x:
        .int 0
```

Then:

```
as -o a.o a.S && ld a.o && time ./a.out
```

On Ubuntu 14.04 AMD64, Intel i5-3210M CPU.

I confess that I still don't understand modern CPUs. Let me know if you:

find a flaw in my method
find an assembler test case where it becomes much faster
understand why there wasn't a difference

George · Answer

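The article Demystifying The Restrict Keyword refers to the paper Why Programmer-specified Aliasing is a Bad Idea (PDF), which says that restrict generally doesn't help and provides measurements to back this up.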

Clifford · Answer

Note that C++ compilers that accept the restrict keyword may still ignore it. That is the case here, for example.