Java Stream API: why is there a distinction between sequential and parallel execution mode?

Question

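Stream pipelines may execute either sequentially or in parallel. This execution mode is a property of the stream. Streams are created with an initial choice of sequential or parallel execution.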
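To make the "property of the stream" wording concrete, here is a minimal sketch of how the mode is chosen at creation and switched afterwards. The class and method names are made up for illustration; `stream()`, `parallelStream()`, `parallel()`, `sequential()` and `isParallel()` are the standard API calls. Note that `parallelStream()` is documented as returning a "possibly parallel" stream, so the second flag is what the standard list implementations report rather than a hard guarantee.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class ExecutionModeDemo {
    // Returns the execution-mode flag observed for four differently built streams.
    static boolean[] modes() {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4);

        Stream<Integer> sequential = numbers.stream();          // created in sequential mode
        Stream<Integer> parallel = numbers.parallelStream();    // created in (possibly) parallel mode
        Stream<Integer> switched = numbers.stream().parallel(); // mode switched after creation
        Stream<Integer> switchedBack = numbers.parallelStream().sequential();

        return new boolean[] {
            sequential.isParallel(),
            parallel.isParallel(),
            switched.isParallel(),
            switchedBack.isParallel()
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(modes()));
    }
}
```

The mode applies to the whole pipeline: the last call to `parallel()` or `sequential()` before the terminal operation decides how everything runs.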

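My assumptions:

There is no functional difference between sequential and parallel streams. Output is never affected by the execution mode.
A parallel stream is always preferable because of the performance gains, given an appropriate number of cores and a problem size large enough to justify the overhead.
We want to write code once and run it anywhere without having to care about the hardware (this is Java, after all).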

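Assuming these assumptions are valid (nothing wrong with a bit of meta-assumption), what is the value of exposing the execution mode in the API?

It seems like you should just be able to declare a Stream, and the choice of sequential or parallel execution should be handled automatically in a layer below, either by library code or by the JVM itself, as a function of the cores available at runtime, the size of the problem, and so on.

Sure, assuming parallel streams also work on a single-core machine, perhaps always using a parallel stream achieves this. But this is really ugly: why have explicit references to parallel streams in my code when it is the default option?

Even if there is a scenario where you would deliberately want to hard-code the use of a sequential stream, why isn't there just a sub-interface SequentialStream for that purpose, rather than polluting Stream with an execution mode switch?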

Louis Wasserman · Accepted Answer

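"It seems like you should just be able to declare a Stream, and the choice of sequential or parallel execution should be handled automatically in a layer below, either by library code or by the JVM itself, as a function of the cores available at runtime, the size of the problem, and so on."

The reality is that (a) streams are a library and have no special JVM magic, and (b) you can't really design a library smart enough to work out automatically what the right decision is in each particular case. There is no sensible way to estimate how costly a particular function will be without running it, even if you could introspect its implementation, which you can't. So you would end up introducing a benchmark into every stream operation to work out whether parallelizing it is worth the cost of the parallelism overhead. That is just not practical, especially since you don't know in advance how bad the parallelism overhead is, either.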

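"A parallel stream is always preferable because of the performance gains, given an appropriate number of cores and a problem size large enough to justify the overhead."

Not always, in practice. Some tasks are so small that they are not worth parallelizing, and parallelism always has some overhead. (And frankly, most programmers tend to overestimate the usefulness of parallelism, slapping it on everywhere when it is really hurting performance.)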
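As a rough illustration of the "too small to be worth it" point, the sketch below sums a tiny range both ways and prints a crude wall-clock time for each. The class and helper names are hypothetical, and the timing is deliberately naive (no JIT warm-up, a single run), so the numbers are only a hint; a proper comparison would need a benchmark harness such as JMH.

```java
import java.util.stream.LongStream;

class SmallTaskOverhead {
    static long sumSequential(long n) {
        return LongStream.rangeClosed(1, n).sum();
    }

    static long sumParallel(long n) {
        // Same computation, but the range is split and the partial sums are combined.
        return LongStream.rangeClosed(1, n).parallel().sum();
    }

    // Crude wall-clock timing of a single run; treat the result as a rough hint only.
    static long nanosFor(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long n = 100; // deliberately tiny workload
        System.out.println("sequential ns: " + nanosFor(() -> sumSequential(n)));
        System.out.println("parallel   ns: " + nanosFor(() -> sumParallel(n)));
    }
}
```

Both methods return the same sum; only the cost of getting there differs, and which one wins depends on the workload and the machine.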

Basically, it is a hard enough problem that you have to hand the decision off to the programmer.

Tagir Valeev · Answer

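There is an interesting case in another question showing that a parallel stream can sometimes be slower by orders of magnitude. In that particular example, the parallel version runs for ten minutes while the sequential version takes a few seconds.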

Kanagavelu Sugumar · Answer

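"There is no functional difference between sequential and parallel streams. Output is never affected by the execution mode."

There is a difference in how sequential and parallel streams execute. In the code below, the TEST_2 results show that parallel execution is much faster than the sequential approach.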
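Besides timing, the quoted assumption about output needs one qualification that is easy to check: `Stream.forEach` is documented as not guaranteeing encounter order on a parallel pipeline, whereas collecting into a list preserves it. The following sketch (class and method names are illustrative only) shows both cases side by side.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class OrderingDemo {
    // collect(toList()) preserves encounter order in both modes.
    static List<Integer> collected(boolean parallel) {
        IntStream range = IntStream.rangeClosed(1, 20);
        if (parallel) {
            range = range.parallel();
        }
        return range.boxed().collect(Collectors.toList());
    }

    // forEach with a side effect: in parallel mode the arrival order is not guaranteed.
    static List<Integer> viaForEach(boolean parallel) {
        // Synchronized wrapper, because several threads may call add() at the same time.
        List<Integer> seen = Collections.synchronizedList(new ArrayList<>());
        IntStream range = IntStream.rangeClosed(1, 20);
        if (parallel) {
            range = range.parallel();
        }
        range.boxed().forEach(seen::add);
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("collect, parallel: " + collected(true));
        System.out.println("forEach, parallel: " + viaForEach(true)); // order may differ between runs
    }
}
```

So the result of a well-behaved reduction or collection is the same in both modes, but code that relies on side effects can observe the difference.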

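"A parallel stream is always preferable because of the performance gains, given an appropriate number of cores and a problem size large enough to justify the overhead."

Not really. If a task is too simple to be worth running on parallel threads, parallelism just adds overhead to the code; the TEST_1 results show this. Note also that if all the worker threads are busy with one parallel task, other parallel stream operations elsewhere in your code will have to wait for them.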
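The remark about busy worker threads comes from the fact that, in the standard OpenJDK implementation, parallel streams run on the shared `ForkJoinPool.commonPool()` by default. A small sketch for inspecting that pool and seeing which threads actually process the elements is given below; the class and method names are hypothetical, and the thread names printed will vary by machine and JDK.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

class CommonPoolInfo {
    // Target parallelism of the shared pool used by parallel streams by default.
    static int commonParallelism() {
        return ForkJoinPool.getCommonPoolParallelism();
    }

    // Names of the threads that processed at least one element of a parallel pipeline.
    static Set<String> threadsUsed() {
        Set<String> names = ConcurrentHashMap.newKeySet();
        IntStream.range(0, 1_000)
                 .parallel()
                 .forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        System.out.println("available processors:    " + Runtime.getRuntime().availableProcessors());
        System.out.println("common pool parallelism: " + commonParallelism());
        System.out.println("threads used:            " + threadsUsed());
    }
}
```

Because every parallel stream in the JVM shares this one pool unless arranged otherwise, a pipeline that blocks its workers (for example with `Thread.sleep`) can stall unrelated parallel pipelines.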

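"We want to write code once and run it anywhere without having to care about the hardware (this is Java, after all)."

Only the programmer knows whether a given task is worth running in parallel or sequentially, regardless of the CPU. That is why the Java API exposes both options to the developer.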

    import java.util.ArrayList;
    import java.util.List;

    /*
     * Performance test over internal (parallel/sequential) and external iterations.
     * https://docs.oracle.com/javase/tutorial/collections/streams/parallelism.html
     *
     * Parallel computing involves dividing a problem into subproblems,
     * solving those problems simultaneously (in parallel, with each subproblem running in a separate thread),
     * and then combining the results of the solutions to the subproblems. Java SE provides the fork/join framework,
     * which enables you to more easily implement parallel computing in your applications. However, with this framework,
     * you must specify how the problems are subdivided (partitioned).
     * With aggregate operations, the Java runtime performs this partitioning and combining of solutions for you.
     *
     * Limit the parallelism that the ForkJoinPool offers you. You can do it yourself by supplying
     * -Djava.util.concurrent.ForkJoinPool.common.parallelism=1,
     * so that the pool size is limited to one and there is no gain from parallelization.
     *
     * @see ForkJoinPool
     * https://docs.oracle.com/javase/tutorial/essential/concurrency/forkjoin.html
     *
     * ForkJoinPool creates a fixed number of threads (default: number of cores) and
     * will never create more threads (unless the application indicates a need for those by using managedBlock).
     *
     * http://stackoverflow.com/questions/10797568/what-determines-the-number-of-threads-a-java-forkjoinpool-creates
     */
    public class IterationThroughStream {
        // volatile because the parallel tests write this flag from worker threads
        private static volatile boolean found = false;
        private static List<Integer> smallListOfNumbers = null;

        public static void main(String[] args) throws InterruptedException {
            // TEST_1: trivial work per element on a large list
            List<String> bigListOfStrings = new ArrayList<>();
            for (int i = 1; i <= 1000000; i++) {
                bigListOfStrings.add("Counter no: " + i);
            }

            System.out.println("Test Start");
            System.out.println("-----------");
            long startExternalIteration = System.currentTimeMillis();
            externalIteration(bigListOfStrings);
            long endExternalIteration = System.currentTimeMillis();
            System.out.println("Time taken for externalIteration(bigListOfStrings) is :"
                    + (endExternalIteration - startExternalIteration) + " , and the result found: " + found);

            long startInternalIteration = System.currentTimeMillis();
            internalIteration(bigListOfStrings);
            long endInternalIteration = System.currentTimeMillis();
            System.out.println("Time taken for internalIteration(bigListOfStrings) is :"
                    + (endInternalIteration - startInternalIteration) + " , and the result found: " + found);

            // TEST_2: slow (sleeping) work per element on a small list
            smallListOfNumbers = new ArrayList<>();
            for (int i = 1; i <= 10; i++) {
                smallListOfNumbers.add(i);
            }

            long startExternalIteration1 = System.currentTimeMillis();
            externalIterationOnSleep(smallListOfNumbers);
            long endExternalIteration1 = System.currentTimeMillis();
            System.out.println("Time taken for externalIterationOnSleep(smallListOfNumbers) is :"
                    + (endExternalIteration1 - startExternalIteration1));

            long startInternalIteration1 = System.currentTimeMillis();
            internalIterationOnSleep(smallListOfNumbers);
            long endInternalIteration1 = System.currentTimeMillis();
            System.out.println("Time taken for internalIterationOnSleep(smallListOfNumbers) is :"
                    + (endInternalIteration1 - startInternalIteration1));

            // TEST_3: several threads competing for the same fork/join worker threads
            Thread t1 = new Thread(IterationThroughStream::internalIterationOnThread);
            Thread t2 = new Thread(IterationThroughStream::internalIterationOnThread);
            Thread t3 = new Thread(IterationThroughStream::internalIterationOnThread);
            Thread t4 = new Thread(IterationThroughStream::internalIterationOnThread);

            t1.start();
            t2.start();
            t3.start();
            t4.start();

            // Wait for the four threads to finish (the original slept for 30 seconds instead).
            t1.join();
            t2.join();
            t3.join();
            t4.join();
        }

        private static boolean externalIteration(List<String> bigListOfStrings) {
            found = false;
            for (String s : bigListOfStrings) {
                if (s.equals("Counter no: 1000000")) {
                    found = true;
                }
            }
            return found;
        }

        private static boolean internalIteration(List<String> bigListOfStrings) {
            found = false;
            bigListOfStrings.parallelStream().forEach(
                (String s) -> {
                    if (s.equals("Counter no: 1000000")) { // Have a breakpoint to look how many threads are spawned.
                        found = true;
                    }
                }
            );
            return found;
        }

        private static boolean externalIterationOnSleep(List<Integer> smallListOfNumbers) {
            found = false;
            for (Integer s : smallListOfNumbers) {
                try {
                    Thread.sleep(100);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
            return found;
        }

        private static boolean internalIterationOnSleep(List<Integer> smallListOfNumbers) {
            found = false;
            // Removing parallelStream() will behave as single threaded (sequential access).
            smallListOfNumbers.parallelStream().forEach(
                (Integer s) -> {
                    try {
                        Thread.sleep(100); // Have a breakpoint to look how many threads are spawned.
                    } catch (InterruptedException e) {
                        e.printStackTrace();
                    }
                }
            );
            return found;
        }

        public static void internalIterationOnThread() {
            smallListOfNumbers.parallelStream().forEach(
                (Integer s) -> {
                    try {
                        /*
                         * DANGEROUS
                         * This will tell you that if all the 7 FJP (fork/join pool) worker threads are blocked
                         * for one single thread (e.g. t1), then the other three normal threads (t2 - t4)
                         * won't execute and will wait for FJP worker threads.
                         */
                        Thread.sleep(100); // Have a breakpoint here.
                    } catch (InterruptedException e) {
                        e.printStackTrace();
                    }
                }
            );
        }
    }

roookeee · Answer

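"It seems like you should just be able to declare a Stream, and the choice of sequential or parallel execution should be handled automatically in a layer below, either by library code or by the JVM itself, as a function of the cores available at runtime, the size of the problem, and so on."

To add to the answers already given:

That is quite a bold assumption. Imagine simulating a board game in order to train some kind of AI. It is quite easy to parallelize the execution of different playthroughs: just create a new instance and let it run in its own thread. Because it shares no state with any other playthrough, you don't even have to consider multithreading problems in your game logic. If, on the other hand, you parallelize the game logic itself, you will run into all kinds of multithreading issues and will most likely pay a high price in complexity and maybe even in performance.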
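The board game scenario can be sketched as follows, assuming a toy "game" in which a single piece advances by dice rolls until it reaches square 100. Everything here (`BoardGameSim`, `playOneGame`, the rules) is invented for illustration. The parallelism sits at the level of whole playthroughs, each of which owns its own `Random`, while the game logic inside a playthrough stays plain sequential code.

```java
import java.util.Random;
import java.util.stream.LongStream;

class BoardGameSim {
    // One complete playthrough; all of its state is local, so nothing is shared between games.
    static int playOneGame(long seed) {
        Random dice = new Random(seed);
        int position = 0;
        int turns = 0;
        while (position < 100) {
            position += dice.nextInt(6) + 1; // roll a six-sided die
            turns++;
        }
        return turns;
    }

    // Runs independent playthroughs, optionally in parallel, and totals the turns taken.
    static int totalTurns(int games, boolean parallel) {
        LongStream seeds = LongStream.range(0, games);
        if (parallel) {
            seeds = seeds.parallel();
        }
        return seeds.mapToInt(BoardGameSim::playOneGame).sum();
    }

    public static void main(String[] args) {
        System.out.println("sequential total: " + totalTurns(10_000, false));
        System.out.println("parallel total:   " + totalTurns(10_000, true));
    }
}
```

Since every playthrough is determined by its seed alone, both modes produce the same total; the caller, who knows the playthroughs are independent, is the one who decides that this is the right level at which to go parallel.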

Having control over how a stream behaves gives you (appropriately limited) flexibility, which is in itself a key feature of good library design.