Java Streams: how to do an efficient "distinct and sort"?

Question

Suppose I have a Stream<T> and want to get only its distinct elements, sorted.

The naive approach would be to just do the following:

    Stream.of(...)
        .sorted()
        .distinct()

or, maybe the other way around:

    Stream.of(...)
        .distinct()
        .sorted()
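For illustration, here is a minimal sketch showing that both orderings return the same elements in the same order, so the choice between them is only about cost. The class name `DistinctSortOrderDemo` and its two helper methods are made up for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class; the names are not from the original question.
class DistinctSortOrderDemo {

    // Sort first, then drop duplicates.
    static List<Integer> sortThenDistinct(List<Integer> input) {
        return input.stream().sorted().distinct().collect(Collectors.toList());
    }

    // Drop duplicates first, then sort.
    static List<Integer> distinctThenSort(List<Integer> input) {
        return input.stream().distinct().sorted().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(3, 1, 2, 3, 1);
        System.out.println(sortThenDistinct(input)); // [1, 2, 3]
        System.out.println(distinctThenSort(input)); // [1, 2, 3]
    }
}
```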

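Since the implementation of both operations is not really accessible from the JDK source code, I was wondering about the possible memory consumption and performance implications.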

Or would it be even more efficient to write my own filter, as follows?

    Stream.of(...)
        .sorted()
        .filter(noAdjacentDuplicatesFilter())

    public static Predicate<Object> noAdjacentDuplicatesFilter() {
        final Object[] previousValue = {new Object()};

        return value -> {
            final boolean takeValue = !Objects.equals(previousValue[0], value);

            previousValue[0] = value;
            return takeValue;
        };
    }

Holger · Accepted Answer

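When you chain a distinct() operation after sorted(), the implementation will utilize the sorted nature of the data and avoid building an internal HashSet, as the following program demonstrates: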

    import java.util.stream.Stream;

    public class DistinctAndSort {
        // Counters for how often the stream pipeline calls each method
        static int COMPARE, EQUALS, HASHCODE;

        static class Tracker implements Comparable<Tracker> {
            static int SERIAL;
            int id;

            Tracker() {
                id = SERIAL++ / 2; // every two consecutive instances share an id
            }

            public int compareTo(Tracker o) {
                COMPARE++;
                return Integer.compare(id, o.id);
            }

            public int hashCode() {
                HASHCODE++;
                return id;
            }

            public boolean equals(Object obj) {
                EQUALS++;
                return super.equals(obj);
            }
        }

        public static void main(String[] args) {
            System.out.println("adjacent sorted() and distinct()");
            Stream.generate(Tracker::new).limit(100)
                  .sorted().distinct()
                  .forEachOrdered(o -> {});
            System.out.printf("compareTo: %d, EQUALS: %d, HASHCODE: %d%n",
                              COMPARE, EQUALS, HASHCODE);

            COMPARE = EQUALS = HASHCODE = 0;
            System.out.println("now with intermediate operation");
            Stream.generate(Tracker::new).limit(100)
                  .sorted().map(x -> x).distinct()
                  .forEachOrdered(o -> {});
            System.out.printf("compareTo: %d, EQUALS: %d, HASHCODE: %d%n",
                              COMPARE, EQUALS, HASHCODE);
        }
    }

which will print:

    adjacent sorted() and distinct()
    compareTo: 99, EQUALS: 99, HASHCODE: 0
    now with intermediate operation
    compareTo: 99, EQUALS: 100, HASHCODE: 200

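An intermediate operation as simple as map(x -> x) cannot be recognized by the Stream implementation, so it must assume that the elements might no longer be sorted with respect to the mapping function's result.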
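One way to observe this from the outside is to ask the stream's spliterator whether it still reports the SORTED characteristic. The sketch below assumes current OpenJDK behavior, where the pipeline's internal flags are reflected in the spliterator's characteristics; the specification does not promise that distinct() makes use of them. The class `SortedFlagDemo` and the helper `reportsSorted` are hypothetical names for this example.

```java
import java.util.Spliterator;
import java.util.stream.Stream;

// Hypothetical demo class; relies on implementation behavior, not on a spec guarantee.
class SortedFlagDemo {

    // Consumes the stream and reports whether its spliterator claims SORTED.
    static boolean reportsSorted(Stream<Integer> stream) {
        return stream.spliterator().hasCharacteristics(Spliterator.SORTED);
    }

    public static void main(String[] args) {
        // Directly after sorted(), the pipeline knows the data is sorted.
        System.out.println(reportsSorted(Stream.of(3, 1, 2).sorted()));
        // An identity map() makes the pipeline forget that knowledge.
        System.out.println(reportsSorted(Stream.of(3, 1, 2).sorted().map(x -> x)));
    }
}
```

On the OpenJDK builds I am aware of, the first call reports the characteristic and the second does not, which matches the counters in the program above.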

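There is no guarantee that this kind of optimization happens, but it is reasonable to assume that the developers of the Stream implementation will not remove it and will rather try to add more optimizations. Rolling your own implementation therefore means that your code won't benefit from future optimizations.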

Further, what you have created is a "stateful predicate", which is strongly discouraged and, of course, will break when used with a parallel stream.
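The hidden state is easy to see even without parallelism. In the sketch below, the predicate from the question is reused for a second sequential stream and silently drops a legitimate element because it still remembers the last value of the first stream. The class `StatefulPredicateDemo` and the helper `sortAndFilter` are names invented for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical demo class built around the predicate from the question.
class StatefulPredicateDemo {

    // Same predicate as in the question: remembers the previously seen value.
    static Predicate<Object> noAdjacentDuplicatesFilter() {
        final Object[] previousValue = {new Object()};
        return value -> {
            final boolean takeValue = !Objects.equals(previousValue[0], value);
            previousValue[0] = value;
            return takeValue;
        };
    }

    static List<Integer> sortAndFilter(List<Integer> input, Predicate<Object> filter) {
        return input.stream().sorted().filter(filter).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Predicate<Object> shared = noAdjacentDuplicatesFilter();
        System.out.println(sortAndFilter(Arrays.asList(1, 1, 2), shared)); // [1, 2]
        // The predicate still remembers 2, so the leading 2 is dropped here.
        System.out.println(sortAndFilter(Arrays.asList(2, 3), shared));    // [3]
    }
}
```

In a parallel stream the order in which the predicate is called is not defined at all, so the remembered value is not necessarily the neighboring element and the result becomes unpredictable.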

If you don't trust the Stream API to perform this operation efficiently enough, you might be better off implementing this particular operation without the Stream API.
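One possible way to do that, as a sketch rather than a recommendation, is to let a TreeSet do both jobs at once. The class `SortedDistinctWithoutStreams` and the method `sortedDistinct` are hypothetical names. Note the assumption that the elements have a natural ordering consistent with equals, because TreeSet decides on duplicates via compareTo, not equals.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper class; not part of the original answer.
class SortedDistinctWithoutStreams {

    // Returns the distinct elements of the input in natural order.
    // Duplicates are detected via compareTo, and null elements are rejected by TreeSet.
    static <T extends Comparable<? super T>> List<T> sortedDistinct(Collection<T> input) {
        Set<T> sorted = new TreeSet<T>(input);
        return new ArrayList<T>(sorted);
    }

    public static void main(String[] args) {
        System.out.println(sortedDistinct(Arrays.asList(3, 1, 2, 3, 1))); // [1, 2, 3]
    }
}
```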

Viktor Mellgren · Answer

Disclaimer: I know that performance testing is hard, especially on the JVM, where warm-up is needed and you want a controlled environment with no other processes running.

When I test it I get the results below, so it seems your implementation benefits from parallel execution. (Run on an i7 with 4 cores + hyperthreading.)

So ".distinct().sorted()" seems to be slower, as predicted/explained by Holger.

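Times are in milliseconds. Within each round the three values are for sorted().filter(noAdjacentDuplicatesFilter()), sorted().distinct() and distinct().sorted(), in that order:

    Round 1 (Warm up?)
    3938
    2449
    5747
    Round 2
    2834
    2620
    3984
    Round 3 Parallel
    831
    4343
    6346
    Round 4 Parallel
    825
    3309
    6339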

The code used:

    package test.test;

    import java.util.Collections;
    import java.util.List;
    import java.util.Objects;
    import java.util.function.Predicate;
    import java.util.stream.Collectors;
    import java.util.stream.IntStream;

    public class SortDistinctTest {

        public static void main(String[] args) {
            // 6 million boxed integers in random order
            IntStream range = IntStream.range(0, 6_000_000);
            List<Integer> collect = range.boxed().collect(Collectors.toList());
            Collections.shuffle(collect);

            long start = System.currentTimeMillis();

            System.out.println("Round 1 (Warm up?)");
            collect.stream().sorted().filter(noAdjacentDuplicatesFilter()).collect(Collectors.counting());
            long fst = System.currentTimeMillis();
            System.out.println(fst - start);

            collect.stream().sorted().distinct().collect(Collectors.counting());
            long snd = System.currentTimeMillis();
            System.out.println(snd - fst);

            collect.stream().distinct().sorted().collect(Collectors.counting());
            long end = System.currentTimeMillis();
            System.out.println(end - snd);

            System.out.println("Round 2");
            collect.stream().sorted().filter(noAdjacentDuplicatesFilter()).collect(Collectors.counting());
            fst = System.currentTimeMillis();
            System.out.println(fst - end);

            collect.stream().sorted().distinct().collect(Collectors.counting());
            snd = System.currentTimeMillis();
            System.out.println(snd - fst);

            collect.stream().distinct().sorted().collect(Collectors.counting());
            end = System.currentTimeMillis();
            System.out.println(end - snd);

            System.out.println("Round 3 Parallel");
            collect.stream().parallel().sorted().filter(noAdjacentDuplicatesFilter()).collect(Collectors.counting());
            fst = System.currentTimeMillis();
            System.out.println(fst - end);

            collect.stream().parallel().sorted().distinct().collect(Collectors.counting());
            snd = System.currentTimeMillis();
            System.out.println(snd - fst);

            collect.stream().parallel().distinct().sorted().collect(Collectors.counting());
            end = System.currentTimeMillis();
            System.out.println(end - snd);

            System.out.println("Round 4 Parallel");
            collect.stream().parallel().sorted().filter(noAdjacentDuplicatesFilter()).collect(Collectors.counting());
            fst = System.currentTimeMillis();
            System.out.println(fst - end);

            collect.stream().parallel().sorted().distinct().collect(Collectors.counting());
            snd = System.currentTimeMillis();
            System.out.println(snd - fst);

            collect.stream().parallel().distinct().sorted().collect(Collectors.counting());
            end = System.currentTimeMillis();
            System.out.println(end - snd);
        }

        // The stateful filter from the question: drops a value equal to the previous one
        public static Predicate<Object> noAdjacentDuplicatesFilter() {
            final Object[] previousValue = { new Object() };

            return value -> {
                final boolean takeValue = !Objects.equals(previousValue[0], value);

                previousValue[0] = value;
                return takeValue;
            };
        }
    }