Apache Beam: FlatMap vs Map?

Question

I would like to understand in which scenarios I should use FlatMap and in which I should use Map. The documentation was not clear to me.

I still cannot tell when each of the two transforms is the right choice.

Could someone give me an example so that I can understand the difference?

I understand the difference between FlatMap and Map in Spark, but I am not sure whether the Beam transforms work the same way.

Pablo · Accepted Answer

These transforms in Beam behave exactly the same as in Spark (and in Scala too).

A Map transform maps a PCollection of N elements into another PCollection of N elements: one output element for every input element.

A FlatMap transform maps a PCollection of N elements into N collections of zero or more elements each, which are then flattened into a single PCollection.
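To make the two definitions concrete without needing a Beam installation, here is a minimal plain-Python sketch of the same semantics. The helpers `map_elements` and `flat_map_elements` are hypothetical names made up for this illustration and are not part of Beam. Ordinary lists stand in for PCollections, so the ordering you see here is a property of the model only (a real PCollection is unordered).

```python
from itertools import chain
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def map_elements(fn: Callable[[T], U], elements: Iterable[T]) -> List[U]:
    """Model of Map: exactly one output element per input element."""
    return [fn(x) for x in elements]


def flat_map_elements(fn: Callable[[T], Iterable[U]], elements: Iterable[T]) -> List[U]:
    """Model of FlatMap: fn returns an iterable per input; all of them are concatenated."""
    return list(chain.from_iterable(fn(x) for x in elements))


numbers = [1, 2, 3]

# Each returned list stays intact as a single element.
mapped = map_elements(lambda x: [x, "any"], numbers)
# mapped == [[1, 'any'], [2, 'any'], [3, 'any']]  (3 elements)

# Each returned list is unpacked into the output.
flattened = flat_map_elements(lambda x: [x, "any"], numbers)
# flattened == [1, 'any', 2, 'any', 3, 'any']  (6 elements)

# Returning an empty iterable drops that input element entirely.
odds_twice = flat_map_elements(lambda x: [x, x] if x % 2 else [], numbers)
# odds_twice == [1, 1, 3, 3]
```

The last case is the practical rule of thumb: reach for FlatMap when one input may produce zero, one, or many outputs, and for Map when it always produces exactly one.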

As a simple example, consider the following:

beam.Create([1, 2, 3]) | beam.Map(lambda x: [x, 'any'])
# The result is a collection of THREE lists: [[1, 'any'], [2, 'any'], [3, 'any']]

Whereas:

beam.Create([1, 2, 3]) | beam.FlatMap(lambda x: [x, 'any'])
# The lists that are output by the lambda are then flattened into a
# collection of SIX single elements: [1, 'any', 2, 'any', 3, 'any']