Get a list of data types from a schema in Apache Spark

Question

In Spark with Python, I have the following code, which gets the list of column names from a DataFrame's schema. It works fine, but how can I get the list of data types?

columnNames = df.schema.names

For example, something like:

columnTypes = df.schema.types

Is there a way to get a separate list of the data types contained in a DataFrame schema?

Daniel de Paula · Accepted Answer

Here is a suggestion:

df = sqlContext.createDataFrame([('a', 1)])

types = [f.dataType for f in df.schema.fields]

types
> [StringType, LongType]

Reference:

Viacheslav Shalamov · Answer

Since the question title is not Python-specific, I'll add a Scala version here.

val types = df.schema.fields.map(f => f.dataType)

This gives an array of org.apache.spark.sql.types.DataType.

stack0114106 · Answer

Use df.dtypes:

scala> val df = Seq(("ABC",10,20.4)).toDF("a","b","c")
df: org.apache.spark.sql.DataFrame = [a: string, b: int ... 1 more field]

scala> df.printSchema
root
 |-- a: string (nullable = true)
 |-- b: integer (nullable = false)
 |-- c: double (nullable = false)

scala> df.dtypes
res2: Array[(String, String)] = Array((a,StringType), (b,IntegerType), (c,DoubleType))

scala> df.dtypes.map(_._2).toSet
res3: scala.collection.immutable.Set[String] = Set(StringType, IntegerType, DoubleType)