Unpacking a list to select multiple columns from a Spark DataFrame

Question

I have a Spark DataFrame df. Is there a way to sub-select a few columns using a list of these columns?

scala> df.columns
res0: Array[String] = Array("a", "b", "c", "d")

I know I can do something like df.select("b", "c"). But suppose I have a list containing a few column names, val cols = List("b", "c"). Is there a way to pass this to df.select? df.select(cols) throws an error. I am looking for something like df.select(*cols) in Python.

Shagun Sodhani · Accepted Answer

Use df.select(cols.head, cols.tail: _*)

Let me know if it works :)

Explanation from @Ben:

The key is the method signature of select:

select(col: String, cols: String*)

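The cols: String* entry takes a variable number of arguments. The : _* annotation unpacks a sequence so that it can be passed to this varargs parameter. Because the signature also requires a leading col: String, the list has to be split into cols.head and cols.tail. This is very similar to unpacking with *args in Python. See here and here for other examples.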
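The mechanics can be tried without Spark. The following is a minimal sketch in which `select` is a hypothetical stand-in that only copies the parameter shape quoted above; it is not the Spark method, and the object name is made up for illustration.

```scala
object VarargsSketch {
  // Hypothetical stand-in with the same parameter shape as
  // select(col: String, cols: String*). It just returns the names it received.
  def select(col: String, cols: String*): Seq[String] = col +: cols

  def main(args: Array[String]): Unit = {
    val cols = List("b", "c")

    // select(cols) would not compile: a List[String] is not a String.
    // cols.head fills the first parameter, and cols.tail: _* expands the
    // remaining elements into the String* varargs parameter.
    val picked = select(cols.head, cols.tail: _*)

    println(picked.mkString(", "))
  }
}
```

The same call shape applies to df.select when the column names are plain strings.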

Kshitij Kulshrestha · Answer

You can convert the strings to Spark columns like this:

import org.apache.spark.sql.functions._
df.select(cols.map(col): _*)

vEdwardpc · Answer

Another option that I learned:

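import org.apache.spark.sql.functions.col
val columns = Seq[String]("col1", "col2", "col3")
// Convert each column name to a Column
val colNames = columns.map(name => col(name))
// Bind the result to a new name; `val df = df.select(...)` would be a recursive definition
val selected = df.select(colNames: _*)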

geosmart · Answer

You can do it like this:

String[] originCols = ds.columns();
ds.selectExpr(originCols);

Spark selectExpr source code:

/**
 * Selects a set of SQL expressions. This is a variant of `select` that accepts
 * SQL expressions.
 *
 * {{{
 *   // The following are equivalent:
 *   ds.selectExpr("colA", "colB as newName", "abs(colC)")
 *   ds.select(expr("colA"), expr("colB as newName"), expr("abs(colC)"))
 * }}}
 *
 * @group untypedrel
 * @since 2.0.0
 */
@scala.annotation.varargs
def selectExpr(exprs: String*): DataFrame = {
  select(exprs.map { expr =>
    Column(sparkSession.sessionState.sqlParser.parseExpression(expr))
  }: _*)
}

raam86 · Answer

You can pass arguments of type Column* to select:

import org.apache.spark.sql.Column
val df = spark.read.json("example.json")
val cols: List[String] = List("a", "b")
// Convert each String to a Column by looking it up on df
val col: List[Column] = cols.map(df(_))
df.select(col: _*)

Eranga Atugoda · Answer

First, convert the String array to a List of Spark Column objects as below:

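import java.util.ArrayList;
import java.util.List;
import org.apache.spark.sql.Column;

String[] strColNameArray = new String[]{"a", "b", "c", "d"};
List<Column> colNames = new ArrayList<>();
// Wrap each column name in a Column
for (String strColName : strColNameArray) {
    colNames.add(new Column(strColName));
}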

Then convert the List using the JavaConversions functions inside the select statement as below. You need the following import statement.

import scala.collection.JavaConversions;

Dataset<Row> selectedDF = df.select(JavaConversions.asScalaBuffer(colNames));

Unmesha SreeVeni · Answer

Yes, you can use .select in Scala.

Use .head and .tail to select the values listed in the List().

Example

val cols = List("b", "c")
df.select(cols.head, cols.tail: _*)
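One caveat with the head/tail form: calling .head on an empty list throws a NoSuchElementException, so it may be worth guarding that case when the list is built dynamically. The sketch below shows one possible guard using pattern matching. Here `select` is again a hypothetical stand-in with the same parameter shape as the String overload of df.select, and `selectAll` and the object name are made up for illustration.

```scala
object SafeSelectSketch {
  // Hypothetical stand-in for select(col: String, cols: String*); not the Spark method.
  def select(col: String, cols: String*): Seq[String] = col +: cols

  // Returns None for an empty list instead of letting cols.head throw.
  def selectAll(cols: List[String]): Option[Seq[String]] = cols match {
    case head :: tail => Some(select(head, tail: _*))
    case Nil          => None
  }

  def main(args: Array[String]): Unit = {
    println(selectAll(List("b", "c")))
    println(selectAll(Nil))
  }
}
```

Whether an empty list should produce None, an error, or the unmodified DataFrame is a design choice that depends on the calling code.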

Explanation