java.lang.NoClassDefFoundError: org/apache/spark/streaming/twitter/TwitterUtils$ while running TwitterPopularTags

Question

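I am new to Spark Streaming and Scala. For a project requirement, I was trying to run the TwitterPopularTags example that is available on GitHub. Since SBT assembly was not working for me, I am trying to build it with Maven. After a lot of initial problems I managed to create a jar file, but when I try to run it I get the following error: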

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/streaming/twitter/TwitterUtils$
    at TwitterPopularTags$.main(TwitterPopularTags.scala:43)
    at TwitterPopularTags.main(TwitterPopularTags.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:331)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.streaming.twitter.TwitterUtils$
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 9 more

I have added the following dependencies: spark-streaming_2.10:1.1.0, spark-core_2.10:1.1.0, spark-streaming-twitter_2.10:1.1.0

I also tried 1.2.0 for spark-streaming-twitter, but I got the same error.

Thanks in advance for your help.

Regards, vpv
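The stack trace shows that the code compiled, but at runtime the class loader used by spark-submit cannot find TwitterUtils$. As a small diagnostic sketch (the object name ClasspathCheck is hypothetical, not part of Spark), you can ask the class loader that loaded your own code whether a given class is visible, for example from the start of your main method:

```scala
object ClasspathCheck {
  // Returns true if `className` can be loaded by the class loader that loaded this object.
  // initialize = false, so the class's static initializer is not run.
  def isVisible(className: String): Boolean =
    try {
      Class.forName(className, false, getClass.getClassLoader)
      true
    } catch {
      // ClassNotFoundException: the class is absent.
      // NoClassDefFoundError: it was found, but something it links against is missing.
      case _: ClassNotFoundException | _: NoClassDefFoundError => false
    }

  def main(args: Array[String]): Unit = {
    val name = "org.apache.spark.streaming.twitter.TwitterUtils$"
    println(s"$name visible at runtime: ${isVisible(name)}")
  }
}
```

If this prints false when launched through spark-submit, the spark-streaming-twitter classes are not on the runtime classpath, even though they were available to the compiler.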

vpv · Answer

Thank you for your suggestions. I was able to resolve this problem by using SBT assembly only. Below are the details of how I did it.

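Spark - already present in the Cloudera VM
Scala - not sure whether it is present in the Cloudera VM; if not, you can install it
SBT - this also needs to be installed. I did both installs on my local machine and transferred the jar to the VM. To install them I used the following link: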

https://gist.github.com/visenger/5496675

1) Once all of these are in place, you need to create a parent folder for the project. I created a folder called Twitter.

2) Inside it, create another folder with the structure Twitter/src/main/scala. In this folder I created a .scala file named TwitterPopularTags.scala. It is slightly modified from the code I got from GitHub; I had to change the import statements to:

import org.apache.spark.streaming.Seconds
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.SparkContext._
import org.apache.spark.streaming.twitter._
import org.apache.spark.SparkConf

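3) After this, create another folder under the parent folder with the following name:

Twitter/project

and create a file named assembly.sbt in it. This file tells sbt where to find the assembly plugin. Below is the complete content of the file: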

resolvers += Resolver.url("sbt-plugin-releases-scalasbt", url("http://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/"))

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.12.0")

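4) Once the above two are created, create a file named build.sbt in the project's parent directory (Twitter). This is where you specify the name of the .jar file to be created and its dependencies. Note that the blank lines between the settings in this file also matter (sbt versions before 0.13.7 require settings in .sbt files to be separated by blank lines).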

name := "TwitterPopularTags"

version := "1.0"

mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
  {
    case PathList("META-INF", xs @ _*) => MergeStrategy.discard
    case x => MergeStrategy.first
  }
}

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0" % "provided"

libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.1.0" % "provided"

libraryDependencies += "org.apache.spark" %% "spark-streaming-twitter" % "1.2.0"

libraryDependencies += "org.twitter4j" % "twitter4j-stream" % "3.0.3"

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"

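5) Finally, open a terminal and go to the project's parent folder (Twitter). From there, run the following command:

sbt assembly

This downloads the dependencies and creates the required jar file (by default sbt-assembly names it <name>-assembly-<version>.jar, here TwitterPopularTags-assembly-1.0.jar, under target/scala-<Scala version>/).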

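6) To run the program, you need a Twitter app created under your account, which provides the authentication tokens and other details. Detailed steps for creating one are at the following link: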

http://ampcamp.berkeley.edu/3/exercises/realtime-processing-with-spark-streaming.html

7) Once all of the above is done, you can run the job from the VM with the spark-submit command. An example command:

./bin/spark-submit \
  --class TwitterPopularTags \
  --master local[4] \
  /path/to/TwitterPopularTags.jar \
  consumerkey consumersecret accesstoken accesssecret

8) The job prints its output to the console, so to make the output easier to monitor, it is better to tweak the code to print less frequently.
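The four trailing arguments in the spark-submit command of step 7 are the Twitter OAuth credentials. The program copies them into twitter4j system properties before creating the stream, the same pattern that Priyanshu Singh's code in this thread uses. A minimal sketch of that step, with a hypothetical object name TwitterCredentials:

```scala
object TwitterCredentials {
  // twitter4j system-property names, in the same order as the command-line arguments.
  val PropertyNames: Seq[String] = Seq(
    "twitter4j.oauth.consumerKey",
    "twitter4j.oauth.consumerSecret",
    "twitter4j.oauth.accessToken",
    "twitter4j.oauth.accessTokenSecret"
  )

  // Copies the first four program arguments into the twitter4j properties.
  // Any further arguments (e.g. hashtag filters) are ignored here.
  def setFromArgs(args: Array[String]): Unit = {
    require(args.length >= 4,
      "Usage: TwitterPopularTags <consumer key> <consumer secret> <access token> <access token secret>")
    PropertyNames.zip(args.take(4).toSeq).foreach { case (prop, value) =>
      System.setProperty(prop, value)
    }
  }
}
```

A misspelled property name is silently ignored, so the stream would then fail to authenticate rather than fail with a clear message.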

Let me know if you need more details.

Thanks and regards,

VPV
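The build.sbt above uses sbt 0.13 syntax. In sbt 1.x the `<<=` operator was removed, and current sbt-assembly releases prefix their keys with `assembly`. A hedged sketch of the same merge rules for sbt 1.x follows; the plugin version number is an assumption, so check for the current sbt-assembly release:

```scala
// project/assembly.sbt (sbt 1.x); the version below is an assumption
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.1.5")

// build.sbt: same rules as above, written with sbt 1.x slash syntax.
// Anything under META-INF is dropped; for other duplicate paths the first copy wins.
assembly / assemblyMergeStrategy := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x                             => MergeStrategy.first
}
```

The libraryDependencies lines, including the "provided" scope on spark-core and spark-streaming, carry over unchanged.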

Marek Dudek · Answer

I found an easy solution (it works with 1.5.1, but probably with earlier versions too):

Submit with the --packages parameter and the Maven coordinates, like this:

spark-submit --master local[*] \
  --class TwitterStreaming \
  --packages "org.apache.spark:spark-streaming-twitter_2.10:1.5.1" \
  ${PATH_TO_JAR_IN_TARGET}

This is described at

http://spark.apache.org/docs/latest/programming-guide.html#using-the-shell

hexabunny · Answer

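This error means that the TwitterUtils class (or, in Scala terms, the TwitterUtils object) is not visible at runtime, although it is visible at compile time (otherwise you could not build with Maven). You need to make sure that the jar file you created actually contains that class/object. You can unzip the jar to see what it really contains. If it is missing, you need to include it in your final jar.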
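As a sketch of the check described above (the object name JarCheck is hypothetical), the JDK's java.util.jar.JarFile can look for the compiled TwitterUtils$ entry without unzipping anything:

```scala
import java.util.jar.JarFile

object JarCheck {
  // Converts a binary class name such as "org.apache.spark.streaming.twitter.TwitterUtils$"
  // into the path that the compiled class has inside a jar.
  def entryName(className: String): String =
    className.replace('.', '/') + ".class"

  // Returns true if the jar at jarPath contains a .class entry for className.
  def containsClass(jarPath: String, className: String): Boolean = {
    val jar = new JarFile(jarPath)
    try jar.getEntry(entryName(className)) != null
    finally jar.close()
  }

  def main(args: Array[String]): Unit = {
    val jarPath = args(0) // the jar you pass to spark-submit
    val target  = "org.apache.spark.streaming.twitter.TwitterUtils$"
    if (containsClass(jarPath, target)) println(s"$target is inside $jarPath")
    else println(s"$target is missing from $jarPath")
  }
}
```

From the command line, the JDK's `jar tf <your jar>` lists the same entries.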

DeepikaB · Answer

Try it like this...

./bin/spark-submit \
  --class TwitterPopularTags \
  --jars (external_jars like twitter4j,streaming-twitter) \
  --master local[4] \
  /path/to/TwitterPopularTags.jar \
  consumerkey consumersecret accesstoken accesssecret

JMess · Answer

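To include your dependencies in the jar, you need to tell Maven to build a "fat jar". A "fat jar" is a jar that contains the .class files not only of your project but of all its required dependencies (this is what sbt assembly does). Maven's default behavior is to treat your project like a library and therefore build a jar containing only your project's own .class files.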
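To make the idea concrete, here is a toy in-memory model of what a fat-jar build does. It is an illustration only, not how sbt-assembly or maven-assembly-plugin are implemented: each "jar" is a list of (path, content) pairs, and the merge applies the same rules as vpv's build.sbt (discard META-INF, otherwise keep the first copy of a duplicated path).

```scala
import scala.collection.mutable

object FatJarModel {
  // A "jar" modeled as an ordered list of (entry path, content) pairs.
  type Jar = Seq[(String, String)]

  // Merges the project jar and its dependency jars, in order, into one fat jar.
  // Entries under META-INF/ are dropped; for duplicate paths the first one wins.
  def merge(jars: Seq[Jar]): List[(String, String)] = {
    val merged = mutable.LinkedHashMap.empty[String, String]
    for (jar <- jars; (path, content) <- jar) {
      val isMetaInf = path.startsWith("META-INF/")
      if (!isMetaInf && !merged.contains(path)) merged(path) = content
    }
    merged.toList
  }
}
```

A plain Maven jar corresponds to keeping only the first (project) jar; the missing TwitterUtils$ entry is exactly what the merge adds.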

Here is a simple Maven POM that does what you want. It includes some other common Spark + Maven behavior, such as compiling Scala, but the most relevant part is near the bottom.

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.jmess.sparkexamples</groupId>
  <artifactId>example</artifactId>
  <version>1.0.0</version>

  <properties>
    <!-- Use Java 1.8 -->
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
    <!-- Keep compiled against scala version uniform -->
    <scala.base.version>2.11</scala.base.version>
    <!-- Use most recent version of Scala compatible with stable Spark release -->
    <scala.version>${scala.base.version}.12</scala.version>
    <!-- Facilitates keeping multiple Spark dependencies aligned -->
    <spark.version>2.4.0</spark.version>
  </properties>

  <dependencies>
    <!-- Begin Spark Dependencies -->
    <!-- Provides the base Spark APIs. Required for base functionality -->
    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_${scala.base.version}</artifactId>
      <version>${spark.version}</version>
      <!-- In most cases this dependency is supplied by Spark -->
      <scope>provided</scope>
    </dependency>
    <!-- Provides the expanded APIs for Streaming with Kafka. Required in addition to spark-sql library -->
    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql-kafka-0-10 -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql-kafka-0-10_${scala.base.version}</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <!-- End Spark Dependencies -->

    <!-- Popular scala configuration library -->
    <dependency>
      <groupId>com.typesafe</groupId>
      <artifactId>config</artifactId>
      <version>1.3.2</version>
    </dependency>
    <!-- To write to Splunk HTTP endpoint -->
  </dependencies>

  <build>
    <!-- Tells scala-maven-plugin where to look -->
    <sourceDirectory>src/main/scala</sourceDirectory>
    <testSourceDirectory>src/test/scala</testSourceDirectory>
    <plugins>
      <!-- For building scala projects using maven -->
      <plugin>
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>4.0.1</version>
        <!-- Includes the compiled Scala .class files in some maven goals -->
        <executions>
          <execution>
            <goals>
              <goal>add-source</goal>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
      <!-- !!!!!!! BUILD FAT JAR !!!!!!! -->
      <!-- Build a fat jar named example-1.0.0-jar-with-dependencies.jar -->
      <plugin>
        <artifactId>maven-assembly-plugin</artifactId>
        <version>3.1.1</version>
        <configuration>
          <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
        </configuration>
        <executions>
          <execution>
            <id>make-assembly</id> <!-- this is used for inheritance merges -->
            <phase>package</phase> <!-- bind to the packaging phase -->
            <goals>
              <goal>single</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>

Note: if you submit the job through Spark rather than YARN, pay attention to the <scope>provided</scope> line.

Priyanshu Singh · Answer

I have the same problem and I am not able to fix it.

name := "SentimentAnalyser"

version := "0.1"

scalaVersion := "2.11.11"

libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.2.0"

// https://mvnrepository.com/artifact/org.apache.spark/park-streaming-twitter_2.11
// https://mvnrepository.com/artifact/org.apache.spark/spark-streaming_2.11
libraryDependencies += "org.apache.spark" % "spark-streaming-twitter_2.11" % "2.0.0"

libraryDependencies += "org.apache.spark" % "spark-streaming_2.11" % "2.2.0"

package com

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.twitter.TwitterUtils

object Sentiment {
  def main(args: Array[String]): Unit = {
    if (args.length < 4) {
      System.out.print("Enter Consumer Key (API Key) Consumer Secret (API Secret) Access Token Access Token Secret")
      System.exit(1)
    }
    val Array(customer_key, customer_secrect, access_token, access_token_secret) = args.take(4)
    // Property names must be spelled exactly as twitter4j expects
    System.setProperty("twitter4j.oauth.consumerKey", customer_key)
    System.setProperty("twitter4j.oauth.consumerSecret", customer_secrect)
    System.setProperty("twitter4j.oauth.accessToken", access_token)
    System.setProperty("twitter4j.oauth.accessTokenSecret", access_token_secret)
    val conf = new SparkConf().setAppName("Sentiment").setMaster("local")
    val scc = new StreamingContext(conf, Seconds(30))
    // DStream
    val stream = TwitterUtils.createStream(scc, None)
    val hashTag = stream.flatMap(status => status.getText.split(" ").filter(_.startsWith("#")))
    // Swap to (count, topic) so that sortByKey orders by count
    val topHashTag60 = hashTag.map((_, 1)).reduceByKeyAndWindow(_ + _, Seconds(60))
      .map { case (topic, count) => (count, topic) }.transform(_.sortByKey(false))
    val topHashTag10 = hashTag.map((_, 1)).reduceByKeyAndWindow(_ + _, Seconds(10))
      .map { case (topic, count) => (count, topic) }.transform(_.sortByKey(false))
    topHashTag60.foreachRDD(rdd => {
      val topList = rdd.take(10)
      println("Popular topic in last 60 sec (%s total)".format(rdd.count()))
      topList.foreach { case (count, tag) => println("%s (%s tweets)".format(tag, count)) }
    })
    topHashTag10.foreachRDD(rdd => {
      val topList = rdd.take(10)
      println("Popular topic in last 10 sec (%s total)".format(rdd.count()))
      topList.foreach { case (count, tag) => println("%s (%s tweets)".format(tag, count)) }
    })
    scc.start()
    scc.awaitTermination()
  }
}

I built the jar using an artifact in IJ (IntelliJ):

spark-submit --class com.Sentiment /root/Desktop/SentimentAnalyser.jar XX XX XX XX

ERROR:

17/10/29 01:22:24 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.46.132, 34179, None)
17/10/29 01:22:27 WARN StreamingContext: spark.master should be set as local[n], n > 1 in local mode if you have receivers to get data, otherwise Spark jobs will not get resources to process the received data.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/streaming/twitter/TwitterUtils$
    at com.Sentiment$.main(Sentiment.scala:26)
    at com.Sentiment.main(Sentiment.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.streaming.twitter.TwitterUtils$
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java