Hadoop: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text

Question

My program looks like this:

public class TopKRecord extends Configured implements Tool {

    public static class MapClass extends Mapper<Text, Text, Text, Text> {

        public void map(Text key, Text value, Context context) throws IOException, InterruptedException {
            // your map code goes here
            String[] fields = value.toString().split(",");
            String year = fields[1];
            String claims = fields[8];

            if (claims.length() > 0 && (!claims.startsWith("\""))) {
                context.write(new Text(year.toString()), new Text(claims.toString()));
            }
        }
    }

    public int run(String args[]) throws Exception {
        Job job = new Job();
        job.setJarByClass(TopKRecord.class);

        job.setMapperClass(MapClass.class);

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setJobName("TopKRecord");
        job.setMapOutputValueClass(Text.class);
        job.setNumReduceTasks(0);
        boolean success = job.waitForCompletion(true);
        return success ? 0 : 1;
    }

    public static void main(String args[]) throws Exception {
        int ret = ToolRunner.run(new TopKRecord(), args);
        System.exit(ret);
    }
}

The data looks like this:

"PATENT","GYEAR","GDATE","APPYEAR","COUNTRY","POSTATE","ASSIGNEE","ASSCODE","CLAIMS","NCLASS","CAT","SUBCAT","CMADE","CRECEIVE","RATIOCIT","GENERAL","ORIGINAL","FWDAPLAG","BCKGTLAG","SELFCTUB","SELFCTLB","SECDUPBD","SECDLWBD"
3070801,1963,1096,,"BE","",,1,,269,6,69,,1,,0,,,,,,,
3070802,1963,1096,,"US","TX",,1,,2,6,63,,0,,,,,,,,,
3070803,1963,1096,,"US","IL",,1,,2,6,63,,9,,0.3704,,,,,,,
3070804,1963,1096,,"US","OH",,1,,2,6,63,,3,,0.6667,,,,,,,

When I run this program, I see the following on the console:

12/08/02 12:43:34 INFO mapred.JobClient: Task Id : attempt_201208021025_0007_m_000000_0, Status : FAILED
java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text
    at com.hadoop.programs.TopKRecord$MapClass.map(TopKRecord.java:26)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

I believe the class types are mapped correctly in the Mapper class.

What am I doing wrong here?

Charles Menguy · Accepted Answer

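When an M/R program reads a file with the default TextInputFormat, the mapper's input key is the position of the line in the file (its byte offset, a LongWritable), while the input value is the full line (a Text).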
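To make the key/value split concrete, here is a minimal plain-Java sketch (no Hadoop on the classpath) of the kind of records a line-oriented input format hands to a mapper: the key is the byte offset where the line starts, the value is the line itself. The class OffsetKeyDemo and its records method are hypothetical names used only for illustration; the real record reader also deals with input splits and other line endings, which this sketch ignores.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustration only: mimics (byte offset, line) records for "\n"-terminated text.
final class OffsetKeyDemo {

    // Returns one (offset, line) pair per line of the given content.
    static List<Map.Entry<Long, String>> records(String content) {
        List<Map.Entry<Long, String>> out = new ArrayList<>();
        if (content.isEmpty()) {
            return out;
        }
        long offset = 0;
        for (String line : content.split("\n")) {
            out.add(new SimpleImmutableEntry<>(offset, line));
            // Advance past the line's bytes plus the newline character.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        // Shortened rows; the real file has many more columns.
        String content = "3070801,1963,1096\n3070802,1963,1096\n";
        for (Map.Entry<Long, String> record : records(content)) {
            System.out.println(record.getKey() + "\t" + record.getValue());
        }
    }
}
```

For the two shortened rows in main, the keys are 0 and 18: numbers, not text, which is why the key type has to be a numeric Writable.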

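So what is happening here is that you declared the key as a Text object, which is wrong. Hadoop passes a LongWritable for the key, so the mapper has to declare LongWritable as its input key type for Hadoop to stop complaining about the type.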

Try this instead:

public class TopKRecord extends Configured implements Tool {

    public static class MapClass extends Mapper<LongWritable, Text, Text, Text> {

        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // your map code goes here
            String[] fields = value.toString().split(",");
            String year = fields[1];
            String claims = fields[8];

            if (claims.length() > 0 && (!claims.startsWith("\""))) {
                context.write(new Text(year.toString()), new Text(claims.toString()));
            }
        }
    }

    ...
}
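Why the mismatch compiles but fails only at run time can be reproduced without Hadoop. The sketch below is an assumption-laden analogy, not Hadoop's code: MiniMapper, StringKeyMapper and runWithFrameworkKey are hypothetical stand-ins for Mapper, the question's MapClass and the framework's call into map. Because generics are erased, the framework side only sees Object, and the cast to the declared key type happens inside the subclass's map entry point, which is consistent with the stack trace showing the exception in MapClass.map called from Mapper.run.

```java
// Illustration only: generic type erasure turning a wrong key type into a
// ClassCastException at call time rather than a compile error.
final class ErasureCastDemo {

    // Hypothetical stand-in for a generic Mapper<KEYIN, VALUEIN, ...>.
    static class MiniMapper<K, V> {
        void map(K key, V value) {
            // default: do nothing
        }
    }

    // Declares String keys, much like the question declared Text keys.
    static class StringKeyMapper extends MiniMapper<String, String> {
        @Override
        void map(String key, String value) {
            System.out.println(key + " -> " + value);
        }
    }

    // Stand-in for the framework: it hands over whatever the input format
    // produced, with no compile-time knowledge of the mapper's declared types.
    @SuppressWarnings("unchecked")
    static void runWithFrameworkKey(MiniMapper<?, ?> mapper, Object key, Object value) {
        ((MiniMapper<Object, Object>) mapper).map(key, value);
    }

    // Returns true if calling the mapper with this key/value fails with a
    // ClassCastException.
    static boolean failsWithClassCast(MiniMapper<?, ?> mapper, Object key, Object value) {
        try {
            runWithFrameworkKey(mapper, key, value);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        MiniMapper<?, ?> mapper = new StringKeyMapper();
        // A Long key against a mapper that declared String keys.
        System.out.println(failsWithClassCast(mapper, 0L, "some line"));
        // A key of the declared type.
        System.out.println(failsWithClassCast(mapper, "k", "some line"));
    }
}
```

Declaring the key type that the input format really produces, as in the corrected mapper above, removes the failing cast.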

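Also, one thing in your code that you might want to reconsider: you are creating two Text objects for every record you process. You should create these two objects only once, at the beginning, and then in your mapper just set their values using the set method. This will save you a lot of time if you are processing a decent amount of data.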
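The reuse pattern described above can be sketched as follows. To keep the example runnable without Hadoop, MutableText and Sink are hypothetical stand-ins for org.apache.hadoop.io.Text and the mapper's Context, and the input rows are made up for illustration. In a real mapper the two fields would be Text instances, filled with Text's set method and passed to context.write.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: allocate the output objects once, then refill them per record.
final class ReuseDemo {

    // Hypothetical stand-in for Text: a mutable holder with a set method.
    static final class MutableText {
        private String value = "";

        void set(String newValue) {
            value = newValue;
        }

        @Override
        public String toString() {
            return value;
        }
    }

    // Hypothetical stand-in for Context: copies the contents at write time,
    // so the caller is free to overwrite the objects afterwards.
    static final class Sink {
        final List<String> written = new ArrayList<>();

        void write(MutableText key, MutableText value) {
            written.add(key + "\t" + value);
        }
    }

    // Created once per mapper instance instead of once per record.
    private final MutableText outKey = new MutableText();
    private final MutableText outValue = new MutableText();

    void map(String line, Sink sink) {
        String[] fields = line.split(",");
        if (fields.length <= 8) {
            return; // guard against short rows (an addition, not in the original code)
        }
        String year = fields[1];
        String claims = fields[8];
        if (claims.length() > 0 && !claims.startsWith("\"")) {
            outKey.set(year);
            outValue.set(claims);
            sink.write(outKey, outValue);
        }
    }

    public static void main(String[] args) {
        ReuseDemo mapper = new ReuseDemo();
        Sink sink = new Sink();
        // Made-up rows: column 1 is the year, column 8 the claims count.
        mapper.map("4000001,1975,5000,,US,CA,,1,12,269", sink);
        mapper.map("4000002,1976,5001,,US,TX,,1,,2", sink); // empty claims, skipped
        mapper.map("4000003,1976,5002,,US,IL,,1,7,2", sink);
        sink.written.forEach(System.out::println);
    }
}
```

Reuse is only safe when the consumer copies or serializes the contents during the write call, which is what the Sink stand-in models here.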

user3690041 · Answer

You need to set the input format class:

job.setInputFormatClass(KeyValueTextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
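One caveat with this approach: KeyValueTextInputFormat does give the mapper Text keys and Text values, but it builds them by splitting each line at the first separator, which is a tab by default. The plain-Java sketch below mimics that splitting rule to show what it would mean for the comma-separated data in the question. KeyValueSplitDemo and splitKeyValue are hypothetical names, and the sketch is an approximation of the behaviour, not Hadoop's implementation.

```java
// Illustration only: split a line into key and value at the first separator.
final class KeyValueSplitDemo {

    // Returns {key, value}. If the separator is absent, the whole line becomes
    // the key and the value is empty.
    static String[] splitKeyValue(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, pos), line.substring(pos + 1) };
    }

    public static void main(String[] args) {
        // A tab-separated line splits into key and value.
        String[] tabbed = splitKeyValue("1963\t3070801,1096", '\t');
        System.out.println("key=[" + tabbed[0] + "] value=[" + tabbed[1] + "]");

        // A row from the question contains no tab at all.
        String[] csv = splitKeyValue("3070801,1963,1096,,\"BE\"", '\t');
        System.out.println("key=[" + csv[0] + "] value=[" + csv[1] + "]");
    }
}
```

Under that assumption, a comma-separated row ends up entirely in the key with an empty value, so the mapper body would have to parse the key rather than the value for this input format to work with the question's data.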