Details
Type: Bug
Status: Resolved
Priority: Minor
Resolution: Not A Problem
Affects Version/s: 2.4.3
None
None
mongo-hadoop: 2.0.2
spark-version: 2.4.3
scala-version: 2.11
hive-version: 1.2.1
hadoop-version: 2.6.0
Description
I executed the SQL statement below and got a NullPointerException (NPE).
result_data_mongo is a Hive external table backed by MongoDB.
insert into result_data_mongo values("1111111115","1111111115","1111111115",1111111115,"1111111115",1111111115,1111111115,1111111115,1111111115,1111111115,1111111115,1111111115,1111111115,1111111115,1111111115);
NPE detail:
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:249)
    at org.apache.spark.sql.hive.execution.HiveOutputWriter.<init>(HiveFileFormat.scala:123)
    at org.apache.spark.sql.hive.execution.HiveFileFormat$$anon$1.newInstance(HiveFileFormat.scala:103)
    at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.newOutputWriter(FileFormatDataWriter.scala:120)
    at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.<init>(FileFormatDataWriter.scala:108)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:236)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:170)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:169)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
    at com.mongodb.hadoop.output.MongoOutputCommitter.getTaskAttemptPath(MongoOutputCommitter.java:264)
    at com.mongodb.hadoop.output.MongoRecordWriter.<init>(MongoRecordWriter.java:59)
    at com.mongodb.hadoop.hive.output.HiveMongoOutputFormat$HiveMongoRecordWriter.<init>(HiveMongoOutputFormat.java:80)
    at com.mongodb.hadoop.hive.output.HiveMongoOutputFormat.getHiveRecordWriter(HiveMongoOutputFormat.java:52)
    at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:261)
    at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:246)
    ... 15 more
I know that mongo-hadoop uses an incorrect key to look up the TaskAttemptID, so I modified the mongo-hadoop source code to read the correct properties ('mapreduce.task.id' and 'mapreduce.task.attempt.id'). Even with that change, I still cannot get a value. I found that these parameters are stored in Spark's TaskAttemptContext, but the TaskAttemptContext is not passed into HiveOutputWriter. Is this a design flaw?
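To make the key-lookup problem concrete, here is a minimal sketch of the kind of lookup I mean. It is not the actual mongo-hadoop code: the class name TaskAttemptIdLookup and the resolve method are hypothetical, a plain Map stands in for the Hadoop job configuration, and the legacy key 'mapred.task.id' is my assumption about the older property name. The sketch tries each candidate key in order and fails with a descriptive error instead of a bare NPE when none is set, which is the situation I am hitting under Spark.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper, for illustration only.
 * The Map stands in for a Hadoop Configuration/JobConf.
 */
class TaskAttemptIdLookup {
    // Keys to try, in order. The first is the property named in this report;
    // the second is assumed to be the legacy (pre-"mapreduce.*") name.
    static final List<String> CANDIDATE_KEYS =
            Arrays.asList("mapreduce.task.attempt.id", "mapred.task.id");

    /** Returns the first non-empty value found, or throws with a clear message. */
    static String resolve(Map<String, String> conf) {
        for (String key : CANDIDATE_KEYS) {
            String value = conf.get(key);
            if (value != null && !value.isEmpty()) {
                return value;
            }
        }
        // Fail loudly here rather than letting a null propagate into the committer.
        throw new IllegalStateException(
                "No task attempt ID found under any of " + CANDIDATE_KEYS);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Example attempt ID in the usual Hadoop textual form.
        conf.put("mapred.task.id", "attempt_200707121733_0003_m_000005_0");
        System.out.println(resolve(conf));
    }
}
```

In my case neither key is present in the configuration that reaches the record writer, so a lookup like this would end in the IllegalStateException branch. That is why I am asking whether the TaskAttemptContext should be passed through.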
There are two key points.