Details
-
Bug
-
Status: Closed
-
Blocker
-
Resolution: Fixed
-
0.10.0, 0.11, 0.10.1
-
Pig with Wonderdog https://github.com/infochimps-labs/wonderdog for elasticsearch integration. Elasticsearch 0.18.6. Pig local mode.
Description
The Pig UDFs in Wonderdog for ElasticSearch integration, which worked in 0.9.2 stopped working in 0.10.0.
Now in 0.10.0 there is an error, as Wonderdog is unable to read its configuration from the hadoop cache.
If someone can help identify what the issue is, or advise how Wonderdog or Pig can be modified so that wonderdog works with with Pig 0.10, it would be greatly appreciated.
This issue is duped in the Wonderdog project here: https://github.com/infochimps-labs/wonderdog/issues/6 https://github.com/infochimps-labs/wonderdog/issues/5 and https://github.com/infochimps-labs/wonderdog/issues/7
The error is below:
2012-07-06 16:50:51,501 [main] INFO org.apache.pig.Main - Apache Pig version 0.10.0-SNAPSHOT (rexported) compiled Jun 22 2012, 15:56:16
2012-07-06 16:50:51,502 [main] INFO org.apache.pig.Main - Logging error messages to: /private/tmp/pig_1341618651472.log
2012-07-06 16:50:51,829 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 -::- -::- -::- 0
100 11 100 11 0 0 647 0 -::- -::- -::- 733
2012-07-06 16:50:53,206 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2012-07-06 16:50:53,379 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 16:50:53,403 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 16:50:53,403 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 16:50:53,441 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 16:50:53,449 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 16:50:53,494 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2012-07-06 16:50:53,560 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2012-07-06 16:50:53,587 [Thread-7] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2012-07-06 16:50:53,597 [Thread-7] WARN org.apache.hadoop.mapred.JobClient - No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
****file:/tmp/emails.json
2012-07-06 16:50:53,711 [Thread-7] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
2012-07-06 16:50:53,711 [Thread-7] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 2
2012-07-06 16:50:53,734 [Thread-7] WARN org.apache.hadoop.io.compress.snappy.LoadSnappy - Snappy native library not loaded
2012-07-06 16:50:53,737 [Thread-7] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 3
2012-07-06 16:50:54,008 [Thread-8] INFO org.apache.hadoop.mapred.Task - Using ResourceCalculatorPlugin : null
2012-07-06 16:50:54,023 [Thread-8] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader - Current split being processed file:/tmp/emails.json/part-m-00000:0+33554432
2012-07-06 16:50:54,029 [Thread-8] INFO com.infochimps.elasticsearch.ElasticSearchOutputFormat - Using field:[message_id] for document ids
2012-07-06 16:50:54,029 [Thread-8] INFO com.infochimps.elasticsearch.ElasticSearchOutputFormat - Using [null] as es.config
2012-07-06 16:50:54,029 [Thread-8] INFO com.infochimps.elasticsearch.ElasticSearchOutputFormat - Using [null] as es.plugins.dir
2012-07-06 16:50:54,033 [Thread-8] WARN org.apache.hadoop.mapred.FileOutputCommitter - Output path is null in cleanup
2012-07-06 16:50:54,034 [Thread-8] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
java.lang.RuntimeException: java.lang.NullPointerException
at com.infochimps.elasticsearch.ElasticSearchOutputFormat$ElasticSearchRecordWriter.<init>(ElasticSearchOutputFormat.java:133)
at com.infochimps.elasticsearch.ElasticSearchOutputFormat.getRecordWriter(ElasticSearchOutputFormat.java:262)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getRecordWriter(PigOutputFormat.java:84)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:628)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:753)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: java.lang.NullPointerException
at java.util.Hashtable.put(Hashtable.java:394)
at java.util.Properties.setProperty(Properties.java:143)
at java.lang.System.setProperty(System.java:746)
at com.infochimps.elasticsearch.ElasticSearchOutputFormat$ElasticSearchRecordWriter.<init>(ElasticSearchOutputFormat.java:130)
... 6 more
2012-07-06 16:50:54,506 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0001
2012-07-06 16:50:54,506 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2012-07-06 16:50:59,022 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local_0001 has failed! Stop running all dependent jobs
2012-07-06 16:50:59,023 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2012-07-06 16:50:59,024 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2012-07-06 16:50:59,024 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Detected Local mode. Stats reported below may be incomplete
2012-07-06 16:50:59,025 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
1.0.2 0.10.0-SNAPSHOT rjurney 2012-07-06 16:50:53 2012-07-06 16:50:59 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_local_0001 json_emails MAP_ONLY Message: Job failed! Error - NA es://email/email?id=message_id&json=true&size=1000,
Input(s):
Failed to read data from "/tmp/emails.json"
Output(s):
Failed to produce result in "es://email/email?id=message_id&json=true&size=1000"
Job DAG:
job_local_0001
2012-07-06 16:50:59,025 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2012-07-06 16:50:59,029 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2244: Job failed, hadoop does not return any error message
2012-07-06 16:50:59,029 [main] ERROR org.apache.pig.tools.grunt.GruntParser - org.apache.pig.backend.executionengine.ExecException: ERROR 2244: Job failed, hadoop does not return any error message
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:140)
at org.apache.pig.tools.grunt.GruntParser.processShCommand(GruntParser.java:1025)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:167)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:555)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Details also at logfile: /private/tmp/pig_1341618651472.log
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
{
"took" : 75,
"timed_out" : false,
0 0 0 0 0 0 0 0 -::- -::- -::- 0 "_shards" :
,
"hits" :
}
100 193 100 193 0 0 2475 0 -::- -::- -::- 2539
2012-07-06 16:50:59,140 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2244: Job failed, hadoop does not return any error message
2012-07-06 16:50:59,140 [main] ERROR org.apache.pig.tools.grunt.GruntParser - org.apache.pig.backend.executionengine.ExecException: ERROR 2244: Job failed, hadoop does not return any error message
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:140)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:193)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:555)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)