Details
Description
I am creating Avro records according to the instructions/code at https://github.com/rjurney/Collecting-Data They look like this:
{
"namespace": "agile.data.avro",
"name": "Email",
"type": "record",
"fields": [
,
,
{"name":"to","type": [
, "null"]},
{"name":"cc","type": [
, "null"]},
{"name":"bcc","type": [
, "null"]},
{"name":"reply_to", "type": [
, "null"]},
,
,
{"name":"date", "type": ["string", "null"]} ]
}
I have applied the patch at PIG-2411 to get Pig to store bags in Avro arrays. I am running pig in local mode via: pig -l /tmp -x local -v
The script is:
REGISTER /me/pig/build/ivy/lib/Pig/avro-1.5.3.jar
REGISTER /me/pig/build/ivy/lib/Pig/json-simple-1.1.jar
REGISTER /me/pig/contrib/piggybank/java/piggybank.jar
REGISTER /me/pig/build/ivy/lib/Pig/jackson-core-asl-1.7.3.jar
REGISTER /me/pig/build/ivy/lib/Pig/jackson-mapper-asl-1.7.3.jar
DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();
messages = LOAD '/tmp/10000_emails.avro' USING AvroStorage();
smaller = FOREACH messages GENERATE from, to;
pairs = FOREACH smaller GENERATE from, FLATTEN(smaller.to) AS to;
STORE pairs INTO '/tmp/mail_pairs.avro' USING AvroStorage();
2011-12-20 17:58:25,705 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-12-20 17:58:25,719 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-12-20 17:58:25,722 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-12-20 17:58:25,737 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-12-20 17:58:25,740 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-12-20 17:58:25,751 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-12-20 17:58:25,755 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-12-20 17:58:25,757 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-12-20 17:58:25,760 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-12-20 17:58:25,762 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-12-20 17:58:25,766 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2011-12-20 17:58:25,804 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-12-20 17:58:25,808 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-12-20 17:58:25,810 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2011-12-20 17:58:25,812 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 3
2011-12-20 17:58:25,813 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 1 map-only splittees.
2011-12-20 17:58:25,813 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 1 out of total 3 MR operators.
2011-12-20 17:58:25,813 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 2
2011-12-20 17:58:25,813 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-12-20 17:58:25,817 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-12-20 17:58:25,817 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2011-12-20 17:58:25,818 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2011-12-20 17:58:25,822 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up multi store job
2011-12-20 17:58:25,826 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-12-20 17:58:25,826 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2011-12-20 17:58:25,930 [Thread-22] WARN org.apache.hadoop.mapred.JobClient - No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
2011-12-20 17:58:26,327 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2011-12-20 17:58:26,330 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2117: Unexpected error when launching map reduce job.
2011-12-20 17:58:26,330 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias pairs
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1553)
at org.apache.pig.PigServer.registerQuery(PigServer.java:541)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:943)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
at org.apache.pig.Main.run(Main.java:523)
at org.apache.pig.Main.main(Main.java:148)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2117: Unexpected error when launching map reduce job.
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:311)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1271)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1256)
at org.apache.pig.PigServer.execute(PigServer.java:1246)
at org.apache.pig.PigServer.access$400(PigServer.java:127)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1548)
... 13 more
Caused by: java.lang.NullPointerException
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecsHelper(PigOutputFormat.java:193)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:187)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:770)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)