Details

    • Tags:
      pig avro storage

      Description

      I am creating Avro records according to the instructions/code at https://github.com/rjurney/Collecting-Data They look like this:

      {
      "namespace": "agile.data.avro",
      "name": "Email",
      "type": "record",
      "fields": [

      {"name":"message_id", "type": ["string", "null"]}

      ,

      {"name":"from","type": ["string", "null"]}

      ,
      {"name":"to","type": [

      {"type":"array", "items":"string"}

      , "null"]},
      {"name":"cc","type": [

      {"type":"array", "items":"string"}

      , "null"]},
      {"name":"bcc","type": [

      {"type":"array", "items":"string"}

      , "null"]},
      {"name":"reply_to", "type": [

      {"type":"array", "items":"string"}

      , "null"]},

      {"name":"subject", "type": ["string", "null"]}

      ,

      {"name":"body", "type": ["string", "null"]}

      ,

      {"name":"date", "type": ["string", "null"]}

      ]
      }

      I have applied the patch at PIG-2411 to get Pig to store bags in Avro arrays. I am running pig in local mode via: pig -l /tmp -x local -v

      The script is:

      REGISTER /me/pig/build/ivy/lib/Pig/avro-1.5.3.jar
      REGISTER /me/pig/build/ivy/lib/Pig/json-simple-1.1.jar
      REGISTER /me/pig/contrib/piggybank/java/piggybank.jar
      REGISTER /me/pig/build/ivy/lib/Pig/jackson-core-asl-1.7.3.jar
      REGISTER /me/pig/build/ivy/lib/Pig/jackson-mapper-asl-1.7.3.jar

      DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();

      messages = LOAD '/tmp/10000_emails.avro' USING AvroStorage();
      smaller = FOREACH messages GENERATE from, to;
      pairs = FOREACH smaller GENERATE from, FLATTEN(smaller.to) AS to;

      STORE pairs INTO '/tmp/mail_pairs.avro' USING AvroStorage();

      2011-12-20 17:58:25,705 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
      2011-12-20 17:58:25,719 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
      2011-12-20 17:58:25,722 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
      2011-12-20 17:58:25,737 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
      2011-12-20 17:58:25,740 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
      2011-12-20 17:58:25,751 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
      2011-12-20 17:58:25,755 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
      2011-12-20 17:58:25,757 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
      2011-12-20 17:58:25,760 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
      2011-12-20 17:58:25,762 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
      2011-12-20 17:58:25,766 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
      2011-12-20 17:58:25,804 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
      2011-12-20 17:58:25,808 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
      2011-12-20 17:58:25,810 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
      2011-12-20 17:58:25,812 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 3
      2011-12-20 17:58:25,813 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 1 map-only splittees.
      2011-12-20 17:58:25,813 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 1 out of total 3 MR operators.
      2011-12-20 17:58:25,813 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 2
      2011-12-20 17:58:25,813 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
      2011-12-20 17:58:25,817 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
      2011-12-20 17:58:25,817 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
      2011-12-20 17:58:25,818 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
      2011-12-20 17:58:25,822 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up multi store job
      2011-12-20 17:58:25,826 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
      2011-12-20 17:58:25,826 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
      2011-12-20 17:58:25,930 [Thread-22] WARN org.apache.hadoop.mapred.JobClient - No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
      2011-12-20 17:58:26,327 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
      2011-12-20 17:58:26,330 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2117: Unexpected error when launching map reduce job.
      2011-12-20 17:58:26,330 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias pairs
      at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1553)
      at org.apache.pig.PigServer.registerQuery(PigServer.java:541)
      at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:943)
      at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
      at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
      at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
      at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
      at org.apache.pig.Main.run(Main.java:523)
      at org.apache.pig.Main.main(Main.java:148)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
      Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2117: Unexpected error when launching map reduce job.
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:311)
      at org.apache.pig.PigServer.launchPlan(PigServer.java:1271)
      at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1256)
      at org.apache.pig.PigServer.execute(PigServer.java:1246)
      at org.apache.pig.PigServer.access$400(PigServer.java:127)
      at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1548)
      ... 13 more
      Caused by: java.lang.NullPointerException
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecsHelper(PigOutputFormat.java:193)
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:187)
      at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:770)
      at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
      at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
      at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
      at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)

      1. test.pig
        0.6 kB
        Russell Jurney

        Activity

          People

          • Assignee:
            Unassigned
            Reporter:
            Russell Jurney
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development