Pig / PIG-2540

AvroStorage can't read schema on Amazon S3 in Elastic MapReduce

    Details

    • Patch Info:
      Patch Available
    • Hadoop Flags:
      Reviewed

      Description

      grunt> emails = load 's3://agile.data/again_inbox' using AvroStorage();
      grunt> describe emails
      Schema for emails unknown.
      grunt> a = limit emails 10;
      grunt> dump a
      2012-02-16 22:15:58,347 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: LIMIT
      2012-02-16 22:15:58,483 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
      2012-02-16 22:15:58,542 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
      2012-02-16 22:15:58,542 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
      2012-02-16 22:15:58,632 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
      2012-02-16 22:15:58,658 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
      2012-02-16 22:15:58,665 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2017: Internal error creating job configuration.
      2012-02-16 22:15:58,665 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias a
      at org.apache.pig.PigServer.openIterator(PigServer.java:901)
      at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:652)
      at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
      at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
      at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
      at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:67)
      at org.apache.pig.Main.run(Main.java:497)
      at org.apache.pig.Main.main(Main.java:111)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
      Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias a
      at org.apache.pig.PigServer.storeEx(PigServer.java:1000)
      at org.apache.pig.PigServer.store(PigServer.java:963)
      at org.apache.pig.PigServer.openIterator(PigServer.java:876)
      ... 12 more
      Caused by: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2017: Internal error creating job configuration.
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:731)
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:263)
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:149)
      at org.apache.pig.PigServer.launchPlan(PigServer.java:1314)
      at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1299)
      at org.apache.pig.PigServer.storeEx(PigServer.java:996)
      ... 14 more
      Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
      at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:352)
      at org.apache.pig.piggybank.storage.avro.AvroStorage.setLocation(AvroStorage.java:138)
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:387)
      ... 19 more

      1. PIG-2540_4.patch
        5 kB
        Jonathan Coveney
      2. PIG-2540_almost_there.patch
        5 kB
        Jonathan Coveney
      3. PIG-2540.tests_fail.patch
        6 kB
        Russell Jurney
      4. PIG-2540.tests_fail.patch.2
        7 kB
        Russell Jurney
      5. TEST-org.apache.pig.piggybank.test.storage.avro.TestAvroStorage.txt
        84 kB
        Russell Jurney

        Activity

        Russell Jurney added a comment -

        Note: this patch seems to have accidentally fixed PIG-2527.

        Rejoice!

        Russell Jurney added a comment -

        Oh wow!

        Jonathan Coveney added a comment -

        Applied to the 0.9 branch (r1305717).

        Onward and upwards!

        Jonathan Coveney added a comment -

        Attaching applied patch.

        Jonathan Coveney added a comment -

        Committed to 0.10 (r1305716) and trunk (r1305715). Testing on 0.9 now.

        Russell Jurney added a comment -

        Can anyone help me out here? I don't know how to generate a patch other than with git, and that is not working.

        Russell Jurney added a comment -

        I don't understand how to do this. I did this:

        russell-jurneys-macbook-pro:newpig rjurney$ git remote -v
        origin https://github.com/apache/pig.git (fetch)
        origin https://github.com/apache/pig.git (push)

        russell-jurneys-macbook-pro:newpig rjurney$ git branch -v

        * branch-0.10 14f4606 [ahead 5] Merge branch 'branch-0.10' of https://github.com/apache/pig into branch-0.10
          trunk cb49401 [behind 7] PIG-2589: Additional e2e test for 0.10 new features

        russell-jurneys-macbook-pro:newpig rjurney$ git pull
        remote: Counting objects: 77, done.
        remote: Compressing objects: 100% (6/6), done.
        remote: Total 39 (delta 17), reused 39 (delta 17)
        Unpacking objects: 100% (39/39), done.
        From https://github.com/apache/pig
        b8ce196..d1f6cb1 branch-0.10 -> origin/branch-0.10
        8b21cc4..841f336 trunk -> origin/trunk
        Merge made by recursive.

        git diff --no-prefix 73bb67f8cc3974d76e034d09da96995e887b4c30 contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/ > PIG-2540.tests_fail.patch.3

        This patch is identical to the other one. What am I missing?

        Russell Jurney added a comment -

        I am using git: I commit locally and pull from the GitHub copy, so I am not sure how to do this.

        Jonathan Coveney added a comment -

        Russell, I think you need to rebase. Your patch has some changes to PigAvroInputFormat.java that I think are from a different patch, and it doesn't have the change I made to fix the tests.

        Russell Jurney added a comment -

        printlns removed

        Jonathan Coveney added a comment -

        Favor 1: can you try this on trunk and see if it works? No reason not to have it in both.
        Point 2: in getAllSubDirs, you added a bunch of System.err.println calls. These should all go through the logger instead, though some of them look like vestiges of when you were debugging. Either way, it could be useful to clean them up and log some useful stats. Otherwise, looks good to me.
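
As a rough illustration of Point 2, here is a minimal sketch of a directory walk that reports through a logger rather than System.err.println. The class and the helper collectSubDirs are hypothetical stand-ins, not the piggybank getAllSubDirs, and java.util.logging is an assumption made only to keep the sketch self-contained; the real code would presumably use whatever logger the class already has.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

class SubDirLogging {
    // Assumption: java.util.logging stands in for the project's own logger.
    private static final Logger LOG = Logger.getLogger(SubDirLogging.class.getName());

    // Hypothetical helper: collects every subdirectory under root, recursively.
    static List<File> collectSubDirs(File root) {
        List<File> found = new ArrayList<>();
        walk(root, found);
        // One summary line at INFO instead of a println per directory.
        LOG.log(Level.INFO, "Found {0} subdirectories under {1}",
                new Object[] {found.size(), root});
        return found;
    }

    private static void walk(File dir, List<File> found) {
        File[] children = dir.listFiles();
        if (children == null) {
            // listFiles() returns null when dir is not a readable directory.
            LOG.log(Level.WARNING, "Not a readable directory: {0}", dir);
            return;
        }
        for (File child : children) {
            if (child.isDirectory()) {
                // Debug-level detail that can be switched off in production.
                LOG.log(Level.FINE, "Descending into {0}", child);
                found.add(child);
                walk(child, found);
            }
        }
    }

    public static void main(String[] args) {
        File root = new File(args.length > 0 ? args[0] : ".");
        System.out.println(collectSubDirs(root).size());
    }
}
```

The per-directory messages sit at FINE so they stay quiet unless debugging is turned on, while the summary count is the kind of "useful stat" that could reasonably stay at INFO.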

        Russell Jurney added a comment -

        The fix for test_no_extension was merged in PIG-2505.

        So this is ready for merging?

        Jonathan Coveney added a comment -

        Russell, here is the change. Previously, there was this function in TestAvroStorage:

            private static String getInputFile(String file) {
                return "file:///" + System.getProperty("user.dir") + "/" + basedir + file;
            }   
        

        The issue is that System.getProperty("user.dir") returns a directory that begins with a /, so you were getting

        file:////etc
        

        With that changed accordingly, all but one of the tests now run. The last test errors out, but this is because the file "test_no_extension" doesn't exist.
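
To make the extra slash concrete, here is a small sketch of the string building described above. The actual change in the patch is not shown in this comment, so the corrected variant below is only one plausible fix; BASEDIR and the sample directory are hypothetical values, not the ones used in TestAvroStorage.

```java
import java.net.URI;

class InputFileUri {
    // Hypothetical stand-in for the test's basedir field.
    static final String BASEDIR = "test-data/";

    // Original form: "file:///" plus an absolute directory yields four slashes.
    static String original(String userDir, String file) {
        return "file:///" + userDir + "/" + BASEDIR + file;
    }

    // One possible correction (an assumption, not the committed change):
    // use "file://" and let the absolute directory supply the third slash.
    static String corrected(String userDir, String file) {
        return "file://" + userDir + "/" + BASEDIR + file;
    }

    public static void main(String[] args) {
        // System.getProperty("user.dir") is absolute on Unix-like systems,
        // so it starts with "/".
        String userDir = System.getProperty("user.dir");
        System.out.println(original(userDir, "sample.avro"));
        System.out.println(corrected(userDir, "sample.avro"));
        // The corrected string parses to the intended absolute path.
        System.out.println(URI.create(corrected("/etc", "sample.avro")).getPath());
    }
}
```

With a user.dir of /etc, original produces file:////etc/test-data/sample.avro, while corrected produces file:///etc/test-data/sample.avro.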

        Russell Jurney added a comment -

        These are the test failures I get.

        Russell Jurney added a comment -

        This fails 7 out of 11 tests, but works in practice. Help.

        Russell Jurney added a comment -

        This patch converts the API used to get the current directory from a Path to a URI. This enables S3, file://, and /foo locations to all work.

        The outstanding issue is that 7 out of 11 tests fail. I cannot reproduce these failures outside of the unit tests.

        I call for help.
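
For readers wondering why a URI helps here, the following is a minimal sketch using only java.net.URI. It does not reproduce the patch or Hadoop's Path class; the class name and helper methods are hypothetical. It only shows that a URI keeps the scheme and bucket of an S3 location while still accepting a bare path.

```java
import java.net.URI;

class LocationSchemes {
    // Returns the scheme of a location, or "(none)" for a bare path like /foo.
    static String schemeOf(String location) {
        String scheme = URI.create(location).getScheme();
        return scheme == null ? "(none)" : scheme;
    }

    // Returns the authority (for s3:// locations, the bucket), or null if absent.
    static String authorityOf(String location) {
        return URI.create(location).getAuthority();
    }

    // Returns the path component, which is present in all three forms.
    static String pathOf(String location) {
        return URI.create(location).getPath();
    }

    public static void main(String[] args) {
        String[] locations = {
            "s3://agile.data/again_inbox", // location from the description
            "file:///tmp/again_inbox",     // hypothetical local file URI
            "/foo"                         // bare path, no scheme
        };
        for (String loc : locations) {
            System.out.println(loc + " -> scheme=" + schemeOf(loc)
                    + ", authority=" + authorityOf(loc)
                    + ", path=" + pathOf(loc));
        }
    }
}
```

Code that looks only at the path component would see /again_inbox for the S3 location and lose the bucket, which may be the kind of information the switch to a URI preserves.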


          People

          • Assignee:
            Russell Jurney
            Reporter:
            Russell Jurney