Pig / PIG-2489

Input path globbing with {} not working with PigStorageSchema or PigStorage('\t', '-schema')


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.0, 0.9.1, 0.10.0
    • Fix Version/s: 0.10.0, 0.11
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      test.pig
      -- For Pig 0.9
      --A = LOAD 'input/PigStorageSchema/Temp{1,2}/pss*' USING org.apache.pig.piggybank.storage.PigStorageSchema();
      -- For Pig 0.10
      A = LOAD 'input/PigStorageSchema/Temp{1,2}/pss*' USING PigStorage('\t', '-schema');
      
      DESCRIBE A;
      
      DUMP A;
      

      Schema file input/PigStorageSchema/Temp{1,2}/.pig_schema
      {"fields":[{"name":"name","type":55,"schema":null,"description":"autogenerated from Pig Field Schema"},{"name":"val","type":10,"schema":null,"description":"autogenerated from Pig Field Schema"}],"version":0,"sortKeys":[],"sortKeyOrders":[]}
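
      For readers unfamiliar with the numeric type codes in the schema file, the following is a minimal Python sketch (not part of Pig) that decodes the JSON above into a readable field list. The helper name describe_schema and the PIG_TYPE_NAMES table are illustrative assumptions; the table covers only a subset of the codes defined in org.apache.pig.data.DataType.

```python
import json

# Subset of Pig's numeric type codes (org.apache.pig.data.DataType).
# This table is an illustrative assumption, not a complete list.
PIG_TYPE_NAMES = {
    10: "int",
    15: "long",
    20: "float",
    25: "double",
    50: "bytearray",
    55: "chararray",
}

# Contents of the .pig_schema file shown in this report.
SCHEMA_JSON = (
    '{"fields":['
    '{"name":"name","type":55,"schema":null,'
    '"description":"autogenerated from Pig Field Schema"},'
    '{"name":"val","type":10,"schema":null,'
    '"description":"autogenerated from Pig Field Schema"}],'
    '"version":0,"sortKeys":[],"sortKeyOrders":[]}'
)


def describe_schema(schema_json):
    """Return a 'name: type' summary of the fields in a .pig_schema document."""
    schema = json.loads(schema_json)
    parts = []
    for field in schema["fields"]:
        # Fall back to the raw code for types missing from the table above.
        type_name = PIG_TYPE_NAMES.get(field["type"], "type#%d" % field["type"])
        parts.append("%s: %s" % (field["name"], type_name))
    return "{" + ", ".join(parts) + "}"


print(describe_schema(SCHEMA_JSON))  # prints {name: chararray, val: int}
```

      This is the schema that DESCRIBE A is expected to report once the schema file is found.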
      


      Header file input/PigStorageSchema/Temp{1,2}/.pig_header

      name    val
      

      Sample input file input/PigStorageSchema/Temp1/pss.in

      peter   1
      samir   3
      michael 4
      peter   2
      peter   4
      samir   1
      

      Running the above script test.pig with Pig 0.10 produces the following error:

      012-01-24 04:07:42,210 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1131: Could not find schema file for hdfs://nameNode:8020/user/mitesh/input/PigStorageSchema/Temp{1,2}/pss*
      

      Pig Stack Trace
      ---------------
      ERROR 1131: Could not find schema file for hdfs://nameNode:8020/user/mitesh/input/PigStorageSchema/Temp{1,2}/pss*
      
      org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias A
          at org.apache.pig.PigServer.openIterator(PigServer.java:858)
          at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:655)
          at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
          at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
          at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
          at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
          at org.apache.pig.Main.run(Main.java:567)
          at org.apache.pig.Main.main(Main.java:111)
      Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias A
          at org.apache.pig.PigServer.storeEx(PigServer.java:957)
          at org.apache.pig.PigServer.store(PigServer.java:920)
          at org.apache.pig.PigServer.openIterator(PigServer.java:833)
          ... 7 more
      Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2245: 
      <file test.pig, line 2, column 4> Cannot get schema from loadFunc org.apache.pig.builtin.PigStorage
          at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:154)
          at org.apache.pig.newplan.logical.relational.LOLoad.getSchema(LOLoad.java:109)
          at org.apache.pig.newplan.logical.relational.LOStore.getSchema(LOStore.java:68)
          at org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.validate(SchemaAliasVisitor.java:60)
          at org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.visit(SchemaAliasVisitor.java:84)
          at org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:77)
          at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
          at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
          at org.apache.pig.PigServer$Graph.compile(PigServer.java:1618)
          at org.apache.pig.PigServer$Graph.compile(PigServer.java:1612)
          at org.apache.pig.PigServer$Graph.access$200(PigServer.java:1335)
          at org.apache.pig.PigServer.storeEx(PigServer.java:952)
          ... 9 more
      Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1131: Could not find schema file for hdfs://nameNode:8020/user/mitesh/input/PigStorageSchema/Temp{1,2}/pss*
          at org.apache.pig.builtin.JsonMetadata.nullOrException(JsonMetadata.java:222)
          at org.apache.pig.builtin.JsonMetadata.getSchema(JsonMetadata.java:191)
          at org.apache.pig.builtin.PigStorage.getSchema(PigStorage.java:438)
          at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:150)
      

      However, PigStorageSchema() and PigStorage('\t', '-schema') do work with the wildcard *.
      For example, the following script works:

      test2.pig
      A = LOAD 'input/PigStorageSchema/Temp*/pss*' USING PigStorage('\t', '-schema');
      
      DESCRIBE A;
      
      DUMP A;
      

      As a workaround for Temp{1,2} globbing, multiple comma-separated input paths
      (with no {} globbing) can be given as input:

      test2.pig
      A = LOAD 'input/PigStorageSchema/Temp1/pss*,input/PigStorageSchema/Temp2/pss*' USING PigStorage('\t', '-schema');
      
      DESCRIBE A;
      
      DUMP A;
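
      When many paths are affected, the comma-separated form used in the workaround can be generated rather than typed by hand. The following is a minimal Python sketch, assuming non-nested {a,b,...} groups only; expand_braces and to_load_path are hypothetical helper names, not part of Pig.

```python
import re


def expand_braces(pattern):
    """Expand {a,b,...} groups in a path pattern into a list of patterns.

    Assumes brace groups are not nested. Other glob characters such as
    '*' are left untouched so that Pig can still expand them.
    """
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded = []
    for alternative in match.group(1).split(","):
        # Recurse so that any later brace groups in the tail are expanded too.
        expanded.extend(expand_braces(head + alternative + tail))
    return expanded


def to_load_path(pattern):
    """Join the expanded patterns with commas, as in the workaround script."""
    return ",".join(expand_braces(pattern))


print(to_load_path("input/PigStorageSchema/Temp{1,2}/pss*"))
# prints input/PigStorageSchema/Temp1/pss*,input/PigStorageSchema/Temp2/pss*
```

      The resulting string can be pasted into the LOAD statement in place of the {} pattern.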
      

        Attachments

        1. PIG-2489-1.patch
          2 kB
          Daniel Dai


              People

              • Assignee: Daniel Dai (daijy)
              • Reporter: Mitesh Singh Jat (miteshsjat)