  2. PIG-3830

HiveColumnarLoader throwing FileNotFoundException on Hadoop 2

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.12.0
    • Fix Version/s: 0.15.0
    • Component/s: None
    • Labels:
      None

      Description

      I've noticed that HiveColumnarLoader throws java.io.FileNotFoundException when used with a glob path on Hadoop 2.0. It runs just fine on Hadoop 1.0:

      Failed to parse: java.io.FileNotFoundException: File /home/jarcec/cloudera/repos/pig/contrib/piggybank/java/simpleDataDir1395623312698/*.txt does not exist
      	at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:198)
      	at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1676)
      	at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1623)
      	at org.apache.pig.PigServer.registerQuery(PigServer.java:575)
      	at org.apache.pig.PigServer.registerQuery(PigServer.java:588)
      	at org.apache.pig.piggybank.test.storage.TestHiveColumnarLoader.testHdfdsGlobbing(TestHiveColumnarLoader.java:220)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:601)
      	at junit.framework.TestCase.runTest(TestCase.java:176)
      	at junit.framework.TestCase.runBare(TestCase.java:141)
      	at junit.framework.TestResult$1.protect(TestResult.java:122)
      	at junit.framework.TestResult.runProtected(TestResult.java:142)
      	at junit.framework.TestResult.run(TestResult.java:125)
      	at junit.framework.TestCase.run(TestCase.java:129)
      	at junit.framework.TestSuite.runTest(TestSuite.java:255)
      	at junit.framework.TestSuite.run(TestSuite.java:250)
      	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:518)
      	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1052)
      	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:906)
      Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: File /home/jarcec/cloudera/repos/pig/contrib/piggybank/java/simpleDataDir1395623312698/*.txt does not exist
      	at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:362)
      	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1484)
      	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1524)
      	at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:564)
      	at org.apache.pig.piggybank.storage.partition.PathPartitioner.getPartitionKeys(PathPartitioner.java:105)
      	at org.apache.pig.piggybank.storage.partition.PathPartitionHelper.getPartitionKeys(PathPartitionHelper.java:101)
      	at org.apache.pig.piggybank.storage.HiveColumnarLoader.getPartitionColumns(HiveColumnarLoader.java:576)
      	at org.apache.pig.piggybank.storage.HiveColumnarLoader.getSchema(HiveColumnarLoader.java:646)
      	at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:175)
      	at org.apache.pig.newplan.logical.relational.LOLoad.<init>(LOLoad.java:89)
      	at org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:853)
      	at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3479)
      	at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1536)
      	at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:1013)
      	at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:553)
      	at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:421)
      	at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:188)
      	... 20 more
      Caused by: java.io.FileNotFoundException: File /home/jarcec/cloudera/repos/pig/contrib/piggybank/java/simpleDataDir1395623312698/*.txt does not exist
      	... 37 more
      

      I dug into the problem and found a difference in Hadoop's implementation of DistributedFileSystem. For a non-existent directory, the listStatus method returns null in Hadoop 1:

          if (thisListing == null) { // the directory does not exist
            return null;
          }
      

      But it throws an exception in Hadoop 2:

          if (thisListing == null) { // the directory does not exist
            throw new FileNotFoundException("File " + p + " does not exist.");
          }
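
A minimal sketch of how a caller could normalize the two behaviors shown above into one result. This is not the code from PIG-3830.patch. `Lister` is a hypothetical stand-in for a call such as FileSystem.listStatus, so the example runs without Hadoop on the classpath, and the class and method names are illustrative assumptions.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.UncheckedIOException;

class ListStatusCompat {

    /** Hypothetical stand-in for a listing call such as FileSystem.listStatus(path). */
    interface Lister {
        String[] list(String path) throws IOException;
    }

    /**
     * Returns the listing, or an empty array when the path does not exist.
     * Covers both conventions: a null return (Hadoop 1 style) and a
     * FileNotFoundException (Hadoop 2 style).
     */
    static String[] listOrEmpty(Lister lister, String path) {
        try {
            String[] entries = lister.list(path);
            return entries == null ? new String[0] : entries;
        } catch (FileNotFoundException e) {
            // Hadoop 2 style: a missing path is reported as an exception.
            return new String[0];
        } catch (IOException e) {
            // Any other I/O failure is still surfaced to the caller.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Lister hadoop1Style = p -> null;
        Lister hadoop2Style = p -> {
            throw new FileNotFoundException("File " + p + " does not exist.");
        };
        System.out.println(listOrEmpty(hadoop1Style, "/data/*.txt").length);
        System.out.println(listOrEmpty(hadoop2Style, "/data/*.txt").length);
    }
}
```

Both calls in `main` yield an empty array, so code further up the stack sees the same result regardless of which convention the underlying file system follows.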
      
      1. PIG-3830.patch
        7 kB
        Jarek Jarcec Cecho

          Activity

          Jarek Jarcec Cecho added a comment -

          Attaching patch that:

          • Adds a catch block for FileNotFoundException to gracefully handle the situation on Hadoop 2.0.
          • Fixes the formatting of the affected method, getPartitionKeys. The file as a whole does not follow the usual Pig formatting guidelines, so I've fixed at least the method I was changing.
          • Adds the test.output variable to piggybank's build.xml file, as it was missing.
          Cheolsoo Park added a comment -

          Jarek Jarcec Cecho, is this ready for review? If so, please mark it as patch available.

          Regarding the formatting, feel free to clean up the whitespace. As long as the patch is uploaded to the RB, it's not hard to review (at least for me).

          Jarek Jarcec Cecho added a comment -

          I forgot to switch the status to "Patch available". Thank you for catching it, Cheolsoo Park!

          Cheolsoo Park added a comment -

          Jarek Jarcec Cecho, thank you for the patch. Please help me understand this.

          Is failing loudly for a non-existent path really bad? I suppose handling it gracefully may make sense when the pattern doesn't match any paths. However, does it also make sense when a wrong input path is given? In that case, won't users want the job to fail early so that they can fix the problem?
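
One way to picture the distinction raised here is a check that tolerates a missing path only when the input is a glob pattern, and fails early for a literal path. The sketch below is an illustration only, not something proposed in the patch. The class and method names are hypothetical, and the metacharacter test is a simplifying assumption rather than Hadoop's actual glob parsing.

```java
import java.io.FileNotFoundException;

class GlobAwareMissingPath {

    /** Heuristic (an assumption): treat *, ?, [ and { as glob metacharacters. */
    static boolean looksLikeGlob(String path) {
        for (char c : path.toCharArray()) {
            if (c == '*' || c == '?' || c == '[' || c == '{') {
                return true;
            }
        }
        return false;
    }

    /**
     * Swallows the exception when the path is a glob pattern (no match is a
     * legitimate outcome) and rethrows it for a literal path, so that a
     * mistyped input fails early.
     */
    static void handleMissing(String path, FileNotFoundException e)
            throws FileNotFoundException {
        if (!looksLikeGlob(path)) {
            throw e;
        }
    }

    public static void main(String[] args) {
        System.out.println(looksLikeGlob("/data/simpleDataDir/*.txt"));
        System.out.println(looksLikeGlob("/data/mistyped_dir"));
    }
}
```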

          Cheolsoo Park added a comment -

          Canceling patch while waiting for response.


            People

            • Assignee:
              Jarek Jarcec Cecho
              Reporter:
              Jarek Jarcec Cecho
            • Votes:
              0
              Watchers:
              3
