Uploaded image for project: 'Pig'
  1. Pig
  2. PIG-3830

HiveColumnarLoader throwing FileNotFoundException on Hadoop 2

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 0.12.0
    • 0.18.0
    • None
    • None

    Description

      I've noticed that HiveColumnarLoader will thrown java.io.FileNotFoundException when used with glob path on Hadoop 2.0. It will run just fine on Hadoop 1.0:

      Failed to parse: java.io.FileNotFoundException: File /home/jarcec/cloudera/repos/pig/contrib/piggybank/java/simpleDataDir1395623312698/*.txt does not exist
      	at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:198)
      	at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1676)
      	at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1623)
      	at org.apache.pig.PigServer.registerQuery(PigServer.java:575)
      	at org.apache.pig.PigServer.registerQuery(PigServer.java:588)
      	at org.apache.pig.piggybank.test.storage.TestHiveColumnarLoader.testHdfdsGlobbing(TestHiveColumnarLoader.java:220)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:601)
      	at junit.framework.TestCase.runTest(TestCase.java:176)
      	at junit.framework.TestCase.runBare(TestCase.java:141)
      	at junit.framework.TestResult$1.protect(TestResult.java:122)
      	at junit.framework.TestResult.runProtected(TestResult.java:142)
      	at junit.framework.TestResult.run(TestResult.java:125)
      	at junit.framework.TestCase.run(TestCase.java:129)
      	at junit.framework.TestSuite.runTest(TestSuite.java:255)
      	at junit.framework.TestSuite.run(TestSuite.java:250)
      	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:518)
      	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1052)
      	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:906)
      Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: File /home/jarcec/cloudera/repos/pig/contrib/piggybank/java/simpleDataDir1395623312698/*.txt does not exist
      	at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:362)
      	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1484)
      	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1524)
      	at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:564)
      	at org.apache.pig.piggybank.storage.partition.PathPartitioner.getPartitionKeys(PathPartitioner.java:105)
      	at org.apache.pig.piggybank.storage.partition.PathPartitionHelper.getPartitionKeys(PathPartitionHelper.java:101)
      	at org.apache.pig.piggybank.storage.HiveColumnarLoader.getPartitionColumns(HiveColumnarLoader.java:576)
      	at org.apache.pig.piggybank.storage.HiveColumnarLoader.getSchema(HiveColumnarLoader.java:646)
      	at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:175)
      	at org.apache.pig.newplan.logical.relational.LOLoad.<init>(LOLoad.java:89)
      	at org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:853)
      	at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3479)
      	at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1536)
      	at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:1013)
      	at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:553)
      	at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:421)
      	at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:188)
      	... 20 more
      Caused by: java.io.FileNotFoundException: File /home/jarcec/cloudera/repos/pig/contrib/piggybank/java/simpleDataDir1395623312698/*.txt does not exist
      	... 37 more
      

      I've dived into the problem and found a difference in Hadoop implementation of DistributedFileSystem. For non existing directory method listStatus will return null in Hadoop 1:

          if (thisListing == null) { // the directory does not exist
            return null;
          }
      

      But will thrown an exception in Hadoop 2:

          if (thisListing == null) { // the directory does not exist
            throw new FileNotFoundException("File " + p + " does not exist.");
          }
      

      Attachments

        1. PIG-3830.patch
          7 kB
          Jarek Jarcec Cecho

        Issue Links

          Activity

            People

              Unassigned Unassigned
              jarcec Jarek Jarcec Cecho
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated: