Hadoop Map/Reduce / MAPREDUCE-7086

Add config to allow FileInputFormat to ignore directories when recursive=false


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.2.0, 3.1.1
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      We are trying to create a split in Hive that reads only the files directly inside a directory, not the contents of its subdirectories.
      This fails with the error below.
      The error arises from two pieces of code interacting: one explicitly adds directories to the results without failing, and the other fails on any directory in the results. Given that, this looks like a bug.

      Caused by: java.io.IOException: Not a file: file:/,...warehouse/simple_to_mm_text/delta_0000001_0000001_0000
      	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:329) ~[hadoop-mapreduce-client-core-3.1.0.jar:?]
      	at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:553) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
      	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:754) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
      	at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:203) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
      

      When recursion is disabled, this code adds directories to the results:

       
      if (recursive && stat.isDirectory()) {
        result.dirsNeedingRecursiveCalls.add(stat);
      } else {
        result.locatedFileStatuses.add(stat);
      }
      

      However, the getSplits code that runs afterwards computes the total size like this:

      long totalSize = 0;                           // compute total size
      for (FileStatus file: files) {                // check we have valid files
        if (file.isDirectory()) {
          throw new IOException("Not a file: "+ file.getPath());
        }
        totalSize +=
      

      Combined with the code above, this check always fails when recursion is disabled and the input directory contains a subdirectory.
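
The interaction can be reproduced without Hadoop in a small, self-contained sketch. Everything here is hypothetical and only for illustration: `Entry` stands in for `FileStatus`, `list` mirrors the listing branch quoted above, `totalSize` mirrors the size loop, and the `ignoreDirs` flag is an assumed stand-in for the kind of config the issue title asks for, not the name or shape of the actual option.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Hadoop's FileStatus; only what this sketch needs.
final class Entry {
    final String path;
    final boolean directory;
    final long length;

    Entry(String path, boolean directory, long length) {
        this.path = path;
        this.directory = directory;
        this.length = length;
    }
}

final class SplitSizeSketch {
    // Mirrors the listing branch: when recursive is false, directories fall
    // into the else branch and are returned alongside regular files.
    static List<Entry> list(List<Entry> children, boolean recursive) {
        List<Entry> located = new ArrayList<>();
        for (Entry stat : children) {
            if (recursive && stat.directory) {
                // The real code queues the directory for a recursive listing;
                // this sketch does not descend into it.
            } else {
                located.add(stat);
            }
        }
        return located;
    }

    // Mirrors the size loop. ignoreDirs is an assumed flag: when true,
    // directories are skipped instead of triggering "Not a file".
    static long totalSize(List<Entry> files, boolean ignoreDirs) throws IOException {
        long total = 0;
        for (Entry file : files) {
            if (file.directory) {
                if (ignoreDirs) {
                    continue;
                }
                throw new IOException("Not a file: " + file.path);
            }
            total += file.length;
        }
        return total;
    }
}
```

With a non-recursive listing of a directory that holds two files and one subdirectory, `totalSize(listed, false)` throws the same "Not a file" error as in the stack trace, while `totalSize(listed, true)` sums only the two files. Skipping directories only behind a flag leaves the default behavior unchanged for callers that rely on the exception.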

      Attachments

        1. MAPREDUCE-7086.patch (2 kB, Sergey Shelukhin)
        2. MAPREDUCE-7086.01.patch (7 kB, Sergey Shelukhin)
        3. HADOOP-15403.patch (2 kB, Sergey Shelukhin)


          People

            Assignee: sershe Sergey Shelukhin
            Reporter: sershe Sergey Shelukhin
            Votes: 0
            Watchers: 7
