Hadoop Map/Reduce: MAPREDUCE-7086

Add config to allow FileInputFormat to ignore directories when recursive=false

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.2.0, 3.1.1
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      We are trying to create a split in Hive that reads only the files directly inside a directory, not those in its subdirectories.
      Split generation fails with the error below.
      The failure comes from two pieces of code interacting: one explicitly adds directories to the listing results without failing, and the other fails on any directory it finds in those results. That looks like a bug.

      Caused by: java.io.IOException: Not a file: file:/,...warehouse/simple_to_mm_text/delta_0000001_0000001_0000
      	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:329) ~[hadoop-mapreduce-client-core-3.1.0.jar:?]
      	at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:553) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
      	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:754) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
      	at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:203) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
      

      With recursion disabled, this code adds directories to the results:

      if (recursive && stat.isDirectory()) {
        result.dirsNeedingRecursiveCalls.add(stat);
      } else {
        result.locatedFileStatuses.add(stat);
      }
      

      However, the getSplits code that runs afterwards computes the total size like this:

      long totalSize = 0;                           // compute total size
      for (FileStatus file: files) {                // check we have valid files
        if (file.isDirectory()) {
          throw new IOException("Not a file: "+ file.getPath());
        }
        totalSize +=
      

      Combined with the listing code above, this check fails whenever the input directory contains a subdirectory.
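
      The interaction can be reproduced without Hadoop on the classpath. The sketch below is only an illustration: Stat is a hypothetical stand-in for FileStatus, listStatus and totalSize are simplified models of the two snippets quoted above, and the ignoreDirs flag is an assumed name for the kind of switch the issue title asks for, not the property the patch actually introduced.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitSizeSketch {

  /** Hypothetical stand-in for Hadoop's FileStatus; not the real class. */
  static final class Stat {
    final String path;
    final boolean directory;
    final long len;

    Stat(String path, boolean directory, long len) {
      this.path = path;
      this.directory = directory;
      this.len = len;
    }

    boolean isDirectory() {
      return directory;
    }
  }

  /** Models the listing branch: without recursion, directories stay in the results. */
  static List<Stat> listStatus(List<Stat> children, boolean recursive) {
    List<Stat> located = new ArrayList<>();
    for (Stat stat : children) {
      if (recursive && stat.isDirectory()) {
        // The real code queues the directory for a recursive listing; omitted here.
        continue;
      }
      located.add(stat);
    }
    return located;
  }

  /** Models the size loop; ignoreDirs is an assumed flag that skips directories. */
  static long totalSize(List<Stat> files, boolean ignoreDirs) throws IOException {
    long totalSize = 0;
    for (Stat file : files) {
      if (file.isDirectory()) {
        if (ignoreDirs) {
          continue; // skip the subdirectory instead of failing
        }
        throw new IOException("Not a file: " + file.path);
      }
      totalSize += file.len;
    }
    return totalSize;
  }

  public static void main(String[] args) throws IOException {
    // Made-up paths and sizes, chosen only for the demonstration.
    List<Stat> children = Arrays.asList(
        new Stat("/warehouse/t/000000_0", false, 100),
        new Stat("/warehouse/t/delta_1", true, 0),
        new Stat("/warehouse/t/000001_0", false, 50));
    List<Stat> files = listStatus(children, false);
    try {
      totalSize(files, false);
    } catch (IOException e) {
      System.out.println("default: " + e.getMessage());
    }
    System.out.println("ignoring directories: " + totalSize(files, true));
  }
}
```

      With recursive=false the subdirectory survives the listing step, so the default size loop throws, while the assumed flag lets the loop count only the two plain files.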

        Attachments

        1. HADOOP-15403.patch (2 kB, Sergey Shelukhin)
        2. MAPREDUCE-7086.01.patch (7 kB, Sergey Shelukhin)
        3. MAPREDUCE-7086.patch (2 kB, Sergey Shelukhin)

              People

              • Assignee: Sergey Shelukhin (sershe)
              • Reporter: Sergey Shelukhin (sershe)