  1. Hadoop Map/Reduce
  2. MAPREDUCE-6800

FileInputFormat.singleThreadedListStatus to use listFiles(recursive)

Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.7.3
    • Fix Version/s: None
    • Component/s: mrv2
    • Labels: None

    Description

      FileInputFormat.singleThreadedListStatus performs a recursive directory walk to pick the files to scan. This is very inefficient on object stores; the walk can be avoided if listFiles(recursive=true) is used instead.

      Based on the experience of SPARK-2984, the listing should also be resilient to a source file going away during the iteration, downgrading a FileNotFoundException (FNFE) to "skip that nonexistent path".
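
      The behaviour asked for above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a patch for FileInputFormat. In Hadoop the listing would come from FileSystem.listFiles(path, true), which returns a RemoteIterator<LocatedFileStatus>; so that the sketch compiles without Hadoop on the classpath, it uses a hypothetical RemoteListing interface with the same hasNext()/next() shape. The names RemoteListing, collectFiles and listingOf are invented for this example, and the sketch assumes the iterator has already advanced past an entry when next() fails on it.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

public class ResilientListing {

  /** Hypothetical stand-in shaped like Hadoop's RemoteIterator. */
  public interface RemoteListing<T> {
    boolean hasNext() throws IOException;
    T next() throws IOException;
  }

  /**
   * Drains a flat (already recursive) listing. A FileNotFoundException
   * on one entry is downgraded to "skip that nonexistent path";
   * any other IOException still propagates to the caller.
   */
  public static List<String> collectFiles(RemoteListing<String> listing)
      throws IOException {
    List<String> files = new ArrayList<>();
    while (listing.hasNext()) {
      try {
        files.add(listing.next());
      } catch (FileNotFoundException e) {
        // Assumption: the entry was deleted after it was listed; skip it.
      }
    }
    return files;
  }

  /** Hypothetical test listing: entries in 'missing' vanish when read. */
  public static RemoteListing<String> listingOf(List<String> paths,
                                                Set<String> missing) {
    Iterator<String> it = paths.iterator();
    return new RemoteListing<String>() {
      @Override
      public boolean hasNext() {
        return it.hasNext();
      }

      @Override
      public String next() throws IOException {
        String path = it.next(); // advance before a possible failure
        if (missing.contains(path)) {
          throw new FileNotFoundException(path);
        }
        return path;
      }
    };
  }

  public static void main(String[] args) throws IOException {
    List<String> found = collectFiles(listingOf(
        Arrays.asList("/data/a.txt", "/data/sub/b.txt", "/data/sub/c.txt"),
        Collections.singleton("/data/sub/b.txt")));
    System.out.println(found); // prints [/data/a.txt, /data/sub/c.txt]
  }
}
```

      Catching only FileNotFoundException keeps real failures (permissions, network errors) visible. Whether a real RemoteIterator can safely continue after a failed next() depends on the filesystem implementation, so that part would need checking per store.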

            People

              Assignee: Unassigned
              Reporter: Steve Loughran (stevel@apache.org)
              Votes: 0
              Watchers: 7
