Spark / SPARK-18273

DataFrameReader.load takes a lot of time to start the job if a lot of file/dir paths are passed

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Not A Problem
    • Affects Version/s: 2.0.1
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None

    Description

      If the paths Seq parameter contains many elements, DataFrameReader.load takes a long time to start the job because it checks whether each path exists using fs.exists. There should be a boolean configuration option to disable the path existence check, and it should be passed as a parameter to the DataSource.resolveRelation call.
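
      To illustrate the cost described above, here is a minimal sketch that models it outside Spark. It is not Spark's implementation: resolve_paths, its check_exists flag, and make_counting_exists are hypothetical names, and os.path.exists stands in for fs.exists. It only shows that the number of lookups grows with the number of paths, and that a boolean switch of the kind requested would skip them.

```python
import os
import tempfile
from typing import Callable, Dict, List, Sequence, Tuple


def resolve_paths(
    paths: Sequence[str],
    check_exists: bool = True,
    exists: Callable[[str], bool] = os.path.exists,
) -> List[str]:
    """Return the paths to read, optionally verifying each one first.

    Hypothetical stand-in for the per-path check; not Spark's code.
    """
    if not check_exists:
        # Requested behavior: skip the checks, so no per-path lookups happen.
        return list(paths)
    for p in paths:
        # One lookup per path; on a remote file system each is a round trip.
        if not exists(p):
            raise FileNotFoundError(f"Path does not exist: {p}")
    return list(paths)


def make_counting_exists() -> Tuple[Callable[[str], bool], Dict[str, int]]:
    """Wrap os.path.exists so the number of lookups can be observed."""
    counter = {"calls": 0}

    def exists(path: str) -> bool:
        counter["calls"] += 1
        return os.path.exists(path)

    return exists, counter


with tempfile.TemporaryDirectory() as tmp:
    paths = [os.path.join(tmp, f"part-{i}") for i in range(100)]
    for p in paths:
        os.mkdir(p)

    exists, counter = make_counting_exists()
    resolve_paths(paths, check_exists=True, exists=exists)
    lookups_with_check = counter["calls"]

    exists, counter = make_counting_exists()
    resolve_paths(paths, check_exists=False, exists=exists)
    lookups_without_check = counter["calls"]
```

      With the check enabled, the sketch performs one lookup per path; with it disabled, it performs none. The trade-off is that a missing path would then surface later, when the data is actually read, rather than before the job starts.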

          People

            Assignee: Unassigned
            Reporter: Aniket Bhatnagar (aniket)
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated:
              Resolved: