[SPARK-18273] DataFrameReader.load takes a lot of time to start the job if a lot of file/dir paths are passed - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Minor
Resolution: Not A Problem
Affects Version/s: 2.0.1
Fix Version/s: None
Component/s: Spark Core
Labels: None

Description

If the paths Seq parameter contains many elements, DataFrameReader.load takes a long time to start the job because it checks whether each path exists by calling fs.exists. There should be a boolean configuration option that disables the path-existence check, and it should be passed as a parameter to the DataSource.resolveRelation call.
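
To illustrate the request, here is a minimal sketch in plain Python that models the behaviour described above. It is not Spark's code. CountingFileSystem, resolve_paths, and the check_paths_exist flag are hypothetical names invented for this example. The toy file system only counts exists calls, standing in for the per-path fs.exists round trips.

```python
from typing import Iterable, List


class CountingFileSystem:
    """Toy stand-in for a file system client; counts exists() calls."""

    def __init__(self, known_paths: Iterable[str]) -> None:
        self._known = set(known_paths)
        self.exists_calls = 0

    def exists(self, path: str) -> bool:
        # Each call models one round trip to the underlying file system.
        self.exists_calls += 1
        return path in self._known


def resolve_paths(
    paths: Iterable[str],
    fs: CountingFileSystem,
    check_paths_exist: bool = True,  # hypothetical option, as proposed in this issue
) -> List[str]:
    """Return the paths to read, verifying each one first unless disabled."""
    resolved = list(paths)
    if check_paths_exist:
        for path in resolved:
            if not fs.exists(path):
                raise FileNotFoundError(f"Path does not exist: {path}")
    return resolved


paths = [f"/data/part-{i}" for i in range(1000)]
fs = CountingFileSystem(paths)

resolve_paths(paths, fs)                           # one exists() call per path
print(fs.exists_calls)                             # 1000

resolve_paths(paths, fs, check_paths_exist=False)  # no additional calls
print(fs.exists_calls)                             # 1000
```

With the check enabled, the number of exists calls grows linearly with the number of paths. With it disabled, a missing path would only surface later, when the data is actually read.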

People

Assignee: Unassigned

Reporter: Aniket Bhatnagar

Votes: 0

Watchers: 1

Dates

Created: 04/Nov/16 15:05

Updated: 04/Nov/16 21:23

Resolved: 04/Nov/16 21:23