[SPARK-18679] Regression in file listing performance - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Blocker
Resolution: Fixed
Affects Version/s: 2.1.0
Fix Version/s: 2.1.0
Component/s: SQL
Labels: None
Target Version/s: 2.1.0

Description

In Spark 2.1 ListingFileCatalog was significantly refactored (and renamed to InMemoryFileIndex).

It seems there is a performance regression here: we no longer perform listing in parallel for non-root directories. This forces file listing to be completely serial when resolving datasource tables that are not backed by an external catalog.
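
To illustrate the difference described above, here is a minimal Python sketch contrasting a fully serial recursive listing with a level-by-level listing that also fans out over non-root directories. It is not Spark's code and does not reflect the actual InMemoryFileIndex implementation or the fix in the linked pull request; the function names, the thread pool, and the `max_workers` parameter are all assumptions made for illustration, and it lists a local filesystem rather than a distributed one.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def _scan(path):
    """List one directory; return (path, is_dir) pairs for its direct children."""
    with os.scandir(path) as it:
        return [(e.path, e.is_dir(follow_symlinks=False)) for e in it]


def list_leaf_files_serial(path):
    """Recursively list files under `path`, visiting one directory at a time."""
    files = []
    for child, is_dir in _scan(path):
        if is_dir:
            files.extend(list_leaf_files_serial(child))
        else:
            files.append(child)
    return files


def list_leaf_files_parallel(roots, max_workers=8):
    """List files level by level; all directories at a given depth,
    including non-root ones, are scanned concurrently."""
    files = []
    pending = list(roots)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            next_level = []
            # Each directory in the current level is scanned on the pool.
            for entries in pool.map(_scan, pending):
                for child, is_dir in entries:
                    (next_level if is_dir else files).append(child)
            pending = next_level
    return sorted(files)
```

Both functions should return the same set of files; the second one only changes how the work below the root is scheduled, which is the part the description says became serial.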

Issue Links

links to

[Github] Pull Request #16112 (ericl)

People

Assignee: Eric Liang
Reporter: Eric Liang
Votes: 0
Watchers: 3

Dates

Created: 01/Dec/16 23:30
Updated: 02/Dec/16 13:03
Resolved: 02/Dec/16 13:03