Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.1.0
    • Fix Version/s: 2.1.0
    • Component/s: SQL
    • Labels: None

    Description

      In Spark 2.1 ListingFileCatalog was significantly refactored (and renamed to InMemoryFileIndex).

      It seems there is a performance regression here: we no longer perform listing in parallel for non-root directories. This forces file listing to be completely serial when resolving datasource tables that are not backed by an external catalog.
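
      To make the difference concrete, here is a minimal sketch of serial versus level-by-level parallel listing. It is not Spark's InMemoryFileIndex code: the function names are hypothetical, it walks the local filesystem with `os.scandir` rather than a Hadoop FileSystem, and it uses a thread pool where Spark would distribute the work differently.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def _list_one(directory):
    """List a single directory, returning (path, is_dir) pairs."""
    return [(entry.path, entry.is_dir()) for entry in os.scandir(directory)]


def list_leaf_files_serial(root):
    """Collect every file under root, visiting one directory at a time."""
    files = []
    for path, is_dir in _list_one(root):
        if is_dir:
            # Each child directory is only listed after the previous one finishes.
            files.extend(list_leaf_files_serial(path))
        else:
            files.append(path)
    return files


def list_leaf_files_parallel(roots, max_workers=8):
    """Collect every file under roots, listing all directories of a level concurrently."""
    files = []
    pending = list(roots)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            next_level = []
            # Every directory at the current depth gets its own listing task,
            # so non-root directories are listed in parallel as well.
            for entries in pool.map(_list_one, pending):
                for path, is_dir in entries:
                    if is_dir:
                        next_level.append(path)
                    else:
                        files.append(path)
            pending = next_level
    return files
```

      Both functions return the same set of files, possibly in a different order. With many partition directories below the root, the serial version pays one listing round trip per directory in sequence, which is the behaviour described above.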

          People

            Assignee: ekhliang Eric Liang
            Reporter: ekhliang Eric Liang
            Votes: 0
            Watchers: 3
