Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.4.0
    • Fix Version/s: 1.5.0
    • Component/s: SQL
    • Labels: None
    • Target Version/s:

Description

For large Parquet tables (e.g., those with thousands of partitions), discovering Parquet metadata for schema merging and generating splits for Spark jobs can be very slow. We need to accelerate this process. One possible solution is to perform the discovery via a distributed Spark job, as sketched below.
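As a rough illustration of the proposed approach (not the actual patch for this ticket), footer reads could be fanned out to executors and the per-file schemas unioned back on the driver. The object and method names below are hypothetical; the sketch assumes parquet-mr's ParquetFileReader, ParquetMetadataConverter, and MessageTypeParser APIs:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.parquet.format.converter.ParquetMetadataConverter
    import org.apache.parquet.hadoop.ParquetFileReader
    import org.apache.parquet.schema.{MessageType, MessageTypeParser}
    import org.apache.spark.SparkContext

    // Hypothetical sketch: read each Parquet footer in a Spark task,
    // then merge the schemas on the driver.
    object DistributedSchemaMerging {
      def mergeSchemas(sc: SparkContext, paths: Seq[String]): Option[MessageType] = {
        val schemaStrings = sc
          .parallelize(paths, math.max(1, math.min(paths.size, 1000)))
          .map { pathString =>
            // Hadoop Configuration is not serializable, so build one per task.
            val conf = new Configuration()
            // Skip row group metadata: only the schema is needed for merging.
            val footer = ParquetFileReader.readFooter(
              conf, new Path(pathString), ParquetMetadataConverter.SKIP_ROW_GROUPS)
            // MessageType is not serializable either; ship its string form back.
            footer.getFileMetaData.getSchema.toString
          }
          .collect()

        // Union all per-file schemas into a single merged schema.
        schemaStrings.map(MessageTypeParser.parseMessageType).reduceOption(_ union _)
      }
    }

Reading only the footers (and skipping row groups) keeps each task cheap, so the job is dominated by HDFS round trips that now happen in parallel across the cluster instead of sequentially on the driver.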


People

    • Assignee: Cheng Lian
    • Reporter: Cheng Lian
    • Shepherd: Cheng Lian
    • Votes: 0
    • Watchers: 6
