Apache Hudi / HUDI-4812

Lazy partition listing and file groups fetching in Spark Query


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Blocker
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: spark

    Description

      In the current Spark query implementation, the FileIndex refreshes and loads all file groups into its cache in order to serve subsequent queries.

       

      For a large table with many partitions, this can add significant overhead during initialization. Moreover, the query itself may come with a partition filter, in which case loading the file groups of the partitions it excludes is unnecessary.

       

      To optimize this, the whole refresh logic will become lazy: the actual partition listing and file group loading will be carried out only after the partition filter has been applied.
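
      The idea can be illustrated with a small toy model. This is only a sketch of the lazy-loading pattern, not Hudi's actual FileIndex code: the class `LazyFileIndex`, its methods, and the two listing callbacks are hypothetical names introduced for illustration. `refresh()` merely invalidates cached state, and file groups are listed only for partitions that pass the filter.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class LazyFileIndex:
    """Toy file index (hypothetical, not Hudi's class) that lists on demand."""

    # Hypothetical stand-ins for calls into the table's storage layer.
    list_partitions: Callable[[], List[str]]
    list_file_groups: Callable[[str], List[str]]
    _partitions: Optional[List[str]] = field(default=None, init=False)
    _file_groups: Dict[str, List[str]] = field(default_factory=dict, init=False)

    def refresh(self) -> None:
        # Lazy refresh: drop cached state, but do no listing here.
        self._partitions = None
        self._file_groups.clear()

    def files_for(self, partition_filter: Callable[[str], bool]) -> List[str]:
        # Partition paths are listed on first use, not at initialization.
        if self._partitions is None:
            self._partitions = self.list_partitions()
        files: List[str] = []
        # File groups are fetched (and cached) only for matching partitions.
        for partition in filter(partition_filter, self._partitions):
            if partition not in self._file_groups:
                self._file_groups[partition] = self.list_file_groups(partition)
            files.extend(self._file_groups[partition])
        return files


listed = []  # records which partitions had their file groups listed


def fake_list_file_groups(partition: str) -> List[str]:
    listed.append(partition)
    return [f"{partition}/fg-0", f"{partition}/fg-1"]


index = LazyFileIndex(
    list_partitions=lambda: ["dt=2022-09-01", "dt=2022-09-02", "dt=2022-09-03"],
    list_file_groups=fake_list_file_groups,
)
index.refresh()  # no listing happens here
files = index.files_for(lambda p: p == "dt=2022-09-02")
```

      In this sketch, a query filtered to a single partition triggers file group listing for that partition alone, and a repeated query with the same filter is served from the cache. An eager index would instead list all three partitions during `refresh()`.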

            People

              Assignee: Yuwei Xiao (yuweixiao)
              Reporter: Yuwei Xiao (yuweixiao)
              Alexey Kudinkin
              Votes: 0
              Watchers: 2

              Dates

                Created:
                Updated:
                Resolved: