Apache Hudi / HUDI-8463

Revisit snapshot query planning performance regarding completion time

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • 1.0.0
    • 20

    Description

      During snapshot query planning, some code paths look up the completion time for a given instant time. This lookup can become a performance bottleneck, especially when there are a huge number of files and a large number of instants to look up across both the archived and active timelines.  We should see whether this can be improved by storing the completion time of each file in the FILES partition of the metadata table (MDT), which would avoid an expensive lookup every time.  Once the completion time of each file is stored in the FILES partition, planning only needs to filter on the information from the MDT.
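
      The difference between the two approaches can be illustrated with a minimal Python sketch. It is not Hudi code: `FileEntry`, `plan_with_timeline_lookup`, `plan_with_stored_completion_time`, and the dictionary-based timeline are hypothetical names chosen for illustration, and the sketch assumes that timestamps are fixed-width strings that compare correctly as text and that a file is visible when its completion time is at or before the query time.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical per-file record; not Hudi's actual metadata schema."""
    path: str
    instant_time: str                      # instant (start) time of the writing commit
    completion_time: Optional[str] = None  # assumed to be stored next to the file entry


def plan_with_timeline_lookup(files: List[FileEntry],
                              timeline: Dict[str, str],
                              as_of: str) -> List[str]:
    """Resolve the completion time per file through a timeline lookup.

    The dict lookup stands in for a search over the active and archived
    timelines, which is the expensive step described in this issue.
    """
    visible = []
    for f in files:
        completed = timeline.get(f.instant_time)  # None if the instant never completed
        if completed is not None and completed <= as_of:
            visible.append(f.path)
    return visible


def plan_with_stored_completion_time(files: List[FileEntry],
                                     as_of: str) -> List[str]:
    """Filter only on the completion time carried by each file entry."""
    return [f.path for f in files
            if f.completion_time is not None and f.completion_time <= as_of]


# Illustrative data only: instant time -> completion time.
timeline = {"001": "004", "002": "003", "005": "009"}
files = [
    FileEntry("a.parquet", "001", "004"),
    FileEntry("b.parquet", "002", "003"),
    FileEntry("c.parquet", "005", "009"),
    FileEntry("d.parquet", "007", None),  # written by an instant that has not completed
]

by_lookup = plan_with_timeline_lookup(files, timeline, as_of="004")
by_stored = plan_with_stored_completion_time(files, as_of="004")
```

      Both functions select the same files for the same query time. The second one never consults the timeline, which is the saving this issue is after.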

          People

            Assignee: Ethan Guo (yihua)
            Reporter: Ethan Guo (yihua)

              Time Tracking

                Original Estimate: 20h
                Remaining Estimate: 20h
                Time Spent: Not Specified