Apache Hudi / HUDI-5288

Optimize drop duplicates by avoiding index look up twice


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: index
    • 1

    Description

      We could potentially optimize the DROP_DUPES feature by doing just one index lookup.

      As of now, we do an explicit index lookup, drop the matched records, and then proceed as a regular commit, which performs its own index lookup, so the index is consulted twice.

      Why not proceed as a regular commit and, just after its index lookup, filter for new inserts and continue?
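
      The difference between the two flows can be sketched with a toy model. This is only an illustration under stated assumptions: `ToyIndex`, `write_two_lookups`, and `write_one_lookup` are hypothetical names, the index is reduced to a set of record keys, and none of this is Hudi's actual API or write path.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    """Hypothetical stand-in for an index: a set of record keys already in the table."""
    known_keys: set = field(default_factory=set)
    lookups: int = 0  # how many index lookups have been performed

    def tag(self, keys):
        """Tag each incoming key with whether it already exists in the table."""
        self.lookups += 1
        return [(k, k in self.known_keys) for k in keys]


def write_two_lookups(index, keys):
    """Flow described as current: explicit lookup to drop dupes, then a regular commit."""
    fresh = [k for k, exists in index.tag(keys) if not exists]  # lookup 1: drop matches
    tagged = index.tag(fresh)                                   # lookup 2: commit path
    inserts = [k for k, _ in tagged]
    index.known_keys.update(inserts)
    return inserts


def write_one_lookup(index, keys):
    """Proposed flow: run the commit's lookup once, then keep only new inserts."""
    tagged = index.tag(keys)                                    # the only lookup
    inserts = [k for k, exists in tagged if not exists]
    index.known_keys.update(inserts)
    return inserts


incoming = ["a", "b", "c"]
old_flow = ToyIndex({"a"})
new_flow = ToyIndex({"a"})
assert write_two_lookups(old_flow, incoming) == write_one_lookup(new_flow, incoming)
print(old_flow.lookups, new_flow.lookups)  # 2 1
```

      In this sketch both flows write the same records; only the number of lookups differs.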

       


            People

              shivnarayan sivabalan narayanan
              shivnarayan sivabalan narayanan