Apache Hudi / HUDI-2683

Parallelize deleting archived hoodie commits


Details

    Description

      Currently, hoodie takes 5 s to delete 30 archived commits, and it gets worse with a larger archive threshold, such as archive.max_commits set to 100 or more.

      This is because hoodie deletes archived commits serially in the driver.

      This delay can be unacceptable for Spark Streaming jobs with second-level batch intervals.

      We need to delete archived commits in parallel.
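
To illustrate the idea, here is a minimal sketch of parallel file deletion using a plain Java thread pool. It is not Hudi's implementation: the class `ParallelDeleteSketch`, the method `deleteInParallel`, and the `parallelism` parameter are hypothetical names, and the sketch assumes the archived commits are ordinary files reachable through `java.nio.file`. A real change would presumably go through Hudi's own file system and engine abstractions instead.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, for illustration only; not part of Hudi.
public class ParallelDeleteSketch {

    // Deletes every file in `files` using up to `parallelism` threads.
    // Returns, per file, whether it existed and was deleted.
    static Map<Path, Boolean> deleteInParallel(List<Path> files, int parallelism)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, parallelism));
        try {
            // Submit one delete task per file so the deletes overlap
            // instead of running one after another.
            List<Future<Boolean>> futures = new ArrayList<>();
            for (Path file : files) {
                futures.add(pool.submit(() -> {
                    try {
                        return Files.deleteIfExists(file);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                }));
            }
            // Wait for all tasks and collect results in input order.
            Map<Path, Boolean> results = new LinkedHashMap<>();
            for (int i = 0; i < files.size(); i++) {
                results.put(files.get(i), futures.get(i).get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

With serial deletion the total time grows roughly linearly with the number of archived commits; spreading the deletes over several workers lets the per-file latency overlap, which matters most when each delete is a remote storage call.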

       

            People

              Assignee: Unassigned
              Reporter: Yue Zhang (zhangyue19921010)