Apache Hudi / HUDI-3178

Metadata table compaction can include invalid updates from failed actions on dataset

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.10.1
    • Component/s: None

    Description

      Metadata Table v2 performs an inline compaction once a deltacommit has been written.

      Timeline:
      (on dataset) t1.commit.requested
      (on dataset) t1.commit.inflight
      ---- all parquet writes complete here, WriteStatus generated---
      (on metadata table) t1.deltacommit.requested
      (on metadata table) t1.deltacommit.inflight
      (on metadata table) t1.deltacommit
      ---- deltacommit completed ----
      (on metadata table) t1-001.compaction.requested
      (on metadata table) t1-001.compaction.inflight
      (on metadata table) t1-001.commit

      If t1.commit then fails on the dataset, the metadata table has already compacted the information from t1 into its base files, and that information will be returned to readers. The metadata table reader logic validates only deltacommits against the completed instants on the dataset timeline; it assumes that a base file is always sane.
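
      The failure mode can be reproduced with a toy model. The sketch below is not Hudi code: the MetadataTable class, its methods, and the file name f1.parquet are hypothetical stand-ins. It assumes only what is described above, namely that the reader filters log (deltacommit) records by completed dataset instants and trusts the base file unconditionally.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataTable:
    """Toy stand-in for the metadata table (hypothetical, not the Hudi API)."""
    base_file: set = field(default_factory=set)      # entries already compacted
    log_records: list = field(default_factory=list)  # (instant, file) pairs

    def delta_commit(self, instant, files):
        # A deltacommit appends log records tagged with the dataset instant.
        self.log_records.extend((instant, f) for f in files)

    def compact(self):
        # Inline compaction folds every log record into the base file
        # without checking whether the dataset instant ever completed.
        self.base_file.update(f for _, f in self.log_records)
        self.log_records.clear()

    def read(self, completed_dataset_instants):
        # The reader validates only log records against the dataset timeline;
        # the base file is assumed to be sane.
        valid = {f for inst, f in self.log_records
                 if inst in completed_dataset_instants}
        return self.base_file | valid


def simulate(compact_inline):
    completed = set()  # t1.commit fails on the dataset, so it never lands here
    mdt = MetadataTable()
    mdt.delta_commit("t1", ["f1.parquet"])
    if compact_inline:
        mdt.compact()  # corresponds to t1-001.commit on the metadata table
    return mdt.read(completed)


if __name__ == "__main__":
    print("without inline compaction:", simulate(False))
    print("with inline compaction:", simulate(True))
```

      Without compaction, the reader filters out the t1 record because t1 is not a completed dataset instant. With inline compaction, the same record reaches readers through the base file.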

            People

              Assignee: shivnarayan sivabalan narayanan
              Reporter: pwason Prashant Wason