Apache Hudi / HUDI-8713 Bug fixes and improvements on metadata table - Phase 0 / HUDI-5769

Partitions created by Async indexer could be deleted by regular writers


Details

    Description

      A regular writer has a flow that checks each metadata table (MDT) partition: if a partition is not enabled in the writer's config, but it exists in storage and is listed among the table config's fully built metadata partitions, Hudi deletes that metadata partition on the assumption that the user wants to disable it.
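The check above can be modeled as a set computation. The following is a hypothetical sketch, not Hudi's actual implementation: the class and method names are invented for illustration, and the partition names "files" and "column_stats" are assumed to match the MDT partition directory names.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the regular writer's MDT partition check; not Hudi's real code.
public class MdtPartitionReconciler {

    /**
     * Returns the metadata partitions a regular writer would delete: those recorded
     * as fully built in the table config and present in storage, but not enabled
     * in this writer's own configuration.
     */
    static Set<String> partitionsToDelete(Set<String> enabledInWriterConfig,
                                          Set<String> completedInTableConfig,
                                          Set<String> presentInStorage) {
        Set<String> toDelete = new LinkedHashSet<>();
        for (String partition : completedInTableConfig) {
            boolean onDisk = presentInStorage.contains(partition);
            boolean enabled = enabledInWriterConfig.contains(partition);
            if (onDisk && !enabled) {
                // Interpreted as "the user wants this partition disabled".
                toDelete.add(partition);
            }
        }
        return toDelete;
    }

    public static void main(String[] args) {
        Set<String> enabled = Set.of("files");
        Set<String> completed = new LinkedHashSet<>(List.of("files", "column_stats"));
        Set<String> storage = Set.of("files", "column_stats");
        System.out.println(partitionsToDelete(enabled, completed, storage)); // [column_stats]
    }
}
```

The rule only sees the writer's own config, so it cannot tell "the user disabled this partition" apart from "another process built this partition".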

      But this does not work well with the async indexer.

       

      Process 1: Deltastreamer runs continuously with no metadata configs set.

      This means the default value of metadata enable (true) applies, so the "files" partition is instantiated inline on the first commit.

      No value is set for col stats enable, so the writer takes no action for col stats.
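A minimal sketch of how process 1's enabled partitions resolve when no metadata configs are set. The keys `hoodie.metadata.enable` and `hoodie.metadata.index.column.stats.enable` and their defaults (true and false) are taken from Hudi's public configuration docs and should be treated as assumptions here; the class and method are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Properties;
import java.util.Set;

// Hypothetical resolution of a writer's enabled MDT partitions from its properties.
public class WriterMetadataDefaults {
    // Assumed Hudi config keys; defaults below are assumed from the public docs.
    static final String METADATA_ENABLE = "hoodie.metadata.enable";
    static final String COL_STATS_ENABLE = "hoodie.metadata.index.column.stats.enable";

    static Set<String> enabledPartitions(Properties props) {
        Set<String> enabled = new LinkedHashSet<>();
        boolean mdtEnabled = Boolean.parseBoolean(props.getProperty(METADATA_ENABLE, "true"));
        if (!mdtEnabled) {
            return enabled; // metadata table disabled entirely
        }
        enabled.add("files"); // built inline on the first commit
        if (Boolean.parseBoolean(props.getProperty(COL_STATS_ENABLE, "false"))) {
            enabled.add("column_stats");
        }
        return enabled;
    }

    public static void main(String[] args) {
        Properties deltastreamerProps = new Properties(); // process 1: no metadata configs set
        System.out.println(enabledPartitions(deltastreamerProps)); // [files]
    }
}
```

With nothing set, col stats resolves to "not enabled", which is exactly the state the deletion check treats as a request to disable the partition.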

       

      Process 2: The user starts HoodieIndexer for the col stats partition.

      Once the indexer completes, the table config adds "col stats" to its list of fully built metadata partitions.

       

      Meanwhile, in process 1, when Deltastreamer performs its next write, it detects that col stats is not enabled (the default value in code) while the table config shows col stats as fully built. It therefore deletes the col stats partition and updates the table config, undoing the indexer's work.
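The sequence above can be replayed with a small in-memory simulation. This is a hypothetical sketch: two sets stand in for the table config's fully built partitions and the partitions in storage, and the class and method names are invented. It models only the reconciliation rule, not Hudi's timeline, locking, or actual indexing.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical simulation of the race between process 1 (Deltastreamer) and process 2 (HoodieIndexer).
public class AsyncIndexerRace {
    // Stands in for the table config's set of fully built metadata partitions.
    final Set<String> tableConfigCompleted = new LinkedHashSet<>();
    // Stands in for the metadata partitions physically present in storage.
    final Set<String> storage = new LinkedHashSet<>();

    // Process 1: each commit builds its enabled partitions, then deletes any
    // fully built partition that its own config does not enable.
    void deltastreamerCommit(Set<String> writerEnabled) {
        for (String p : writerEnabled) {
            storage.add(p);
            tableConfigCompleted.add(p);
        }
        for (String p : new LinkedHashSet<>(tableConfigCompleted)) {
            if (storage.contains(p) && !writerEnabled.contains(p)) {
                storage.remove(p);              // delete the metadata partition
                tableConfigCompleted.remove(p); // and update the table config
            }
        }
    }

    // Process 2: the indexer builds a partition and marks it as fully built.
    void indexerCompletes(String partition) {
        storage.add(partition);
        tableConfigCompleted.add(partition);
    }

    public static void main(String[] args) {
        AsyncIndexerRace table = new AsyncIndexerRace();
        Set<String> writerEnabled = Set.of("files"); // col stats not enabled by default
        table.deltastreamerCommit(writerEnabled);    // first commit builds "files"
        table.indexerCompletes("column_stats");      // async indexer finishes
        System.out.println(table.tableConfigCompleted); // [files, column_stats]
        table.deltastreamerCommit(writerEnabled);    // next Deltastreamer commit
        System.out.println(table.tableConfigCompleted); // [files]
    }
}
```

The second commit removes "column_stats" even though no user asked for it to be disabled.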



            People

              shivnarayan sivabalan narayanan
              Votes: 0
              Watchers: 1
