  1. Apache Hudi
  2. HUDI-6084

Ensure write operations to MDT do not absorb failures

Details

    Description

      Issue 1:

      When compaction is invoked on the metadata table (MDT), its return value is not checked. The compaction may have reported errors in the WriteStatus, and ignoring them leaves data missing from the MDT.

      An MDT operation must never succeed if errors occurred.
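
      The gap described in Issue 1 can be illustrated with a minimal, self-contained Java sketch. The names `CompactionResult`, `runCompaction`, and `compactOrFail` are hypothetical stand-ins, not Hudi APIs; the sketch only shows that a caller which discards the returned status treats a partially failed compaction as a success.

```java
import java.util.ArrayList;
import java.util.List;

public class CompactionCheckSketch {

    /** Hypothetical stand-in for the per-file status a compaction returns. */
    static class CompactionResult {
        final String fileId;
        final int errorCount;

        CompactionResult(String fileId, int errorCount) {
            this.fileId = fileId;
            this.errorCount = errorCount;
        }

        boolean hasErrors() {
            return errorCount > 0;
        }
    }

    /** Hypothetical compaction: reports errors in its result instead of throwing. */
    static List<CompactionResult> runCompaction(boolean injectError) {
        List<CompactionResult> results = new ArrayList<>();
        results.add(new CompactionResult("file-1", 0));
        results.add(new CompactionResult("file-2", injectError ? 3 : 0));
        return results;
    }

    /** Checks every returned status and fails the operation if any reported errors. */
    static void compactOrFail(boolean injectError) {
        for (CompactionResult result : runCompaction(injectError)) {
            if (result.hasErrors()) {
                throw new IllegalStateException(
                    "Compaction failed for " + result.fileId
                        + " with " + result.errorCount + " error(s)");
            }
        }
    }

    public static void main(String[] args) {
        // Pattern from Issue 1: the return value is dropped, so the errors go unnoticed.
        runCompaction(true);

        // Checked variant: the same errors now fail the operation.
        try {
            compactOrFail(true);
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

      In this sketch the unchecked call returns normally even though `file-2` reported errors, which is the silent-failure behavior the issue describes.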

      Issue 2:

      Once a deltacommit has completed, the WriteStatus has already been used to finalize the write and to write the deltacommit action. The code was nevertheless collecting the WriteStatus on the driver to check for errors that occurred during the write. Because the MDT write config enables autoCommit, checking for errors at this stage has no value: the deltacommit has already completed. In addition, the WriteStatus RDD may have been unpersisted, and if no cached value is available, collecting it re-executes the write and rewrites the deltacommit.
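
      The re-execution risk in Issue 2 comes from lazy evaluation. The following is a plain Java analogy, not Spark or Hudi code: `LazyResult` is a hypothetical class that recomputes its value on every `collect()` unless a cached copy exists, and the "write" is simulated by a counter.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class LazyRecomputeSketch {

    /** Hypothetical lazy result: recomputed on collect() unless a cached copy exists. */
    static class LazyResult<T> {
        private final Supplier<T> computation;
        private boolean persisted;
        private T cached;

        LazyResult(Supplier<T> computation) {
            this.computation = computation;
        }

        void persist() {
            persisted = true;
        }

        void unpersist() {
            persisted = false;
            cached = null;
        }

        T collect() {
            if (cached != null) {
                return cached;
            }
            T value = computation.get();
            if (persisted) {
                cached = value;
            }
            return value;
        }
    }

    /** Returns how many times the simulated write ran across three collect() calls. */
    static int countWriteExecutions() {
        AtomicInteger writes = new AtomicInteger();
        LazyResult<String> statuses =
            new LazyResult<>(() -> "write-" + writes.incrementAndGet());

        statuses.persist();
        statuses.collect();   // runs the write and caches the result
        statuses.collect();   // served from the cache, no second write
        statuses.unpersist(); // cached copy is dropped
        statuses.collect();   // no cache available: the write runs again
        return writes.get();
    }

    public static void main(String[] args) {
        System.out.println("Write executions: " + countWriteExecutions());
    }
}
```

      Under these assumptions the simulated write runs twice: once for the original commit and once more for the late error check after the cache was dropped.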

       

      Fix:

      MDT uses FailOnFirstErrorWriteStatus, which is designed to throw an exception as soon as the first write error is detected. Hence, we do not need to check for write errors explicitly: if any write error had occurred, the write itself would have thrown an exception instead of completing.
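
      The fail-fast idea can be sketched in a few lines of self-contained Java. This does not reproduce the actual FailOnFirstErrorWriteStatus implementation; `SimpleWriteStatus`, `FailFastWriteStatus`, and `write` are hypothetical, and the rule that records starting with "bad" fail is invented purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FailFastWriteStatusSketch {

    /** Hypothetical status that only accumulates failures for later inspection. */
    static class SimpleWriteStatus {
        private final List<String> failedKeys = new ArrayList<>();

        void markFailure(String recordKey, Throwable cause) {
            failedKeys.add(recordKey);
        }

        boolean hasErrors() {
            return !failedKeys.isEmpty();
        }
    }

    /** Hypothetical fail-fast variant: the first failure aborts the write. */
    static class FailFastWriteStatus extends SimpleWriteStatus {
        @Override
        void markFailure(String recordKey, Throwable cause) {
            throw new IllegalStateException("Write failed for record " + recordKey, cause);
        }
    }

    /** Hypothetical writer: any record starting with "bad" is treated as a write error. */
    static void write(List<String> records, SimpleWriteStatus status) {
        for (String record : records) {
            if (record.startsWith("bad")) {
                status.markFailure(record, new IllegalArgumentException("cannot write " + record));
            }
        }
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a", "bad-1", "b");

        // Accumulating status: the write completes and the caller must remember to check.
        SimpleWriteStatus lenient = new SimpleWriteStatus();
        write(records, lenient);
        System.out.println("Errors recorded: " + lenient.hasErrors());

        // Fail-fast status: the write cannot complete once an error is seen.
        try {
            write(records, new FailFastWriteStatus());
        } catch (IllegalStateException e) {
            System.out.println("Aborted: " + e.getMessage());
        }
    }
}
```

      With the fail-fast variant there is no completed write whose status could still hide an error, which is why a separate post-write check becomes redundant.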

      Also, we do not need to check the WriteStatus after the commit has completed.

            People

              pwason Prashant Wason