  1. Apache Hudi
  2. HUDI-3018

Flag if the user DataFrame has a "_hoodie_is_deleted" field with a data type other than boolean.

    Description

      As of now, Hudi interprets a special column named "_hoodie_is_deleted": if it is set to true, the record is treated as a delete; otherwise it is treated as an update or an insert. The column is not reserved as such, though. For example, a user DataFrame can have a column named "_hoodie_is_deleted" whose data type is string rather than boolean.

       

      Add validation to Hudi to ensure that this column's data type is boolean if it is present in the DataFrame.
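
A minimal sketch of what such a check could look like, written in plain Python and not taken from Hudi's code base. It assumes the schema is available as a list of (column name, type name) pairs, which is the shape PySpark's `DataFrame.dtypes` returns; the function name `validate_is_deleted_column` is hypothetical.

```python
HOODIE_IS_DELETED = "_hoodie_is_deleted"


def validate_is_deleted_column(dtypes):
    """Raise ValueError if the delete-marker column exists but is not boolean.

    `dtypes` is a list of (column_name, type_name) pairs, for example the
    value of `df.dtypes` on a PySpark DataFrame. This helper is a
    hypothetical illustration, not part of Hudi.
    """
    for name, type_name in dtypes:
        if name == HOODIE_IS_DELETED and type_name != "boolean":
            raise ValueError(
                f"Column '{HOODIE_IS_DELETED}' must be boolean, "
                f"found '{type_name}'"
            )


# A schema without the column, or with a boolean column, passes silently.
validate_is_deleted_column([("id", "int"), ("name", "string")])
validate_is_deleted_column([("id", "int"), (HOODIE_IS_DELETED, "boolean")])
```

Running the check before the write starts would surface the type mismatch as an explicit error rather than leaving the column silently ignored or misread.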

       

      Excerpt from the user:

       

      I'd suggest:

      • Possibly dropping the column (as you say, if it has little benefit, sure). If not, documenting the behaviour somewhere. Alternatively, always include the column, along with the other Hudi metadata fields which are already prepended to the written schema.
      • If the column is not a boolean:
        • Failing hard, as this column is essentially "reserved" for Hudi
        • Taking IS NOT NULL as truthy
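
The two options for a non-boolean column can be compared with a small sketch. This is an illustration in plain Python over dict records, not Hudi's implementation; the function `is_delete` and the policy names `"fail"` and `"not_null_is_truthy"` are hypothetical, and treating a missing or null value as "not a delete" is an assumption.

```python
def is_delete(record, non_boolean_policy="fail"):
    """Decide whether a record (a dict) should be treated as a delete.

    non_boolean_policy (hypothetical names):
      "fail"               -> raise if the value is not a boolean
      "not_null_is_truthy" -> any non-null, non-boolean value means delete
    """
    value = record.get("_hoodie_is_deleted")
    if value is None:
        # Assumption: an absent or null marker means "not a delete".
        return False
    if isinstance(value, bool):
        return value
    if non_boolean_policy == "fail":
        raise TypeError(
            "'_hoodie_is_deleted' must be boolean, "
            f"found {type(value).__name__}"
        )
    if non_boolean_policy == "not_null_is_truthy":
        return True
    raise ValueError(f"Unknown policy: {non_boolean_policy!r}")


print(is_delete({"id": 1, "_hoodie_is_deleted": True}))
print(is_delete({"id": 2, "_hoodie_is_deleted": "yes"}, "not_null_is_truthy"))
```

Under the "fail" policy a string value stops the write with an error, while "not_null_is_truthy" would turn any non-null value, including the string "false", into a delete, which is one reason failing hard may be the less surprising choice.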

       

            People

              shivnarayan sivabalan narayanan

                Time Tracking

                  Estimated: 1h
                  Remaining: 1h
                  Logged: Not Specified