Apache Hudi / HUDI-1667

Fix bug: when HoodieMergeOnReadRDD reads records from the base file, Hoodie may set a non-null value in a field that is null if vectorization is enabled.


Details

    Description

      When HoodieMergeOnReadRDD reads a record from the base file, it creates a new InternalRow based on requiredStructSchema.

      private def createRowWithRequiredSchema(row: InternalRow): InternalRow = {
        val rowToReturn = new SpecificInternalRow(tableState.requiredStructSchema)
        val posIterator = requiredFieldPosition.iterator
        var curIndex = 0
        tableState.requiredStructSchema.foreach(
          f => {
            val curPos = posIterator.next()
            val curField = row.get(curPos, f.dataType)
            rowToReturn.update(curIndex, curField)
            curIndex = curIndex + 1
          }
        )
        rowToReturn
      }
      
      

      Hoodie doesn't check whether a field is null (for example with isNullAt) before reading its value here; it reads every field unconditionally.

      If vectorization is enabled, row is a ColumnarBatchRow. ColumnarBatchRow may return a non-null value even if the value of the field is null, so Hoodie may set a non-null value in a field that is null.
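
      One possible way to guard against this is sketched below. It is not necessarily the change merged for this issue. It is a standalone variant of createRowWithRequiredSchema in which the schema and field positions are passed as parameters instead of being read from tableState; the object name NullSafeRowProjection and the method name project are hypothetical. It calls isNullAt on the source row before reading each field and propagates nulls explicitly with setNullAt.

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.SpecificInternalRow
import org.apache.spark.sql.types.StructType

// Hypothetical helper, not part of Hudi: illustrates a null-safe copy only.
object NullSafeRowProjection {
  def project(row: InternalRow,
              requiredSchema: StructType,
              requiredFieldPosition: Seq[Int]): InternalRow = {
    val rowToReturn = new SpecificInternalRow(requiredSchema)
    // Pair each required field with its position in the source row,
    // then with its index in the output row.
    requiredSchema.fields.zip(requiredFieldPosition).zipWithIndex.foreach {
      case ((field, srcPos), dstIndex) =>
        if (row.isNullAt(srcPos)) {
          // Do not call row.get here: a columnar row may hand back
          // whatever value sits in the underlying vector slot.
          rowToReturn.setNullAt(dstIndex)
        } else {
          rowToReturn.update(dstIndex, row.get(srcPos, field.dataType))
        }
    }
    rowToReturn
  }
}
```

      The only behavioral difference from the snippet above is the isNullAt branch; for rows that already return null from get on null fields, the result is unchanged.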

          People

            Thomas Liu Lietong Liu