Spark / SPARK-48019

ColumnVectors with dictionaries and nulls are not read/copied correctly


Details

    Description

      ColumnVector has bulk accessors such as getInts, getFloats, and so on, which return a primitive array with the contents of the vector. When the ColumnVector has a dictionary, each value is decoded through the dictionary before it is written into the primitive array.

      However, a ColumnVector can contain nulls. For a null entry the dictionary id is meaningless and may even be out of range, so the dictionary must not be consulted for null entries. Decoding them anyway can throw an ArrayIndexOutOfBoundsException.
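The following is a minimal, self-contained sketch of the decoding problem and one way to avoid it; it is not Spark's implementation. It models a dictionary-encoded int column with three plain arrays (dictionaryIds, isNull, dictionary) instead of Spark's ColumnVector classes, and the class and method names are hypothetical. The naive reader decodes every id, while the null-aware reader consults the dictionary only for non-null rows.

```java
import java.util.Arrays;

// Hypothetical model of a dictionary-encoded int column.
// This is NOT Spark's ColumnVector API; names are illustrative only.
public class DictionaryDecodeSketch {

    // Naive bulk read: decodes every id, including ids stored under null entries.
    // A garbage id in a null slot can throw ArrayIndexOutOfBoundsException.
    static int[] getIntsNaive(int[] dictionaryIds, boolean[] isNull, int[] dictionary,
                              int rowId, int count) {
        int[] result = new int[count];
        for (int i = 0; i < count; i++) {
            result[i] = dictionary[dictionaryIds[rowId + i]];
        }
        return result;
    }

    // Null-aware bulk read: consults the dictionary only for non-null entries.
    // Null slots are left at 0; callers must still check isNull to tell them apart.
    static int[] getIntsNullAware(int[] dictionaryIds, boolean[] isNull, int[] dictionary,
                                  int rowId, int count) {
        int[] result = new int[count];
        for (int i = 0; i < count; i++) {
            if (!isNull[rowId + i]) {
                result[i] = dictionary[dictionaryIds[rowId + i]];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] dictionary = {100, 200};
        // Row 1 is null, and its stored id (42) is out of range for the dictionary.
        int[] ids = {1, 42, 0};
        boolean[] isNull = {false, true, false};

        // Prints [200, 0, 100]
        System.out.println(Arrays.toString(getIntsNullAware(ids, isNull, dictionary, 0, 3)));
        try {
            getIntsNaive(ids, isNull, dictionary, 0, 3);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("naive decode failed: " + e.getMessage());
        }
    }
}
```

The null-aware version still returns a primitive array, so null rows carry a placeholder value; the null information has to travel alongside it, which is the concern raised for copy() below.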

      Apart from the possible exception, copying a ColumnarArray is also incorrect. A ColumnarArray is backed by a ColumnVector, so it can contain null values. However, copy() for primitive element types ignores whether each entry is null and blindly copies every primitive value, so the null entries are lost in the copy.
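A hedged sketch of the copy problem, again not Spark's code: the hypothetical ArrayCopySketch class below models a ColumnarArray-like slice as an offset and length over plain values and isNull arrays. Copying only the primitive values makes a null entry indistinguishable from a real value, whereas carrying each entry's null flag into the copy (here by boxing into Integer[], one of several possible representations) preserves it.

```java
import java.util.Arrays;

// Hypothetical model of a ColumnarArray-like slice (offset + length) over an int column.
// This is NOT Spark's ColumnarArray API; names are illustrative only.
public class ArrayCopySketch {

    // Naive copy: copies raw primitive values only, so a null entry turns into
    // whatever placeholder value happens to be stored in its slot.
    static int[] copyIgnoringNulls(int[] values, boolean[] isNull, int offset, int length) {
        return Arrays.copyOfRange(values, offset, offset + length);
    }

    // Null-preserving copy: carries the null flag of every entry into the copy.
    static Integer[] copyPreservingNulls(int[] values, boolean[] isNull, int offset, int length) {
        Integer[] copy = new Integer[length];
        for (int i = 0; i < length; i++) {
            copy[i] = isNull[offset + i] ? null : values[offset + i];
        }
        return copy;
    }

    public static void main(String[] args) {
        int[] values = {7, 0, 9, 5};
        boolean[] isNull = {false, true, false, false};
        // Slice covering rows 0..2; row 1 is null.
        System.out.println(Arrays.toString(copyIgnoringNulls(values, isNull, 0, 3)));   // [7, 0, 9]
        System.out.println(Arrays.toString(copyPreservingNulls(values, isNull, 0, 3))); // [7, null, 9]
    }
}
```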


          People

            genepang Gene Pang
