Spark / SPARK-41226

Refactor Spark types by introducing physical types

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.4.0
    • Component/s: SQL
    • Labels: None

    Description

      I am creating this issue for Desmond Cheong, since he can't sign up for an account (see https://infra.apache.org/blog/jira-public-signup-disabled.html).
       
      His description for this improvement:
      The Spark type system currently supports multiple data types that share the same physical representation in memory. For example, DateType and YearMonthIntervalType are both implemented using IntegerType. Because of this, operations on data types often involve case matching in which several data types map to the same behavior.

      To simplify this case matching logic, we can introduce the notion of logical and physical data types, where multiple logical data types can be implemented with the same physical data type, and then perform case matching on physical data types.

      Some areas that can utilize this logical/physical type separation are:

      • SpecializedGettersReader in SpecializedGettersReader.java
      • copy in ColumnarBatchRow.java and ColumnarRow.java
      • getAccessor in InternalRow.scala
      • externalDataTypeFor in RowEncoder.scala
      • unsafeWriter in InterpretedUnsafeProjection.scala
      • getValue and javaType in CodeGenerator.scala
      • doValidate in literals.scala
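
      The idea can be illustrated with a minimal sketch. This is not Spark's actual implementation: the names PhysicalType, LogicalType, accessorByLogicalType and accessorByPhysicalType are hypothetical, and the accessor names are returned as plain strings for illustration only. It assumes, as described above, that DateType and YearMonthIntervalType are stored as ints, and additionally that TimestampType is stored as a long.

```java
public class PhysicalTypeSketch {
    // Hypothetical physical types: how a value is laid out in memory.
    enum PhysicalType { INT, LONG, DOUBLE }

    // Hypothetical logical types, each tagged with its physical representation.
    enum LogicalType {
        INTEGER(PhysicalType.INT),
        DATE(PhysicalType.INT),
        YEAR_MONTH_INTERVAL(PhysicalType.INT),
        LONG(PhysicalType.LONG),
        TIMESTAMP(PhysicalType.LONG),
        DOUBLE(PhysicalType.DOUBLE);

        final PhysicalType physical;

        LogicalType(PhysicalType physical) {
            this.physical = physical;
        }
    }

    // Before: every logical type must be listed, even when several share a branch.
    static String accessorByLogicalType(LogicalType t) {
        switch (t) {
            case INTEGER:
            case DATE:
            case YEAR_MONTH_INTERVAL:
                return "getInt";
            case LONG:
            case TIMESTAMP:
                return "getLong";
            case DOUBLE:
                return "getDouble";
            default:
                throw new IllegalArgumentException("Unsupported type: " + t);
        }
    }

    // After: match on the physical type only; a new logical type backed by an
    // existing physical type needs no change here.
    static String accessorByPhysicalType(LogicalType t) {
        switch (t.physical) {
            case INT:
                return "getInt";
            case LONG:
                return "getLong";
            case DOUBLE:
                return "getDouble";
            default:
                throw new IllegalArgumentException("Unsupported type: " + t);
        }
    }

    public static void main(String[] args) {
        // Both dispatch styles agree for every logical type in the sketch.
        for (LogicalType t : LogicalType.values()) {
            System.out.println(t + " -> " + accessorByPhysicalType(t));
        }
    }
}
```

      In this sketch, each of the call sites listed above would need one branch per physical type rather than one per logical type, which is the simplification the proposal is after.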

          People

            gengliang Gengliang Wang
            Gengliang.Wang Gengliang Wang