  1. Spark
  2. SPARK-39202

Introduce a `putByteArrays` method to `WritableColumnVector` to support setting multiple duplicate `byte[]`


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: 3.4.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      Add a `putByteArrays` method to `WritableColumnVector`:

       

      int putByteArrays(int rowId, int total, byte[] value) 

       

       

      This method is used to set the same `byte[]` value multiple times in a `WritableColumnVector`.

      Since `byte[] value` has a fixed length, the memory can be allocated all at once, without calling the `reserve(int requiredCapacity)` method many times.
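
A minimal sketch of the idea, using a toy column rather than Spark's real `WritableColumnVector`. `ToyBytesColumn`, its fields, and the `reserveCalls` counter are hypothetical names made up for illustration. It assumes that `rowId` is the first row to fill, that `total` is the number of rows, and that the return value is the offset of the first copy; the issue does not spell these out.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Toy stand-in for a byte-array column; NOT Spark's WritableColumnVector. */
class ToyBytesColumn {
    private byte[] data = new byte[0]; // payload bytes of all rows, back to back
    private final int[] offsets;       // start of each row's bytes in data
    private final int[] lengths;       // length of each row's bytes
    private int used = 0;              // number of payload bytes written so far
    int reserveCalls = 0;              // counts calls, for comparison only

    ToyBytesColumn(int rows) {
        offsets = new int[rows];
        lengths = new int[rows];
    }

    /** Grows the payload buffer to hold at least requiredCapacity bytes. */
    void reserve(int requiredCapacity) {
        reserveCalls++;
        if (requiredCapacity > data.length) {
            data = Arrays.copyOf(data, requiredCapacity);
        }
    }

    /** Writes one value into one row; reserves on every call. */
    int putByteArray(int rowId, byte[] value) {
        reserve(used + value.length);
        System.arraycopy(value, 0, data, used, value.length);
        offsets[rowId] = used;
        lengths[rowId] = value.length;
        used += value.length;
        return offsets[rowId];
    }

    /** Writes the same value into rows [rowId, rowId + total); reserves once. */
    int putByteArrays(int rowId, int total, byte[] value) {
        int start = used;
        // The value length is fixed, so the total size is known up front.
        reserve(used + total * value.length);
        for (int i = 0; i < total; i++) {
            System.arraycopy(value, 0, data, used, value.length);
            offsets[rowId + i] = used;
            lengths[rowId + i] = value.length;
            used += value.length;
        }
        return start;
    }

    /** Returns a copy of the bytes stored for one row. */
    byte[] get(int rowId) {
        return Arrays.copyOfRange(data, offsets[rowId], offsets[rowId] + lengths[rowId]);
    }

    public static void main(String[] args) {
        byte[] value = "2022".getBytes(StandardCharsets.UTF_8);

        ToyBytesColumn loop = new ToyBytesColumn(1024);
        for (int i = 0; i < 1024; i++) {
            loop.putByteArray(i, value);
        }

        ToyBytesColumn bulk = new ToyBytesColumn(1024);
        bulk.putByteArrays(0, 1024, value);

        System.out.println("per-row reserve calls: " + loop.reserveCalls);
        System.out.println("bulk reserve calls: " + bulk.reserveCalls);
    }
}
```

In this toy, the per-row loop calls `reserve` once per row, while the bulk method calls it once for the whole batch; both leave identical contents in the column.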

       

      This method is applicable to the `ColumnVectorUtils.populate` method for the `StringType` and some `DecimalType` scenarios, which correspond to vectorized partition column filling for Parquet and ORC.
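
For illustration, here is how the constant `byte[]` for a partition value might be produced in those two scenarios. This is a sketch under assumptions, not Spark code: `PartitionValueBytes` and its methods are hypothetical, and it assumes the partial `DecimalType` case means decimals whose precision exceeds 18 digits, so that the unscaled value no longer fits in a `long` and is stored as bytes instead.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper; shows only how the repeated byte[] could be derived. */
class PartitionValueBytes {
    // Assumption: an unscaled long holds at most 18 decimal digits.
    static final int MAX_LONG_DIGITS = 18;

    /** StringType: the partition value as UTF-8 bytes. */
    static byte[] ofString(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    /** True when a decimal of this precision would need the byte[] path. */
    static boolean needsByteArray(int precision) {
        return precision > MAX_LONG_DIGITS;
    }

    /** Wide DecimalType: two's-complement bytes of the unscaled value. */
    static byte[] ofDecimal(BigDecimal value) {
        return value.unscaledValue().toByteArray();
    }

    public static void main(String[] args) {
        byte[] s = ofString("2022-05-17");
        byte[] d = ofDecimal(new BigDecimal("12345678901234567890.12"));
        System.out.println("string bytes: " + s.length);
        System.out.println("decimal unscaled value: " + new BigInteger(d));
    }
}
```

In both cases the bytes are computed once per batch and are identical for every row, which is the situation the proposed method targets.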

       


          People

            Assignee: Unassigned
            Reporter: Yang Jie (LuciferYang)
            Votes: 0
            Watchers: 2
