  1. Parquet
  2. PARQUET-131 Vectorized Reader In Parquet
  3. PARQUET-333

[Vectorized Reader] Add attributes in ColumnVector and RowBatch


Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: parquet-mr

    Description

      As discussed in HIVE-8128, we want to add some attributes to the vector classes.

      • In ColumnVector, add two attributes. The first is boolean noNulls, which indicates that the column vector contains no null values. The second is boolean isRepeating, which indicates that the same value repeats for the whole column vector. Both can be computed while the vector is being read. SQL engines (like Hive) can check these attributes to skip per-value work.
      • In RowBatch, add one attribute, int size, which holds the number of rows in the batch. This is for convenience only; its value should equal RowBatch.columns[0].numValues.
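
      The proposal above can be illustrated with a minimal sketch. It is not the parquet-mr implementation: the class names LongColumnVector and VectorAttributesSketch, the append, vectorOf and sum helpers, and the isNull array are assumptions made for illustration. Only noNulls, isRepeating, size, columns and numValues come from the description. The sketch maintains both flags while values are appended (standing in for "while the vector is being read") and shows how a consumer might use them to skip per-value work.

```java
public class VectorAttributesSketch {

    /** Hypothetical long-valued column vector; not the actual parquet-mr class. */
    public static final class LongColumnVector {
        public final long[] values;
        public final boolean[] isNull;      // assumed per-value null marker
        public int numValues;
        public boolean noNulls = true;      // true until a null is appended
        public boolean isRepeating = true;  // true until an entry differs from entry 0

        public LongColumnVector(int capacity) {
            values = new long[capacity];
            isNull = new boolean[capacity];
        }

        /** Appends one value (null allowed) and updates both flags in the same pass. */
        public void append(Long v) {
            int i = numValues;
            isNull[i] = (v == null);
            values[i] = (v == null) ? 0L : v;
            if (v == null) {
                noNulls = false;
            }
            if (i > 0 && (isNull[i] != isNull[0] || values[i] != values[0])) {
                isRepeating = false;
            }
            numValues++;
        }
    }

    /** Hypothetical row batch; size mirrors columns[0].numValues as proposed. */
    public static final class RowBatch {
        public final LongColumnVector[] columns;
        public final int size;

        public RowBatch(LongColumnVector... columns) {
            this.columns = columns;
            this.size = columns.length == 0 ? 0 : columns[0].numValues;
        }
    }

    /** Illustrative helper that builds a vector from boxed values. */
    public static LongColumnVector vectorOf(Long... vals) {
        LongColumnVector v = new LongColumnVector(vals.length);
        for (Long x : vals) {
            v.append(x);
        }
        return v;
    }

    /** Sums one column, treating nulls as absent and using the flags to skip work. */
    public static long sum(RowBatch batch, int col) {
        LongColumnVector v = batch.columns[col];
        if (v.isRepeating && batch.size > 0) {
            // Every entry equals entry 0, so a single multiplication suffices.
            return v.isNull[0] ? 0L : v.values[0] * batch.size;
        }
        long total = 0;
        for (int i = 0; i < batch.size; i++) {
            // When noNulls is set, the per-value null check is short-circuited.
            if (v.noNulls || !v.isNull[i]) {
                total += v.values[i];
            }
        }
        return total;
    }
}
```

      Computing the flags during the append keeps the cost to a couple of comparisons per value, so no second pass over the vector is needed.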

          People

            nezihyigitbasi Nezih Yigitbasi
            dongc Dong Chen
