  1. Hive
  2. HIVE-4160 Vectorized Query Execution in Hive
  3. HIVE-4478

In ORC, add boolean noNulls flag to column stripe metadata

    Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.12.0
    • Fix Version/s: 0.12.0
    • Component/s: File Formats
    • Labels:
      None

      Description

      Currently, the stripe metadata for ORC contains the min and max value for each column in the stripe. This will be used for stripe elimination. However, one additional piece of metadata per column per stripe, noNulls (true/false), is needed to help speed up vectorized query execution by as much as 30%.

      The vectorized query execution (QE) code has a boolean flag, noNulls, on each column vector. If it is true, all null-checking logic is skipped for that column whenever an operation is performed on it within a VectorizedRowBatch. For simple filters and arithmetic expressions, this can save on the order of 30% of the execution time.
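
      To illustrate why the flag matters, here is a minimal Java sketch of a scalar-add expression with a noNulls fast path. The classes SketchLongColumn and SketchAddScalar are hypothetical, simplified stand-ins written for this example and are not Hive's actual column vector or expression classes.

```java
// Simplified stand-in for a vectorized long column (hypothetical, not Hive's class).
final class SketchLongColumn {
    final long[] vector;     // column values for the batch
    final boolean[] isNull;  // per-row null markers, meaningful only when noNulls is false
    boolean noNulls;         // true means no row in this batch is null

    SketchLongColumn(int size) {
        vector = new long[size];
        isNull = new boolean[size];
        noNulls = true;
    }
}

final class SketchAddScalar {
    // Adds a scalar to the first n rows of input, writing the result into output.
    static void addScalar(SketchLongColumn input, long scalar, SketchLongColumn output, int n) {
        output.noNulls = input.noNulls;
        if (input.noNulls) {
            // Fast path: a tight loop with no per-row null check.
            for (int i = 0; i < n; i++) {
                output.vector[i] = input.vector[i] + scalar;
            }
        } else {
            // Slow path: propagate nulls row by row and compute only non-null rows.
            for (int i = 0; i < n; i++) {
                output.isNull[i] = input.isNull[i];
                if (!input.isNull[i]) {
                    output.vector[i] = input.vector[i] + scalar;
                }
            }
        }
    }
}
```

      The fast path contains no branch on isNull inside the loop, which is where the savings described above would come from in this sketch.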

      Once this noNulls stripe metadata is available, the vectorized iterator (reader) for ORC can be updated to skip loading the isNull bitmap entirely when a stripe has no nulls in a column, and to set the noNulls flag for each column vector cheaply.
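
      The reader-side change could look roughly like the following Java sketch. All names here (SketchStripeColumnStats, SketchBatchColumn, SketchStripeReader, fillNullInfo, bitmapLoads) are assumptions made for illustration; the real ORC reader and its metadata layout are not shown in this issue, and a plain boolean array stands in for the encoded isNull stream.

```java
// Hypothetical per-column, per-stripe statistics; field names are assumptions.
final class SketchStripeColumnStats {
    final long min;
    final long max;
    final boolean noNulls; // the proposed flag: true if the column has no nulls in this stripe

    SketchStripeColumnStats(long min, long max, boolean noNulls) {
        this.min = min;
        this.max = max;
        this.noNulls = noNulls;
    }
}

// Hypothetical null-tracking part of a column vector in a row batch.
final class SketchBatchColumn {
    final boolean[] isNull;
    boolean noNulls;
    int bitmapLoads; // counts null-bitmap loads, for illustration only

    SketchBatchColumn(int size) {
        isNull = new boolean[size];
    }
}

final class SketchStripeReader {
    // storedNulls stands in for the stripe's isNull bitmap for the rows of this batch.
    static void fillNullInfo(SketchStripeColumnStats stats, boolean[] storedNulls, SketchBatchColumn col) {
        if (stats.noNulls) {
            // The stripe metadata guarantees no nulls: skip the bitmap entirely.
            // Entries of col.isNull are left untouched and must not be consulted.
            col.noNulls = true;
            return;
        }
        // Otherwise load the bitmap. Setting noNulls to false is conservative:
        // this particular batch may still happen to contain no nulls.
        System.arraycopy(storedNulls, 0, col.isNull, 0, col.isNull.length);
        col.noNulls = false;
        col.bitmapLoads++;
    }
}
```

      In this sketch the decision is made once per batch from the stripe-level flag, so a stripe without nulls never touches the bitmap at all.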

        Attachments

        1. HIVE-4478.1.patch.txt
          31 kB
          Prasanth Jayachandran
        2. HIVE-4478.2.git.patch.txt
          30 kB
          Prasanth Jayachandran

            People

            • Assignee:
              prasanth_j Prasanth Jayachandran
              Reporter:
              ehans Eric N. Hanson
