IMPALA-2736

Column-wise value materialisation in Parquet scanner

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.3.0, Impala 2.5.0
    • Fix Version/s: Impala 2.6.0
    • Component/s: Backend

    Description

      Improve Parquet scanner performance by materialising many values of each column at a time, rather than one value per column per row. This would give tighter loops and better memory access patterns, and could avoid a virtual function call to ReadValue() in the inner loop. Currently the scanner essentially does:

      for (row = 0; row < num_rows; ++row) {
        start a new row
        for (col = 0; col < num_cols; ++col) {
          materialise next value for column
          evaluate probe filter for column
        }
        if (probe filters passed && EvalConjuncts(row)) {
          add row to output batch
        }
      }
      

      This would change to something like:

      initialise buffer of num_rows values for each column
      initialise bitmap with num_rows bits. Bit = 1 means filter row out.
      for (col = 0; col < num_cols; ++col) {
        materialise num_rows values into buffer
        during materialisation, set bits in bitmap where probe filter returns false
      }
      
      for (row = 0; row < num_rows; ++row) {
        if (bitmap[row] == 1) continue
        materialise row from column buffer
        if (EvalConjuncts(row)) {
          add row to output batch
        }
      }
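
As a rough illustration of the two loop shapes above, here is a minimal C++ sketch that runs both over the same in-memory data. It is not Impala code: Row, Column, ProbeFilter, Conjunct, ScanRowWise and ScanColumnWise are hypothetical names, the columns are assumed to be already decoded into int64_t vectors, and std::function stands in for the probe filters and EvalConjuncts(). A real scanner would avoid the indirect call that std::function implies, which is part of the point of the proposal.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; none of these are Impala types.
using Row = std::vector<int64_t>;
using Column = std::vector<int64_t>;               // already-decoded column values
using ProbeFilter = std::function<bool(int64_t)>;  // true = value may match
using Conjunct = std::function<bool(const Row&)>;  // stands in for EvalConjuncts()

// Row-wise shape: materialise one value of every column per output row.
std::vector<Row> ScanRowWise(const std::vector<Column>& cols,
                             const std::vector<ProbeFilter>& filters,
                             const Conjunct& conjunct) {
  std::vector<Row> out;
  const std::size_t num_rows = cols.empty() ? 0 : cols[0].size();
  for (std::size_t r = 0; r < num_rows; ++r) {
    Row row(cols.size());
    bool filters_passed = true;
    for (std::size_t c = 0; c < cols.size(); ++c) {
      row[c] = cols[c][r];
      filters_passed = filters_passed && filters[c](row[c]);
    }
    if (filters_passed && conjunct(row)) out.push_back(std::move(row));
  }
  return out;
}

// Column-wise shape: fill a buffer per column, marking filtered rows in a bitmap.
std::vector<Row> ScanColumnWise(const std::vector<Column>& cols,
                                const std::vector<ProbeFilter>& filters,
                                const Conjunct& conjunct) {
  const std::size_t num_rows = cols.empty() ? 0 : cols[0].size();
  std::vector<Column> buffers(cols.size());
  std::vector<uint8_t> filtered_out(num_rows, 0);  // 1 means filter row out
  for (std::size_t c = 0; c < cols.size(); ++c) {
    buffers[c].resize(num_rows);
    // Tight loop over a single column: sequential reads and writes.
    for (std::size_t r = 0; r < num_rows; ++r) {
      buffers[c][r] = cols[c][r];
      if (!filters[c](buffers[c][r])) filtered_out[r] = 1;
    }
  }
  std::vector<Row> out;
  for (std::size_t r = 0; r < num_rows; ++r) {
    if (filtered_out[r]) continue;  // skip rows rejected by any probe filter
    Row row(cols.size());
    for (std::size_t c = 0; c < cols.size(); ++c) row[c] = buffers[c][r];
    if (conjunct(row)) out.push_back(std::move(row));
  }
  return out;
}
```

Both functions take one column vector per column, one filter per column, and a single conjunct, and should return the same rows in the same order. The column-wise version only assembles rows that survived every probe filter, so the per-row work shrinks as the filters become more selective.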
      

            People

              alex.behm Alexander Behm
              tarmstrong Tim Armstrong