IMPALA-5347

Parquet scanner has a lot of small CPU inefficiencies

    Details

      Description

      I spent some time looking at the Parquet scanner in perf top. In many places the code is inefficient in ways that are easy to fix. Together, these fixes could add up to a significant performance win for scans.

      The assembly for the core MaterializeValueBatch() loop shows several obvious inefficiencies:

      • Many loads from memory of values that are constant within the loop
      • The generated bit unpacking and dictionary decoding code is inefficient, e.g. it contains a complicated bounds check
      • Hot functions like DictDecoder::Get() are not inlined
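
      As a rough illustration of the first point, the sketch below contrasts a loop that reads its invariants through `this` on every iteration with one that copies them into locals first. `ColumnReader` and its members are hypothetical stand-ins, not Impala's actual classes; the assumption is that stores through a `uint8_t*` may alias the members, so the compiler may have to reload them unless they are hoisted by hand.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a column reader; not Impala's real class.
struct ColumnReader {
  const int32_t* dict;  // dictionary values
  int dict_size;        // number of dictionary entries
  int tuple_size;       // bytes per output tuple
  int slot_offset;      // byte offset of the slot inside each tuple

  // Baseline: members are read through `this` inside the loop. Because the
  // output is written through a uint8_t*, which may alias anything, the
  // compiler may reload dict, tuple_size and slot_offset on every iteration.
  void MaterializeNaive(const uint32_t* codes, int n, uint8_t* tuples) const {
    for (int i = 0; i < n; ++i) {
      assert(codes[i] < static_cast<uint32_t>(dict_size));
      std::memcpy(tuples + i * tuple_size + slot_offset, &dict[codes[i]],
                  sizeof(int32_t));
    }
  }

  // Hoisted: loop-invariant members are copied into locals once, so they can
  // stay in registers for the whole loop.
  void MaterializeHoisted(const uint32_t* codes, int n, uint8_t* tuples) const {
    const int32_t* const local_dict = dict;
    const uint32_t local_dict_size = static_cast<uint32_t>(dict_size);
    const int local_tuple_size = tuple_size;
    uint8_t* dst = tuples + slot_offset;
    for (int i = 0; i < n; ++i, dst += local_tuple_size) {
      assert(codes[i] < local_dict_size);
      std::memcpy(dst, &local_dict[codes[i]], sizeof(int32_t));
    }
  }
};
```

      Both functions write identical output; whether the hoisted version is actually faster depends on the compiler and should be confirmed by inspecting the generated assembly or profiling.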

      Some scans also spend a lot of time inside InitTuple() calling memset() on just one or two bytes.
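
      The memset() cost can be pictured with the following sketch, which compares clearing a few null-indicator bytes per tuple against clearing the whole batch in one call. The function names and the tuple layout (fixed-size tuples with the null bytes at a known offset) are assumptions made for illustration, not Impala's actual InitTuple() code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Per-tuple initialization: one tiny memset() call per row, clearing only the
// null-indicator bytes. The per-call overhead dominates when null_bytes is 1-2.
void InitTuplesOneByOne(uint8_t* batch, int num_tuples, int tuple_size,
                        int null_offset, int null_bytes) {
  assert(null_offset + null_bytes <= tuple_size);
  for (int i = 0; i < num_tuples; ++i) {
    std::memset(batch + static_cast<std::size_t>(i) * tuple_size + null_offset,
                0, static_cast<std::size_t>(null_bytes));
  }
}

// Whole-batch initialization: a single large memset() over every tuple. This
// also zeroes the slot bytes, which is only acceptable when those bytes are
// overwritten later or are allowed to be zero.
void InitTuplesWholeBatch(uint8_t* batch, int num_tuples, int tuple_size) {
  std::memset(batch, 0, static_cast<std::size_t>(num_tuples) * tuple_size);
}
```

      The whole-batch variant writes more bytes but makes one call, so it tends to pay off for small tuples; for very wide tuples the per-tuple version may still be cheaper.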

        Activity

        tarmstrong Tim Armstrong added a comment -

        IMPALA-5347: Parquet scanner microoptimizations

        A mix of microoptimizations suggested by profiling the Parquet scanner.
        Each led to a measurable improvement, and together they added up to
        significant speedups for various scans.

        • Add ALWAYS_INLINE to hot functions that GCC was mistakenly not inlining
          in all cases.
        • Apply __restrict__ in a few places so the compiler knows that it is
          safe to cache values accessed via those pointers.
        • memset() the whole batch instead of the null indicators in cases where
          it is almost certainly cheaper.
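
        The first two bullets can be sketched as follows. The macro definition is an assumption (the GCC/Clang `always_inline` attribute), and `DictDecoderSketch` and `DecodeBatch` are hypothetical names used only for illustration; they are not the code from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Assumed definition: forces inlining on GCC and Clang.
#define ALWAYS_INLINE __attribute__((always_inline))

// Hypothetical dictionary decoder over already-unpacked codes.
struct DictDecoderSketch {
  const int32_t* dict;    // dictionary values
  int dict_size;          // number of dictionary entries
  const uint32_t* codes;  // dictionary indices to decode
  int num_codes;          // number of indices available
  int pos;                // next index to decode

  // Hot per-value function: small enough that a call per value would cost
  // more than the work itself, so inlining is forced.
  ALWAYS_INLINE bool Get(int32_t* out) {
    if (pos >= num_codes) return false;
    const uint32_t code = codes[pos];
    if (code >= static_cast<uint32_t>(dict_size)) return false;
    *out = dict[code];
    ++pos;
    return true;
  }
};

// __restrict__ promises the compiler that `decoder` and `out` do not overlap,
// so stores to out[] cannot change the decoder's fields and those fields may
// be kept in registers across iterations.
int DecodeBatch(DictDecoderSketch* __restrict__ decoder,
                int32_t* __restrict__ out, int max_values) {
  assert(max_values >= 0);
  int n = 0;
  while (n < max_values && decoder->Get(&out[n])) ++n;
  return n;
}
```

        The `__restrict__` qualifier is a promise rather than a check: if the pointers do overlap, the behavior is undefined, so it only belongs where non-aliasing is guaranteed by construction.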

        git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/6950/19

        To view, visit http://gerrit.cloudera.org:8080/6950

          People

          • Assignee: Tim Armstrong (tarmstrong)
          • Reporter: Tim Armstrong (tarmstrong)