IMPALA-3884: Enable codegen for TIMESTAMP in hash table.

    Details

    • Docs Text:
      Before this fix, codegen was disabled for joins on timestamps, GROUP BY on timestamps, and SELECT DISTINCT on timestamp columns. With this fix, affected aggregations are significantly faster (e.g. 5x).

      Description

      Referencing TIMESTAMP columns in joins and aggregations prevents those exec nodes from being codegen'd, leading to a severe performance hit. This affects queries that group by or join on a timestamp column.

      See this snippet from HashTableCtx::CodegenEvalRow() in hash-table.cc:

        // TODO: CodegenAssignNullValue() can't handle TYPE_TIMESTAMP or TYPE_DECIMAL yet
        const vector<ExprContext*>& ctxs = build ? build_expr_ctxs_ : probe_expr_ctxs_;
        for (int i = 0; i < ctxs.size(); ++i) {
          PrimitiveType type = ctxs[i]->root()->type().type;
          if (type == TYPE_TIMESTAMP || type == TYPE_CHAR) {
            return Status(Substitute("HashTableCtx::CodegenEvalRow(): type $0 NYI",
                TypeToString(type)));
          }
        }
      

          Activity

          tarmstrong Tim Armstrong added a comment -

          The lack of codegen for timestamp in compute stats is actually caused by IMPALA-1430. The HashTable limitation only comes into play if the timestamp is a grouping column.

          kwho Michael Ho added a comment -

          Yes, they may overlap a bit, but this bug has more to do with updating hash tables to support TIMESTAMP, while IMPALA-1430 may also have to deal with things such as non-builtin UDAs.

          tarmstrong Tim Armstrong added a comment -

          I think there are three underlying issues:

          • Codegen support for arbitrary agg functions using the UDA interface
          • Support for loading external UDAs
          • Support for grouping by timestamp

          IMPALA-1430 seems to cover the first two, while this JIRA is talking about the third one.

          I think the first one is what's blocking compute stats (since we implement the timestamp functions as builtins).

          kwho Michael Ho added a comment -

          https://github.com/apache/incubator-impala/commit/13455b5a24a9d4d009d1dd0d72944c6cacd54829

          IMPALA-3884: Support TYPE_TIMESTAMP for HashTableCtx::CodegenAssignNullValue()

          This change implements support for TYPE_TIMESTAMP for
          HashTableCtx::CodegenAssignNullValue(). TimestampValue itself
          is 16 bytes in size. To match RawValue::Write() in the
          interpreted path, CodegenAssignNullValue() emits code to assign
          HashUtil::FNV_SEED to both the upper and lower 64-bit of the
          destination value. This change also fixes the handling of 128-bit
          Decimal16Value in CodegenAssignNullValue() so the emitted code
          matches the behavior of the interpreted path.

          Change-Id: I0211d38cbef46331e0006fa5ed0680e6e0867bc8
          Reviewed-on: http://gerrit.cloudera.org:8080/4794
          Reviewed-by: Michael Ho <kwho@cloudera.com>
          Tested-by: Michael Ho <kwho@cloudera.com>
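The NULL handling described in the commit message can be illustrated with a minimal standalone sketch. It is not the emitted IR or Impala's actual code: `Slot16`, `AssignNullValue16`, `ReadHalf`, and `kFnvSeed` are hypothetical names, and `kFnvSeed` is assumed to equal HashUtil::FNV_SEED (taken here to be the 64-bit FNV offset basis). The point it shows is that a 16-byte value (TimestampValue or Decimal16Value) gets the seed written into both 64-bit halves, so that a codegen'd path and an interpreted path would leave identical bytes for a NULL key.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumption: stands in for HashUtil::FNV_SEED, taken to be the
// 64-bit FNV offset basis.
constexpr uint64_t kFnvSeed = 14695981039346656037ULL;

// Hypothetical stand-in for a 16-byte slot such as TimestampValue
// or Decimal16Value.
struct Slot16 {
  unsigned char bytes[16];
};
static_assert(sizeof(Slot16) == 16, "slot must be exactly 16 bytes");

// Writes the NULL placeholder: the seed is copied into the first
// and the second 8 bytes of the destination.
inline void AssignNullValue16(Slot16* dst) {
  std::memcpy(dst->bytes, &kFnvSeed, sizeof(kFnvSeed));
  std::memcpy(dst->bytes + 8, &kFnvSeed, sizeof(kFnvSeed));
}

// Reads one 64-bit half back (0 = first 8 bytes, 1 = last 8 bytes).
inline uint64_t ReadHalf(const Slot16& slot, int half) {
  uint64_t value;
  std::memcpy(&value, slot.bytes + 8 * half, sizeof(value));
  return value;
}
```

Filling only one half would leave the other 8 bytes with whatever was there before, so two NULL keys could differ byte for byte; writing both halves avoids that.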


            People

            • Assignee:
              kwho Michael Ho
              Reporter:
              alex.behm Alexander Behm
