IMPALA-3552: Make incremental stats max serialized size configurable or dramatically increase limit.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Impala 2.5.0
    • Fix Version/s: Impala 2.8.0
    • Component/s: Catalog
    • Labels:
    • Docs Text:
      Adds a new configuration parameter, "inc_stats_size_limit_bytes", to configure the maximum size of the intermediate stats serialized per table and shipped to the impalads. Defaults to 200MB.
    • Target Version:

      Description

      The fix for IMPALA-2648/IMPALA-2664 introduced a conservative limit on the maximum serialized size of incremental stats. As a result, some users with very large tables experienced regressions when upgrading because incremental stats no longer worked.
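
To illustrate why very large tables run into such a limit, here is a minimal sketch of a size estimate that grows with partitions times columns. The per-column byte cost and the function names are assumptions made for illustration and are not taken from this issue; only the 200MB figure comes from the report, and reading it as 200 * 2**20 bytes is also an assumption.

```python
# Illustrative sketch only: how an incremental-stats size estimate can
# exceed a fixed limit as a table grows.
# BYTES_PER_COLUMN_STATS is an assumed figure, not confirmed by this issue.
BYTES_PER_COLUMN_STATS = 400
DEFAULT_LIMIT_BYTES = 200 * 1024 * 1024  # 200MB, assuming MB = 2**20 bytes


def estimate_inc_stats_bytes(num_partitions: int, num_columns: int) -> int:
    """Estimate serialized size: one stats entry per (partition, column) pair."""
    return num_partitions * num_columns * BYTES_PER_COLUMN_STATS


def exceeds_limit(num_partitions: int, num_columns: int,
                  limit_bytes: int = DEFAULT_LIMIT_BYTES) -> bool:
    """True if the estimate is strictly larger than the configured limit."""
    return estimate_inc_stats_bytes(num_partitions, num_columns) > limit_bytes
```

Under these assumptions, a table with 10,000 partitions and 100 columns is estimated at 400,000,000 bytes, which is above the default, while 1,000 partitions and 100 columns stays well below it.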

      We should consider making the limit configurable, or even better fix the design to allow a much higher limit.

      Dimitris, assigning to you for triage/consideration.

          Activity

          alex.behm Alexander Behm added a comment - User sdbigdata reported this issue here: http://community.cloudera.com/t5/Interactive-Short-cycle-SQL/Incremental-stats-size-estimate-exceeds-200-00MB/m-p/40843
          dtsirogiannis Dimitris Tsirogiannis added a comment -

          Pushing the limit higher can give us some leeway, but I don't think this solution will last for a long time. https://issues.cloudera.org/browse/IMPALA-2649 is the right thing to do, but I don't think Jim will have time to work on it in the near future. Silvius Rus, I think we should bump up the priority for this item and start working on it as soon as possible.

          alex.behm Alexander Behm added a comment -

          By pushing the limit higher I mean changing the design such that there is no more need for such an unreasonably low limit - not just leaving things alone and increasing the limit.

          dtsirogiannis Dimitris Tsirogiannis added a comment -

          I agree about changing the design. In essence, incremental stats shouldn't be part of the table metadata that gets sent everywhere.

          HuaisiXu Huaisi Xu added a comment -

          I believe IMPALA-2648/IMPALA-2664 only prevents the catalog from sending those stats, but those stats are in memory as well. We should probably not be storing them in memory at all?

          alex.behm Alexander Behm added a comment -

          In the short term, we should make this max size configurable via a catalogd/impalad startup option. The option should clearly indicate that it is an advanced configuration and that deviating from the default value is not recommended or safe.
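
As a rough sketch of what such a startup option looks like in practice, the helper below converts a limit in MB to bytes and formats it as a flag string. The flag name is the one the eventual fix introduced (see Docs Text); the helper names are hypothetical, the `--name=value` form assumes standard gflags command-line syntax, and MB is assumed to mean 2**20 bytes.

```python
def mb_to_bytes(megabytes: int) -> int:
    """Convert MB to bytes, assuming 1 MB = 2**20 bytes."""
    return megabytes * 1024 * 1024


def startup_flag(limit_mb: int) -> str:
    """Format the limit as a gflags-style startup option (hypothetical helper)."""
    if limit_mb <= 0:
        raise ValueError("limit must be a positive number of MB")
    return f"--inc_stats_size_limit_bytes={mb_to_bytes(limit_mb)}"


if __name__ == "__main__":
    # Example: doubling the default of 200MB.
    print(startup_flag(400))
```

Per the comment above, the same value would presumably need to be passed to both catalogd and impalad; raising it past the default remains an advanced setting.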

          yonghyun_impala_8905 Yonghyun Hwang added a comment -

          I have been stuck with other Jira issues. Now I'm actively working on this.

          srus Silvius Rus added a comment -

          My understanding is that currently the incremental stats get propagated to all impalads.

          I'm concerned that increasing this limit may lead to very large hidden costs through higher RAM usage on impalads and through high network costs while propagating the incremental stats metadata.

          Marcel Kornacker, Henry Robinson, Dimitris Tsirogiannis, can we avoid this cost by keeping the incremental stats only in catalogd without propagating them to all impalads?
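
The hidden cost described here can be put in rough numbers. The sketch below assumes every impalad receives and keeps one full, uncompressed copy of a table's incremental stats; that simplification and all names are assumptions for illustration, not measurements from this issue.

```python
def propagation_cost(stats_bytes: int, num_impalads: int) -> dict:
    """Back-of-the-envelope cost of shipping one table's stats to every impalad.

    Assumes one full, uncompressed copy per impalad (an illustrative simplification).
    """
    if stats_bytes < 0 or num_impalads < 0:
        raise ValueError("inputs must be non-negative")
    return {
        # Each impalad holds its own copy in memory.
        "ram_bytes_per_impalad": stats_bytes,
        # The same payload is sent once per impalad.
        "network_bytes_total": stats_bytes * num_impalads,
    }
```

With a 1 GiB payload and 100 impalads, this model gives 100 GiB of network transfer per propagation, which is the kind of cost that keeping the stats only in catalogd would avoid.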

          dtsirogiannis Dimitris Tsirogiannis added a comment -

          Yes, we can avoid sending incremental stats to all the impalad nodes; that is the ultimate goal. However, this is not a quick fix and will take some time (2-3 weeks including review time) to finish.

          bharathv bharath v added a comment -

          Commit: ce558a885d6318c8a0af6d65dbe2ca6319b82aab
          Author: Bharath Vissapragada <bharathv@cloudera.com>
          Date: 2016-11-14 (Mon, 14 Nov 2016)

          Changed paths:
          M be/generated-sources/gen-cpp/CMakeLists.txt
          M be/src/catalog/catalog.cc
          M be/src/common/global-flags.cc
          M be/src/service/fe-support.cc
          M be/src/service/frontend.cc
          M be/src/util/CMakeLists.txt
          A be/src/util/backend-gflag-util.cc
          A be/src/util/backend-gflag-util.h
          A common/thrift/BackendGflags.thrift
          M common/thrift/CMakeLists.txt
          M common/thrift/Frontend.thrift
          M fe/src/main/java/org/apache/impala/analysis/AlterViewStmt.java
          M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
          M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java
          M fe/src/main/java/org/apache/impala/authorization/User.java
          M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
          M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
          M fe/src/main/java/org/apache/impala/common/RuntimeEnv.java
          M fe/src/main/java/org/apache/impala/planner/Planner.java
          M fe/src/main/java/org/apache/impala/service/BackendConfig.java
          M fe/src/main/java/org/apache/impala/service/FeSupport.java
          M fe/src/main/java/org/apache/impala/service/JniCatalog.java
          M fe/src/main/java/org/apache/impala/service/JniFrontend.java
          M fe/src/main/java/org/apache/impala/util/KuduUtil.java

          Log Message:
          -----------
          IMPALA-3552: Make incremental stats max serialized size configurable

          The fix "IMPALA-2648/IMPALA-2664" introduced a conservative limitation
          on the maximum serialized size of incremental stats. As a side effect,
          some users with very large tables are experiencing regressions
          especially when they upgrade impala and the serialized size goes
          beyond 200MB.

          To mitigate the issue, the change introduces a new gflag,
          'inc_stats_size_limit_bytes' to make the max serialized size
          configurable, which allows impala users to specify their own maximum
          serialized size. Default value for inc_stats_size_limit_bytes is
          200MB.

          The change introduces a TBackendGflags class to pass the gflags from
          backend to the Frontend and the Catalog via thrift. This also revamps
          existing query options to use the TBackendConfig.

          Change-Id: I33684725a61eabc67237503e61178305d37d3cb5
          Reviewed-on: http://gerrit.cloudera.org:8080/4867
          Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
          Tested-by: Internal Jenkins
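
The pattern in the log message, where the backend hands its flag values to the frontend and catalog as one structure, can be sketched roughly as follows. This is a Python illustration of the idea only: the actual change uses a Thrift struct plus C++ and Java code, and every class and method name below is hypothetical apart from the flag name and its 200MB default (taken here as 200 * 2**20 bytes, which is an assumption).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BackendGflags:
    """Immutable snapshot of backend flags (stand-in for a Thrift struct)."""
    inc_stats_size_limit_bytes: int = 200 * 1024 * 1024


class BackendConfig:
    """Process-wide holder that frontend/catalog code reads flags from."""
    _flags: Optional[BackendGflags] = None

    @classmethod
    def create(cls, flags: BackendGflags) -> None:
        # Called once at startup with the values received from the backend.
        cls._flags = flags

    @classmethod
    def get(cls) -> BackendGflags:
        if cls._flags is None:
            raise RuntimeError("BackendConfig has not been initialized")
        return cls._flags


def check_inc_stats_size(estimated_bytes: int) -> None:
    """Reject an estimate that is larger than the configured limit."""
    limit = BackendConfig.get().inc_stats_size_limit_bytes
    if estimated_bytes > limit:
        raise ValueError(
            f"incremental stats estimate {estimated_bytes} exceeds limit {limit}")
```

Reading the limit from a single initialized holder, instead of a hard-coded constant, is what lets an operator change the value at startup without a code change.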


            People

            • Assignee:
              bharathv bharath v
              Reporter:
              alex.behm Alexander Behm
            • Votes:
              3
              Watchers:
              11
