IMPALA-9429

Unioned partition columns break partition pruning


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 3.2.0
    • Fix Version/s: Impala 3.4.0
    • Component/s: Frontend
    • Labels:

      Description

      Our landing tables and our compacted tables are partitioned at different granularities, and we use a view to union the two. After upgrading from CDH 5.15 (Impala 2.12.0) to CDH 6.3 (Impala 3.2.0), queries against these unioned views started failing. This is the smallest failing example I have come up with.

      [:21000] debug> create table debug_with_partition (col1 int) partitioned by (col2 int, col3 int);
      Query: create table debug_with_partition (col1 int) partitioned by (col2 int, col3 int)
      +-------------------------+
      | summary                 |
      +-------------------------+
      | Table has been created. |
      +-------------------------+
      Fetched 1 row(s) in 0.09s
      [:21000] debug> create table debug_without_partition (col1 int) partitioned by (col2 int);
      Query: create table debug_without_partition (col1 int) partitioned by (col2 int)
      +-------------------------+
      | summary                 |
      +-------------------------+
      | Table has been created. |
      +-------------------------+
      Fetched 1 row(s) in 0.03s
      [:21000] debug> create view debug as select col1, col2, col3 from debug_with_partition union all select col1, col2, null from debug_without_partition;
      Query: create view debug as select col1, col2, col3 from debug_with_partition union all select col1, col2, null from debug_without_partition
      Query submitted at: 2020-02-26 17:04:58 (Coordinator: :25000)
      Query progress can be monitored at: :25000/query_plan?query_id=28453bdf5f919fe9:66fef22200000000
      +------------------------+
      | summary                |
      +------------------------+
      | View has been created. |
      +------------------------+
      Fetched 1 row(s) in 5.65s
      [:21000] debug> select * from debug where col2 = 0 or col3 = 0;
      Query: select * from debug where col2 = 0 or col3 = 0
      Query submitted at: 2020-02-26 17:05:21 (Coordinator: t:25000)
      ERROR: IllegalStateException: null
      

      Here is what I found in the log:

      I0226 17:05:21.099532 129442 jni-util.cc:256] c34e2a72018579fe:3d7388e100000000] java.lang.IllegalStateException
                                      at com.google.common.base.Preconditions.checkState(Preconditions.java:133)
                                      at org.apache.impala.planner.HdfsPartitionPruner.canEvalUsingPartitionMd(HdfsPartitionPruner.java:196)
                                      at org.apache.impala.planner.HdfsPartitionPruner.canEvalUsingPartitionMd(HdfsPartitionPruner.java:211)
                                      at org.apache.impala.planner.HdfsPartitionPruner.prunePartitions(HdfsPartitionPruner.java:131)
                                      at org.apache.impala.planner.SingleNodePlanner.createHdfsScanPlan(SingleNodePlanner.java:1257)
                                      at org.apache.impala.planner.SingleNodePlanner.createScanNode(SingleNodePlanner.java:1348)
                                      at org.apache.impala.planner.SingleNodePlanner.createTableRefNode(SingleNodePlanner.java:1535)
                                      at org.apache.impala.planner.SingleNodePlanner.createTableRefsPlan(SingleNodePlanner.java:814)
                                      at org.apache.impala.planner.SingleNodePlanner.createSelectPlan(SingleNodePlanner.java:650)
                                      at org.apache.impala.planner.SingleNodePlanner.createQueryPlan(SingleNodePlanner.java:258)
                                      at org.apache.impala.planner.SingleNodePlanner.createUnionPlan(SingleNodePlanner.java:1584)
                                      at org.apache.impala.planner.SingleNodePlanner.createUnionPlan(SingleNodePlanner.java:1651)
                                      at org.apache.impala.planner.SingleNodePlanner.createQueryPlan(SingleNodePlanner.java:280)
                                      at org.apache.impala.planner.SingleNodePlanner.createInlineViewPlan(SingleNodePlanner.java:1088)
                                      at org.apache.impala.planner.SingleNodePlanner.createTableRefNode(SingleNodePlanner.java:1546)
                                      at org.apache.impala.planner.SingleNodePlanner.createTableRefsPlan(SingleNodePlanner.java:814)
                                      at org.apache.impala.planner.SingleNodePlanner.createSelectPlan(SingleNodePlanner.java:650)
                                      at org.apache.impala.planner.SingleNodePlanner.createQueryPlan(SingleNodePlanner.java:258)
                                      at org.apache.impala.planner.SingleNodePlanner.createSingleNodePlan(SingleNodePlanner.java:148)
                                      at org.apache.impala.planner.Planner.createPlan(Planner.java:103)
                                      at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1171)
                                      at org.apache.impala.service.Frontend.getPlannedExecRequest(Frontend.java:1466)
                                      at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1345)
                                      at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1252)
                                      at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1222)
                                      at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:167)
      I0226 17:05:21.099617 129442 status.cc:124] c34e2a72018579fe:3d7388e100000000] IllegalStateException: null
          @           0xb4c459
          @          0x114fe2e
          @          0x102ab53
          @          0x1052ba2
          @          0x105e88c
          @          0x109e5be
          @          0x138fee4
          @          0x138f39c
          @           0xb18169
          @           0xf2d1d8
          @           0xf23c4e
          @           0xf24ae1
          @          0x11c5e0f
          @          0x11c69b9
          @          0x1840569
          @     0x7f2ef82926b9
          @     0x7f2ef7fc841c
      

      I did some debugging from the shell and found that the following variations work.
      Querying on just the NULL-filled column:

      [:21000] debug> select * from debug where col3 = 0;
      Query: select * from debug where col3 = 0
      Query submitted at: 2020-02-26 17:07:07 (Coordinator: :25000)
      Query progress can be monitored at: :25000/query_plan?query_id=1b44157731b6f5ff:d052c2c600000000
      Fetched 0 row(s) in 0.11s
      

      Combining the two predicates with AND instead of OR:

      [:21000] debug> select * from debug where col2 = 0 and col3 = 0;
      Query: select * from debug where col2 = 0 and col3 = 0
      Query submitted at: 2020-02-26 17:07:27 (Coordinator: :25000)
      Query progress can be monitored at: :25000/query_plan?query_id=334f7fbf2367a558:6ebe4d6100000000
      Fetched 0 row(s) in 0.11s
      

      Explicitly casting the NULL-filled column:

      [:21000] debug> select * from debug where col2 = 0 or cast(col3 as int) = 0;
      Query: select * from debug where col2 = 0 or cast(col3 as int) = 0
      Query submitted at: 2020-02-26 17:08:26 (Coordinator: :25000)
      Query progress can be monitored at: :25000/query_plan?query_id=1a4d43d8fc9fc45d:662922b900000000
      Fetched 0 row(s) in 0.11s
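
The failing predicate is ordinary SQL, so the result it should produce can be sanity-checked outside Impala. The sketch below is only a stand-in under stated assumptions: it uses Python's built-in sqlite3 module rather than Impala, models the partition columns as ordinary columns (SQLite has no partitions), and fills the tables with made-up sample rows, whereas the tables in the report are empty. It illustrates the expected semantics of the three queries and says nothing about the planner failure itself.

```python
import sqlite3

# In-memory SQLite stand-in for the tables in the report.
# Assumption: partition columns are modeled as plain columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE debug_with_partition (col1 INT, col2 INT, col3 INT);
    CREATE TABLE debug_without_partition (col1 INT, col2 INT);
    CREATE VIEW debug AS
        SELECT col1, col2, col3 FROM debug_with_partition
        UNION ALL
        SELECT col1, col2, NULL FROM debug_without_partition;

    -- Hypothetical sample rows, not taken from the report.
    INSERT INTO debug_with_partition VALUES (1, 0, 5), (2, 7, 0), (3, 7, 5);
    INSERT INTO debug_without_partition VALUES (4, 0), (5, 7);
""")


def run(where):
    """Return the view rows matching the given WHERE clause, ordered by col1."""
    rows = conn.execute(
        f"SELECT col1, col2, col3 FROM debug WHERE {where}"
    ).fetchall()
    return sorted(rows, key=lambda r: r[0])


# The query that fails in Impala 3.2.0, plus two of the variants that work.
or_rows = run("col2 = 0 OR col3 = 0")
and_rows = run("col2 = 0 AND col3 = 0")
col3_rows = run("col3 = 0")

print(or_rows)    # [(1, 0, 5), (2, 7, 0), (4, 0, None)]
print(and_rows)   # []
print(col3_rows)  # [(2, 7, 0)]
```

Under SQL three-valued logic, `col3 = 0` evaluates to NULL for rows from the branch that selects `null`, so the OR form keeps those rows only when `col2 = 0` holds, and the AND form never keeps them.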
      

      Please let me know if there is anything else I can do to help!

            People

            • Assignee: Kurt Deschler (kdeschle)
            • Reporter: Max Mizikar (maxmzkr)