HIVE-16412

Hive on Tez incorrect partition pruning ANALYZE TABLE

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.1.1
    • Fix Version/s: None
    • Component/s: Hive
    • Environment:

      Hadoop 2.7.3, Hive 2.1.1, Tez 0.8.5

      Description

      With Hive on Tez, running ANALYZE TABLE T PARTITION (...) COMPUTE STATISTICS; on a partitioned table gathers stats for all partitions from the metastore, even though the partition spec selects only a subset. Hive on MR runs the same statement efficiently.
      For example:

      analyze table ext_data_part partition(a=9957) compute statistics noscan

      This results in the following metastore query:

      2017-04-09T22:25:30,332 DEBUG [main] metastore.MetaStoreDirectSql: Direct SQL query in 12.30189ms + 0.037891ms, the query is [select "PARTITIONS"."PART_ID" from "PARTITIONS" inner join "TBLS" on "PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID" and "TBLS"."TBL_NAME" = ? inner join "DBS" on "TBLS"."DB_ID" = "DBS"."DB_ID"
      and "DBS"."NAME" = ? ]

      And:
      2017-03-02T16:54:08,104 DEBUG [main([])]: log.PerfLogger ()) - </PERFLOG method=TezCompiler start=1488473648104 end=1488473648104 duration=0 from=org.apache.hadoop.hive.ql.parse.TezCompiler Setup dynamic partition pruning>
      2017-03-02T16:54:08,104 DEBUG [main([])]: log.PerfLogger ()) - <PERFLOG method=TezCompiler from=org.apache.hadoop.hive.ql.parse.TezCompiler>
      2017-03-02T16:54:08,110 DEBUG [main([])]: log.PerfLogger ()) - <PERFLOG method=partition-retrieving from=org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner>
      2017-03-02T16:54:08,153 DEBUG [main([])]: log.PerfLogger ()) - </PERFLOG method=partition-retrieving start=1488473648110 end=1488473648153 duration=43 from=org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner>
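      To compare how long partition retrieval takes across runs (for example, Tez versus MR), the closing PERFLOG entries can be pulled out of the debug log. The following is a minimal sketch, not part of Hive: the helper name perflog_durations and the regular expression are illustrative assumptions based only on the log lines quoted above, and the duration is assumed to be in milliseconds because the start and end values look like epoch-millisecond timestamps.

```python
import re

# Matches closing PerfLogger entries as they appear in the log excerpt, e.g.
# </PERFLOG method=partition-retrieving start=... end=... duration=43 from=...>
# Assumption: the field order is always method, start, end, duration, from.
PERFLOG_END = re.compile(
    r"</PERFLOG method=(?P<method>\S+) start=(?P<start>\d+) "
    r"end=(?P<end>\d+) duration=(?P<duration>\d+) "
    r"from=(?P<source>[^\s>]+)[^>]*>"
)


def perflog_durations(lines):
    """Return (method, duration) for every closing PERFLOG entry in lines."""
    results = []
    for line in lines:
        match = PERFLOG_END.search(line)
        if match:  # opening <PERFLOG ...> entries carry no duration and are skipped
            results.append((match.group("method"), int(match.group("duration"))))
    return results


# The four log lines quoted in this report.
sample = [
    "2017-03-02T16:54:08,104 DEBUG [main([])]: log.PerfLogger ()) - </PERFLOG method=TezCompiler start=1488473648104 end=1488473648104 duration=0 from=org.apache.hadoop.hive.ql.parse.TezCompiler Setup dynamic partition pruning>",
    "2017-03-02T16:54:08,104 DEBUG [main([])]: log.PerfLogger ()) - <PERFLOG method=TezCompiler from=org.apache.hadoop.hive.ql.parse.TezCompiler>",
    "2017-03-02T16:54:08,110 DEBUG [main([])]: log.PerfLogger ()) - <PERFLOG method=partition-retrieving from=org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner>",
    "2017-03-02T16:54:08,153 DEBUG [main([])]: log.PerfLogger ()) - </PERFLOG method=partition-retrieving start=1488473648110 end=1488473648153 duration=43 from=org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner>",
]

for method, duration in perflog_durations(sample):
    print(f"{method}: {duration}")
```

      On the excerpt above this reports a duration of 0 for TezCompiler and 43 for partition-retrieving. On a table with many partitions, the partition-retrieving value is the one to watch.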

      The stack trace:
      at java.lang.reflect.Method.invoke(Method.java:498)
      at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2265)
      - locked <0x00000003de3798f0> (a org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler)
      at com.sun.proxy.$Proxy21.listPartitions(Unknown Source)
      at org.apache.hadoop.hive.ql.metadata.Hive.getAllPartitionsOf(Hive.java:2301)
      at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.getAllPartitions(PartitionPruner.java:454)
      at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.getAllPartsFromCacheOrServer(PartitionPruner.java:236)
      at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.prune(PartitionPruner.java:195)
      at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.prune(PartitionPruner.java:144)
      at org.apache.hadoop.hive.ql.parse.ParseContext.getPrunedPartitions(ParseContext.java:511)
      at org.apache.hadoop.hive.ql.parse.ParseContext.getPrunedPartitions(ParseContext.java:504)
      at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:121)
      at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
      at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
      at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
      at org.apache.hadoop.hive.ql.lib.LevelOrderWalker.walk(LevelOrderWalker.java:143)
      at org.apache.hadoop.hive.ql.lib.LevelOrderWalker.startWalking(LevelOrderWalker.java:122)
      at org.apache.hadoop.hive.ql.optimizer.stats.annotation.AnnotateWithStatistics.transform(AnnotateWithStatistics.java:78)
      at org.apache.hadoop.hive.ql.parse.TezCompiler.runStatsAnnotation(TezCompiler.java:259)
      at org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:128)
      at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:134)
      at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10947)
      at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10526)
      at org.apache.hadoop.hive.ql.parse.ColumnStatsSemanticAnalyzer.analyze(ColumnStatsSemanticAnalyzer.java:385)

            People

            • Assignee: Unassigned
            • Reporter: Amir Shenavandeh (shenavandeh)
            • Votes: 0
            • Watchers: 4
