Hive / HIVE-6642

Query fails to vectorize when a non-string partition column is part of the query expression

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels:
      None

      Description

      drop table if exists alltypesorc_part;

      CREATE TABLE alltypesorc_part (
      ctinyint tinyint,
      csmallint smallint,
      cint int,
      cbigint bigint,
      cfloat float,
      cdouble double,
      cstring1 string,
      cstring2 string,
      ctimestamp1 timestamp,
      ctimestamp2 timestamp,
      cboolean1 boolean,
      cboolean2 boolean) partitioned by (ds int) STORED AS ORC;

      insert overwrite table alltypesorc_part partition (ds=2011) select * from alltypesorc limit 100;
      insert overwrite table alltypesorc_part partition (ds=2012) select * from alltypesorc limit 200;

      explain select *
      from (select ds from alltypesorc_part) t1,
      alltypesorc t2
      where t1.ds = t2.cint
      order by t2.ctimestamp1
      limit 100;
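In the script above, `ds` is declared `int`, but each partition value reaches Hive in string form (the `2011` in `partition (ds=2011)`). The sketch below uses a hypothetical `PartitionColumnTyping` class, not Hive's own API, to illustrate the distinction the vectorizer has to draw: derive the vector column category from the declared partition column type and convert the string value to match, so that `ds` lines up with `t2.cint` as a long column. The category names mirror the `LONG`/`STRING` argument types printed in the debug log of this report; the exact type grouping is an assumption of the sketch.

```java
import java.sql.Timestamp;
import java.util.Locale;

public class PartitionColumnTyping {

    // Simplified vector column categories, named after the argument types
    // that appear in the Vectorizer debug log (LONG, DOUBLE, STRING).
    enum VectorCategory { LONG, DOUBLE, STRING }

    // Maps a declared Hive primitive type name to a vector category.
    // Assumption of this sketch: integer, boolean and timestamp types share a
    // long representation, float/double use double, string stays string.
    static VectorCategory categoryOf(String hiveTypeName) {
        switch (hiveTypeName.toLowerCase(Locale.ROOT)) {
            case "tinyint": case "smallint": case "int": case "bigint":
            case "boolean": case "timestamp":
                return VectorCategory.LONG;
            case "float": case "double":
                return VectorCategory.DOUBLE;
            case "string":
                return VectorCategory.STRING;
            default:
                throw new IllegalArgumentException("Unsupported type: " + hiveTypeName);
        }
    }

    // Converts the string form of a partition value (e.g. "2011" from ds=2011)
    // into a Java value that matches the declared column type.
    static Object convertPartitionValue(String raw, String hiveTypeName) {
        switch (hiveTypeName.toLowerCase(Locale.ROOT)) {
            case "tinyint":   return Byte.parseByte(raw);
            case "smallint":  return Short.parseShort(raw);
            case "int":       return Integer.parseInt(raw);
            case "bigint":    return Long.parseLong(raw);
            case "boolean":   return Boolean.parseBoolean(raw);
            case "float":     return Float.parseFloat(raw);
            case "double":    return Double.parseDouble(raw);
            case "timestamp": return Timestamp.valueOf(raw);
            case "string":    return raw;
            default:
                throw new IllegalArgumentException("Unsupported type: " + hiveTypeName);
        }
    }

    public static void main(String[] args) {
        // ds is declared "partitioned by (ds int)"; its value arrives as "2011".
        Object ds = convertPartitionValue("2011", "int");
        System.out.println(ds.getClass().getSimpleName() + " " + ds); // Integer 2011
        System.out.println(categoryOf("int"));    // LONG, same category as t2.cint
        System.out.println(categoryOf("string")); // STRING, the mismatch seen in the log
    }
}
```

Keeping the type decision keyed on the declared column type, rather than on the string form the value arrives in, is what lets an `int` partition column participate in the same long-based comparisons as an ordinary `int` column.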

      The above query fails to vectorize because the partition column ds, projected by the subquery (select ds from alltypesorc_part) t1, is treated as a string column during vectorization, while the join equality compares it with the int column t2.cint. With vectorization enabled, the expected EXPLAIN output is:
      STAGE DEPENDENCIES:
        Stage-5 is a root stage
        Stage-2 depends on stages: Stage-5
        Stage-0 is a root stage

      STAGE PLANS:
        Stage: Stage-5
          Map Reduce Local Work
            Alias -> Map Local Tables:
              t1:alltypesorc_part
                Fetch Operator
                  limit: -1
            Alias -> Map Local Operator Tree:
              t1:alltypesorc_part
                TableScan
                  alias: alltypesorc_part
                  Statistics: Num rows: 300 Data size: 62328 Basic stats: COMPLETE Column stats: COMPLETE
                  Select Operator
                    expressions: ds (type: int)
                    outputColumnNames: _col0
                    Statistics: Num rows: 300 Data size: 1200 Basic stats: COMPLETE Column stats: COMPLETE
                    HashTable Sink Operator
                      condition expressions:
                        0 {_col0}
                        1 {ctinyint} {csmallint} {cint} {cbigint} {cfloat} {cdouble} {cstring1} {cstring2} {ctimestamp1} {ctimestamp2} {cboolean1} {cboolean2}
                      keys:
                        0 _col0 (type: int)
                        1 cint (type: int)

        Stage: Stage-2
          Map Reduce
            Map Operator Tree:
                TableScan
                  alias: t2
                  Statistics: Num rows: 3536 Data size: 1131711 Basic stats: COMPLETE Column stats: NONE
                  Map Join Operator
                    condition map:
                         Inner Join 0 to 1
                    condition expressions:
                      0 {_col0}
                      1 {ctinyint} {csmallint} {cint} {cbigint} {cfloat} {cdouble} {cstring1} {cstring2} {ctimestamp1} {ctimestamp2} {cboolean1} {cboolean2}
                    keys:
                      0 _col0 (type: int)
                      1 cint (type: int)
                    outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12
                    Statistics: Num rows: 3889 Data size: 1244882 Basic stats: COMPLETE Column stats: NONE
                    Filter Operator
                      predicate: (_col0 = _col3) (type: boolean)
                      Statistics: Num rows: 1944 Data size: 622280 Basic stats: COMPLETE Column stats: NONE
                      Select Operator
                        expressions: _col0 (type: int), _col1 (type: tinyint), _col2 (type: smallint), _col3 (type: int), _col4 (type: bigint), _col5 (type: float), _col6 (type: double), _col7 (type: string), _col8 (type: string), _col9 (type: timestamp), _col10 (type: timestamp), _col11 (type: boolean), _col12 (type: boolean)
                        outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12
                        Statistics: Num rows: 1944 Data size: 622280 Basic stats: COMPLETE Column stats: NONE
                        Reduce Output Operator
                          key expressions: _col9 (type: timestamp)
                          sort order: +
                          Statistics: Num rows: 1944 Data size: 622280 Basic stats: COMPLETE Column stats: NONE
                          value expressions: _col0 (type: int), _col1 (type: tinyint), _col2 (type: smallint), _col3 (type: int), _col4 (type: bigint), _col5 (type: float), _col6 (type: double), _col7 (type: string), _col8 (type: string), _col9 (type: timestamp), _col10 (type: timestamp), _col11 (type: boolean), _col12 (type: boolean)
            Local Work:
              Map Reduce Local Work
            Execution mode: vectorized
            Reduce Operator Tree:
              Extract
                Statistics: Num rows: 1944 Data size: 622280 Basic stats: COMPLETE Column stats: NONE
                Limit
                  Number of rows: 100
                  Statistics: Num rows: 100 Data size: 32000 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 100 Data size: 32000 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

        Stage: Stage-0
          Fetch Operator
            limit: 100
      With the current code, however, vectorization does not take place; the Vectorizer logs the following exception:
      14/03/12 14:43:19 DEBUG vector.VectorizationContext: No vector udf found for GenericUDFOPEqual, descriptor: Argument Count = 2, mode = FILTER, Argument Types = {STRING,LONG}, Input Expression Types = {COLUMN,COLUMN}
      14/03/12 14:43:19 DEBUG physical.Vectorizer: Failed to vectorize
      org.apache.hadoop.hive.ql.metadata.HiveException: Udf: GenericUDFOPEqual, is not supported
      at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getGenericUdfVectorExpression(VectorizationContext.java:854)
      at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpression(VectorizationContext.java:300)
      at org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateExprNodeDesc(Vectorizer.java:682)
      at org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateFilterOperator(Vectorizer.java:606)
      at org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateOperator(Vectorizer.java:537)
      at org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$ValidationNodeProcessor.process(Vectorizer.java:367)
      at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
      at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
      at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
      at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:132)
      at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
      at org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateMapWork(Vectorizer.java:314)
      at org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.convertMapWork(Vectorizer.java:283)
      at org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.dispatch(Vectorizer.java:270)
      at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111)
      at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:194)
      at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:139)
      at org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.resolve(Vectorizer.java:519)
      at org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:100)
      at org.apache.hadoop.hive.ql.parse.MapReduceCompiler.optimizeTaskPlan(MapReduceCompiler.java:290)
      at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:216)
      at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9286)
      at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
      at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:64)
      at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
      at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:398)
      at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:294)
      at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:948)
      at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:996)
      at org.apache.hadoop.hive.ql.Driver.run(Driver.java:884)
      at org.apache.hadoop.hive.ql.Driver.run(Driver.java:874)
      at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
      at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
      at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424)
      at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:359)
      at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:457)
      at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:467)
      at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:125)
      at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424)
      at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:793)
      at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:687)
      at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:626)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
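The stack trace shows the path of the failure: `Vectorizer$VectorizationDispatcher.validateMapWork` walks the map-side operator tree, `validateFilterOperator` checks the filter predicate (presumably `(_col0 = _col3)`, which compares `ds` with `cint`) through `validateExprNodeDesc`, and `VectorizationContext.getVectorExpression` finds no vectorized `GenericUDFOPEqual` for `mode = FILTER, Argument Types = {STRING,LONG}`. The following is a simplified, hypothetical model of that all-or-nothing check; the class name, the registry contents, and the method names are illustrative assumptions, not Hive code. A single unsupported predicate leaves the whole map work unvectorized, while typing `ds` as `LONG` lets it pass.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class EqualityVectorizationCheck {

    // Hypothetical registry of (mode, left type, right type) combinations for
    // which a vectorized equality expression exists. Illustrative only; this
    // is not Hive's actual list of vector expression classes.
    private static final Set<List<String>> SUPPORTED = new HashSet<>(Arrays.asList(
            Arrays.asList("FILTER", "LONG", "LONG"),
            Arrays.asList("FILTER", "DOUBLE", "DOUBLE"),
            Arrays.asList("FILTER", "STRING", "STRING")));

    // Mirrors the "No vector udf found" case: false when no entry matches.
    static boolean hasVectorEquality(String mode, String left, String right) {
        return SUPPORTED.contains(Arrays.asList(mode, left, right));
    }

    // All-or-nothing validation: each element holds the (left, right) argument
    // types of one filter equality; one unsupported pair rejects the map work.
    static boolean validateMapWork(List<String[]> filterEqualities) {
        for (String[] pair : filterEqualities) {
            if (!hasVectorEquality("FILTER", pair[0], pair[1])) {
                return false; // corresponds to "Failed to vectorize" in the log
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Before the fix: ds is typed as STRING while t2.cint is LONG.
        List<String[]> before = Collections.singletonList(new String[] {"STRING", "LONG"});
        // Assumed effect of the fix: ds uses its declared int type, i.e. LONG.
        List<String[]> after = Collections.singletonList(new String[] {"LONG", "LONG"});
        System.out.println("before: vectorized = " + validateMapWork(before)); // false
        System.out.println("after:  vectorized = " + validateMapWork(after));  // true
    }
}
```

Because validation rejects the map work as a whole, a single mistyped partition column is enough to push every operator in the stage back to row-mode execution, which is why the expected plan's "Execution mode: vectorized" line disappears.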

        Attachments

        1. HIVE-6642-4.patch
          63 kB
          Hari Sankar Sivarama Subramaniyan
        2. HIVE-6642-3.patch
          24 kB
          Hari Sankar Sivarama Subramaniyan
        3. HIVE-6642-2.patch
          24 kB
          Hari Sankar Sivarama Subramaniyan
        4. HIVE-6642.7.patch
          454 kB
          Hari Sankar Sivarama Subramaniyan
        5. HIVE-6642.6.patch
          459 kB
          Hari Sankar Sivarama Subramaniyan
        6. HIVE-6642.5.patch
          358 kB
          Hari Sankar Sivarama Subramaniyan
        7. HIVE-6642.1.patch
          29 kB
          Hari Sankar Sivarama Subramaniyan

              People

              • Assignee:
                hsubramaniyan Hari Sankar Sivarama Subramaniyan
                Reporter:
                hsubramaniyan Hari Sankar Sivarama Subramaniyan
              • Votes:
                0
                Watchers:
                4
