Details
-
Improvement
-
Status: Closed
-
Critical
-
Resolution: Duplicate
-
3.0.0
-
Can be reproduced on a Single node pseudo cluster.
-
Important
Description
Optimizer is taking advantage of materialized view only when the query syntax matches the way view was created. Here is an example.
Source table on which materialized views are created
+----------------------------------------------------+ | createtab_stmt | +----------------------------------------------------+ | CREATE TABLE `mysource`( | | `id` int, | | `name` string, | | `start_date` date) | | ROW FORMAT SERDE | | 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' | | WITH SERDEPROPERTIES ( | | 'field.delim'=',', | | 'serialization.format'=',') | | STORED AS INPUTFORMAT | | 'org.apache.hadoop.mapred.TextInputFormat' | | OUTPUTFORMAT | | 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' | | LOCATION | | 'hdfs://xlhive3.openstacklocal:8020/warehouse/tablespace/managed/hive/mysource' | | TBLPROPERTIES ( | | 'bucketing_version'='2', | | 'transactional'='true', | | 'transactional_properties'='insert_only', | | 'transient_lastDdlTime'='1535392655') | +----------------------------------------------------+
One of the materialized views "view_1" is created to fetch the data between IDs 1 and 2 using this statement
select `mysource`.`id`, `mysource`.`name`, `mysource`.`start_date` from `default`.`mysource` where `mysource`.`id` between 1 and 2
*When a SELECT is executed against the source table using the following SELECT statement, this works fine and can be validated with the explain plan.
*
0: jdbc:hive2://localhost:10000/default> explain select * from mysource where id between 1 and 2; INFO : Compiling command(queryId=hive_20180828062847_b313e0aa-686c-42f5-94e2-252dd836501c): explain select * from mysource where id between 1 and 2 INFO : Semantic Analysis Completed (retrial = false) INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:Explain, type:string, comment:null)], properties:null) INFO : Completed compiling command(queryId=hive_20180828062847_b313e0aa-686c-42f5-94e2-252dd836501c); Time taken: 0.224 seconds INFO : Executing command(queryId=hive_20180828062847_b313e0aa-686c-42f5-94e2-252dd836501c): explain select * from mysource where id between 1 and 2 INFO : Starting task [Stage-1:EXPLAIN] in serial mode INFO : Completed executing command(queryId=hive_20180828062847_b313e0aa-686c-42f5-94e2-252dd836501c); Time taken: 0.006 seconds INFO : OK +----------------------------------------------------+ | Explain | +----------------------------------------------------+ | STAGE DEPENDENCIES: | | Stage-0 is a root stage | | | | STAGE PLANS: | | Stage: Stage-0 | | Fetch Operator | | limit: -1 | | Processor Tree: | | TableScan | | alias: default.view_1 | | Select Operator | | expressions: id (type: int), name (type: string), start_date (type: date) | | outputColumnNames: _col0, _col1, _col2 | | ListSink | | | +----------------------------------------------------+
If the rewrite of the same SELECT is written using >= and <=, which should yield the same result, the optimizer does not take advantage of the materialized view, unless of course we create another view with this >= and <= syntax.
0: jdbc:hive2://localhost:10000/default> explain select * from mysource where id >= 1 and <=2; Error: Error while compiling statement: FAILED: ParseException line 1:49 cannot recognize input near '<=' '2' '<EOF>' in expression specification (state=42000,code=40000) 0: jdbc:hive2://localhost:10000/default> explain select * from mysource where id >= 1 and id <=2; INFO : Compiling command(queryId=hive_20180828063123_7a5bf6dd-5999-44d7-a2cf-3b2e634371ca): explain select * from mysource where id >= 1 and id <=2 INFO : Semantic Analysis Completed (retrial = false) INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:Explain, type:string, comment:null)], properties:null) INFO : Completed compiling command(queryId=hive_20180828063123_7a5bf6dd-5999-44d7-a2cf-3b2e634371ca); Time taken: 0.226 seconds INFO : Executing command(queryId=hive_20180828063123_7a5bf6dd-5999-44d7-a2cf-3b2e634371ca): explain select * from mysource where id >= 1 and id <=2 INFO : Starting task [Stage-1:EXPLAIN] in serial mode INFO : Completed executing command(queryId=hive_20180828063123_7a5bf6dd-5999-44d7-a2cf-3b2e634371ca); Time taken: 0.005 seconds INFO : OK +----------------------------------------------------+ | Explain | +----------------------------------------------------+ | STAGE DEPENDENCIES: | | Stage-0 is a root stage | | | | STAGE PLANS: | | Stage: Stage-0 | | Fetch Operator | | limit: -1 | | Processor Tree: | | TableScan | | alias: mysource | | filterExpr: ((id >= 1) and (id <= 2)) (type: boolean) | | Filter Operator | | predicate: ((id >= 1) and (id <= 2)) (type: boolean) | | Select Operator | | expressions: id (type: int), name (type: string), start_date (type: date) | | outputColumnNames: _col0, _col1, _col2 | | ListSink | | | +----------------------------------------------------+
Attachments
Issue Links
- is fixed by
-
HIVE-22256 Rewriting fails when `IN` clause has items in different order in MV and query
- Closed