Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Won't Fix
Description
For one of the tests, the tables are created as follows:
CREATE TABLE orc_llap_part(
  csmallint SMALLINT,
  cint INT,
  cbigint BIGINT,
  cfloat FLOAT,
  cdouble DOUBLE,
  cstring1 STRING,
  cstring2 STRING,
  ctimestamp1 TIMESTAMP,
  ctimestamp2 TIMESTAMP,
  cboolean1 BOOLEAN,
  cboolean2 BOOLEAN
) PARTITIONED BY (ctinyint TINYINT) STORED AS ORC;

CREATE TABLE orc_llap_dim_part(
  cbigint BIGINT
) PARTITIONED BY (ctinyint TINYINT) STORED AS ORC;

INSERT OVERWRITE TABLE orc_llap_part PARTITION (ctinyint)
SELECT csmallint, cint, cbigint, cfloat, cdouble, cstring1, cstring2, ctimestamp1, ctimestamp2, cboolean1, cboolean2, ctinyint
FROM alltypesorc;

INSERT OVERWRITE TABLE orc_llap_dim_part PARTITION (ctinyint)
SELECT sum(cbigint) as cbigint, ctinyint
FROM alltypesorc
WHERE ctinyint > 10 AND ctinyint < 21
GROUP BY ctinyint;
The query is:
explain SELECT oft.ctinyint, oft.cint FROM orc_llap_part oft INNER JOIN orc_llap_dim_part od ON oft.ctinyint = od.ctinyint;
This results in a failure to vectorize in MR:
Could not vectorize partition pfile:/Users/sergey/git/hive3/itests/qtest/target/warehouse/orc_llap_dim_part/ctinyint=11. Its column names cbigint do not match the other column names csmallint,cint,cbigint,cfloat,cdouble,cstring1,cstring2,ctimestamp1,ctimestamp2,cboolean1,cboolean2
The validation is comparing schemas from two different tables, because the MapWork contains 2 TableScan operators. In Tez this error will never happen, as a MapWork will not have 2 scans.
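To illustrate the difference between comparing partition schemas across the whole MapWork and comparing them per table, here is a minimal sketch in Python. It is not Hive's vectorizer code: the function name `find_schema_mismatches`, the `per_table` flag, and the shortened partition paths are all hypothetical, and the "global" mode only approximates the behavior that the error message above suggests.

```python
def find_schema_mismatches(partitions, per_table=True):
    """Return partitions whose column names differ from the reference schema.

    partitions: iterable of (table_name, partition_path, column_names).
    per_table=True  -> the reference is the first partition seen for the same table.
    per_table=False -> one reference for everything (assumed to mimic the report).
    """
    reference = {}
    mismatches = []
    for table, path, columns in partitions:
        key = table if per_table else None
        cols = tuple(columns)
        # The first schema seen for a key becomes the reference for that key.
        ref = reference.setdefault(key, cols)
        if cols != ref:
            mismatches.append((path, cols, ref))
    return mismatches


# Hypothetical input modeled on the two tables in this report.
PART_COLS = ["csmallint", "cint", "cbigint", "cfloat", "cdouble", "cstring1",
             "cstring2", "ctimestamp1", "ctimestamp2", "cboolean1", "cboolean2"]
partitions = [
    ("orc_llap_part", "orc_llap_part/ctinyint=11", PART_COLS),
    ("orc_llap_dim_part", "orc_llap_dim_part/ctinyint=11", ["cbigint"]),
]

global_mismatches = find_schema_mismatches(partitions, per_table=False)
table_mismatches = find_schema_mismatches(partitions, per_table=True)
```

With a single shared reference, the `orc_llap_dim_part` partition is flagged; with one reference per table, nothing is flagged, because each table is internally consistent.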
In Tez (and in MR as well), a different case can still happen: partitions of the same table can have different schemas.
The Tez case can be solved by building a super-schema that includes all variations and handling missing columns where necessary.
The MR case may be harder to solve.
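The super-schema idea for the Tez case can be sketched as follows. This is only an illustration in Python under stated assumptions, not Hive code: `build_super_schema` and `project_row` are hypothetical helpers, schemas are modeled as lists of `(name, type)` pairs, a type conflict for the same column name is simply rejected, and missing columns are filled with `None` (standing in for NULL).

```python
def build_super_schema(partition_schemas):
    """Union the columns of all partition schemas, keeping first-seen order.

    partition_schemas: list of schemas, each a list of (name, type) pairs.
    Raises ValueError if one column name appears with two different types.
    """
    merged = {}  # dicts preserve insertion order in Python 3.7+
    for schema in partition_schemas:
        for name, col_type in schema:
            existing = merged.setdefault(name, col_type)
            if existing != col_type:
                raise ValueError(
                    f"column {name!r} has conflicting types: "
                    f"{existing} vs {col_type}")
    return list(merged.items())


def project_row(row, partition_schema, super_schema):
    """Reorder one row into super-schema order, using None for missing columns."""
    values = {name: value for (name, _), value in zip(partition_schema, row)}
    return tuple(values.get(name) for name, _ in super_schema)


# Hypothetical example: an older partition lacks a column that was added later.
old_schema = [("cint", "int"), ("cstring1", "string")]
new_schema = [("cint", "int"), ("cstring1", "string"), ("cdouble", "double")]

super_schema = build_super_schema([old_schema, new_schema])
old_row = project_row((1, "a"), old_schema, super_schema)
new_row = project_row((2, "b", 3.5), new_schema, super_schema)
```

Rejecting type conflicts keeps the sketch small; a real implementation would also have to decide how to handle type changes between partitions.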
Of note: even though the schemas are different (and one is not a prefix of the other by coincidence or some such), the query passes if the validation is commented out. Perhaps it can work in some cases?