Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Description
Jason spotted differences in the query results for vectorization_short_regress.q.out. When vectorization is turned off and a base .q.out file is created, there are 2 differences.
Both seem to be related to negation. In the first one, MAX(cint) and MIN(cint) appear earlier as columns and match between the non-vectorized and vectorized runs, so aggregation does not appear to be failing. It seems that now that the Reducer is vectorized, a previously hidden bug is exposed: even though MAX and MIN are the same in both runs, the expression with negation returns different results.
19th field of the query below: Vectorized 511 vs Non-Vectorized -58
SELECT MAX(cint), (MAX(cint) / -3728), (MAX(cint) * -3728), VAR_POP(cbigint), (-((MAX(cint) * -3728))), STDDEV_POP(csmallint), (-563 % (MAX(cint) * -3728)), (VAR_POP(cbigint) / STDDEV_POP(csmallint)), (-(STDDEV_POP(csmallint))), MAX(cdouble), AVG(ctinyint), (STDDEV_POP(csmallint) - 10.175), MIN(cint), ((MAX(cint) * -3728) % (STDDEV_POP(csmallint) - 10.175)), (-(MAX(cdouble))), MIN(cdouble), (MAX(cdouble) % -26.28), STDDEV_SAMP(csmallint), (-((MAX(cint) / -3728))), ((-((MAX(cint) * -3728))) % (-563 % (MAX(cint) * -3728))), ((MAX(cint) / -3728) - AVG(ctinyint)), (-((MAX(cint) * -3728))), VAR_SAMP(cint) FROM alltypesorc WHERE (((cbigint <= 197) AND (cint < cbigint)) OR ((cdouble >= -26.28) AND (csmallint > cdouble)) OR ((ctinyint > cfloat) AND (cstring1 RLIKE '.*ss.*')) OR ((cfloat > 79.553) AND (cstring2 LIKE '10%')))
Column expression is: ((-((MAX(cint) * -3728))) % (-563 % (MAX(cint) * -3728))),
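One way a row-mode and a vectorized evaluation of the column expression above could diverge is integer width. If one path computes MAX(cint) * -3728 in 32-bit int arithmetic (which wraps on overflow) and the other in 64-bit long arithmetic, the negation and both % operations then work on different operands. The Java sketch below is a hypothetical illustration of that effect only. The class name, method names, and inputs are assumptions, and nothing in this report establishes overflow as the actual cause.

```java
// Hypothetical illustration, not Hive code: evaluates the expression
//   ((-((m * -3728))) % (-563 % (m * -3728)))
// once with 32-bit and once with 64-bit arithmetic. Java's % returns a result
// with the sign of the dividend; the sketch assumes Hive's integer % does too.
public final class NegationModSketch {

    // 32-bit path: m * -3728 silently wraps if it overflows an int.
    // m must be nonzero, otherwise the % below throws ArithmeticException
    // (this sketch does not model SQL NULL handling).
    static int evalInt(int m) {
        int product = m * -3728;
        return (-product) % (-563 % product);
    }

    // 64-bit path: the product of any int and -3728 always fits in a long.
    static long evalLong(int m) {
        long product = (long) m * -3728L;
        return (-product) % (-563L % product);
    }

    public static void main(String[] args) {
        int small = 1000;          // product fits in an int: both paths agree
        int large = 1_000_000_000; // hypothetical input whose product overflows an int
        System.out.println(evalInt(small) + " vs " + evalLong(small));
        System.out.println(evalInt(large) + " vs " + evalLong(large));
    }
}
```

With these inputs it prints `377 vs 377` and then `-478 vs 562`: once the intermediate product wraps, the negated dividend changes sign and magnitude, so the final remainder differs completely.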
-----------------------------------------------
This is a pre-existing issue and is now filed as HIVE-16919: "Vectorization: vectorization_short_regress.q has query result differences with non-vectorized run"
Query result difference in vectorization_short_regress.q.out, observed when vectorization is turned off and a base .q.out file is created:
-----------------------------------------------
10th field of the query below: Non-Vectorized -6432.000015344526 vs. Vectorized -6432.0
Column expression is (-(cdouble)) as c4,
SELECT ctimestamp1, cstring2, cdouble, cfloat, cbigint, csmallint, (cbigint / 3569) as c1, (-257 - csmallint) as c2, (-6432 * cfloat) as c3, (-(cdouble)) as c4, (cdouble * 10.175) as c5, ((-6432 * cfloat) / cfloat) as c6, (-(cfloat)) as c7, (cint % csmallint) as c8, (-(cdouble)) as c9, (cdouble * (-(cdouble))) as c10 FROM alltypesorc WHERE (((-1.389 >= cint) AND ((csmallint < ctinyint) AND (-6432 > csmallint))) OR ((cdouble >= cfloat) AND (cstring2 <= 'a')) OR ((cstring1 LIKE 'ss%') AND (10.175 > cbigint)))
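Here a plain negation of a double column differs after a WHERE filter. As a hypothetical illustration of the bookkeeping a vectorized unary minus must get right (not the code that produced this result), the sketch below negates one column of a filtered batch. Hive's VectorizedRowBatch carries a selection vector (`selected`, `selectedInUse`, `size`); the standalone arrays and method names here are simplified assumptions modeled on that idea, and the sample values are made up.

```java
import java.util.Arrays;

// Hypothetical sketch of vectorized unary minus over one double column of a
// filtered batch. Not Hive's classes: plain arrays stand in for the column
// vectors and the selection vector.
public final class VectorNegateSketch {

    // Correct version: logical row j lives at physical index selected[j]
    // when selectedInUse is true, and at index j otherwise.
    static void negate(double[] in, double[] out, int[] selected,
                       boolean selectedInUse, int size) {
        for (int j = 0; j < size; j++) {
            int i = selectedInUse ? selected[j] : j;
            out[i] = -in[i];
        }
    }

    // Hypothetical bug: writes to the selected slot but reads the input
    // densely, so each output slot receives another row's value.
    static void negateBuggy(double[] in, double[] out, int[] selected,
                            boolean selectedInUse, int size) {
        for (int j = 0; j < size; j++) {
            int i = selectedInUse ? selected[j] : j;
            out[i] = -in[j];
        }
    }

    // Collects the values of the logical (filtered) rows, in order.
    static double[] logicalRows(double[] out, int[] selected,
                                boolean selectedInUse, int size) {
        double[] rows = new double[size];
        for (int j = 0; j < size; j++) {
            rows[j] = out[selectedInUse ? selected[j] : j];
        }
        return rows;
    }

    public static void main(String[] args) {
        double[] in = {10.0, 20.0, 30.0, 40.0}; // physical rows 0..3
        int[] selected = {1, 3};                // rows that passed the filter
        double[] good = new double[in.length];
        double[] bad = new double[in.length];
        negate(in, good, selected, true, selected.length);
        negateBuggy(in, bad, selected, true, selected.length);
        System.out.println(Arrays.toString(logicalRows(good, selected, true, 2)));
        System.out.println(Arrays.toString(logicalRows(bad, selected, true, 2)));
    }
}
```

The correct version prints `[-20.0, -40.0]`, while the buggy one prints `[-10.0, -20.0]`: each reported value is the negation of some real input, just from the wrong row, which is the kind of mismatch that is easy to miss without a non-vectorized baseline.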