Details
Bug
Status: Resolved
Major
Resolution: Fixed
0.9.9
Description
Multiple issues were found when running SQLPPExecutionTest with columnar as the default storage format.
Filter and project pushdowns
- PushdownUtil.getFieldName(...) could throw a NullPointerException when accessing a join expression. The join's type computer doesn't include any typing information, hence the exception
- The same applies to other operators: we need to pick the right type computer (input vs. output) depending on the operator
- Change the pushdown scope when a UNION ALL operator is encountered to avoid (incorrectly) pushing SELECT conditions after UNION ALL.
- Avoid re-registering record variables when computing the expected schema. Such variables should be marked as irreplaceable
- Nested functions' arguments should not be assigned to their produced variables (if any).
- UNION ALL is quite special: it contains LogicalVariable triplets (not variable expressions). When computing the def-use chains, the computer should account for the variables used by the UNION ALL
- Disallow pushing SELECT conditions of CASE WHEN expressions
- Place NoOpAccessor for PKs in columnar filters to avoid advancing (incorrectly) the PKs
- The current way of providing FilterAccessorProvider to the filter's IScalarEvaluatorFactories has a race condition as IHyracksTaskContext can be shared. Instead, FilterAccessorProvider should be provided by a dedicated IEvaluatorContext (namely ColumnFilterEvaluatorContext).
- Disable filters against fields with heterogeneous numeric values (e.g., double and bigint)
- Avoid advancing ColumnarAssembler readers if the mega-leaf node is filtered out (otherwise, we can overrun the reader – no more values exception – or we can read incorrect data)
- ColumnLeafFrame should duplicate the page buffer (a shared buffer) to avoid a race condition when a dataset is being scanned twice at the same time.
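The buffer-duplication item above can be illustrated with plain java.nio. This is only a minimal sketch, not the actual ColumnLeafFrame code: the class SharedPageSketch and the method readIntAt are hypothetical names introduced for illustration. It relies on the documented behavior of ByteBuffer.duplicate(), which shares the underlying bytes but gives each duplicate its own position and limit, so two concurrent scans do not move each other's cursor.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: not part of the AsterixDB code base.
class SharedPageSketch {
    // Reads an int from a shared page without touching the shared cursor.
    static int readIntAt(ByteBuffer sharedPage, int offset) {
        // duplicate() shares the content but has an independent position/limit
        ByteBuffer view = sharedPage.duplicate();
        view.position(offset);
        return view.getInt();
    }

    public static void main(String[] args) {
        ByteBuffer page = ByteBuffer.allocate(16);
        page.putInt(0, 7).putInt(4, 11); // absolute puts, position stays at 0

        int first = readIntAt(page, 0);
        int second = readIntAt(page, 4);

        System.out.println(first + " " + second);
        // The shared buffer's own position is unchanged by either read
        System.out.println("shared position = " + page.position());
    }
}
```

Note that duplicating protects only the cursor state; the page content is still shared, so this pattern assumes readers do not modify the bytes.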
Storage and record assembly:
- Retain empty objects
- Preserve the type of declared fields during record assembly (currently, we only produce bigint and double, which could be interpreted incorrectly in closed fields if smaller precision types are used)
- Ensure PKs column indexes are [0 - N-1] (where N = the number of PKs), whether the PKs are in the root or nested in one or more objects
- Ensure there's always a "delegate" when assembling objects, especially when accessing closed and open fields at the same time. Otherwise, we can end up with empty objects.
- Ensure PK readers created by PathExtractorVisitor have max def-level = 1 even if the PKs are nested
- Use correct items for array and multiset declared items when LazyVisitablePointable is used
- Process actual types instead of union when using LazyVisitablePointable
- Ensure key uniqueness on LOAD
- Avoid accessing closed fields in empty objects (produced by the column assembler)
- Disallow LSM filters on columnar datasets
- Disallow correlated-prefix merge-policy (optimized for LSM-filters) in columnar datasets
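The PK column index rule above (PKs occupy indexes 0 to N-1 regardless of nesting) can be sketched as follows. This is an assumption-laden illustration rather than the actual schema code: ColumnIndexSketch, assignIndexes, and the dotted path strings are all hypothetical. The idea is simply to number the PK paths first and then number the remaining columns in discovery order.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not part of the AsterixDB code base.
class ColumnIndexSketch {
    // Assigns column indexes so that PK paths always get [0, N-1].
    static Map<String, Integer> assignIndexes(List<String> pkPaths, List<String> otherPaths) {
        Map<String, Integer> indexes = new LinkedHashMap<>();
        // PKs first, whether they are at the root or nested (e.g., "meta.id")
        for (String pk : pkPaths) {
            indexes.put(pk, indexes.size());
        }
        // Remaining columns follow; a path already registered as a PK is skipped
        for (String path : otherPaths) {
            indexes.putIfAbsent(path, indexes.size());
        }
        return indexes;
    }

    public static void main(String[] args) {
        Map<String, Integer> indexes = assignIndexes(
                List.of("meta.id", "region"),
                List.of("name", "meta.id", "meta.tags"));
        System.out.println(indexes);
    }
}
```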
Misc.
- If the storage format is specified incorrectly (i.e., it is neither row nor column), a NullPointerException is thrown
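One way to avoid the NullPointerException described above is to validate the format string up front and fail with a descriptive error. The sketch below is hypothetical: StorageFormatSketch, its Format enum, and parse are illustrative names, and the ticket does not say which exception type the actual fix uses.

```java
import java.util.Locale;

// Hypothetical sketch: not part of the AsterixDB code base.
class StorageFormatSketch {
    enum Format { ROW, COLUMN }

    // Parses a user-provided storage format, rejecting anything unknown.
    static Format parse(String value) {
        if (value != null) {
            switch (value.trim().toLowerCase(Locale.ROOT)) {
                case "row":
                    return Format.ROW;
                case "column":
                    return Format.COLUMN;
                default:
                    break;
            }
        }
        // Null or unknown values end here with an explicit message, not an NPE
        throw new IllegalArgumentException(
                "Unsupported storage format: " + value + " (expected 'row' or 'column')");
    }

    public static void main(String[] args) {
        System.out.println(parse("column"));
        try {
            parse("parquet");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```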