Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 0.9.9
Fix Version/s: 0.9.9
Component/s: COMP - Compiler, RT - Runtime, STO - Storage
Labels:
- triaged

Epic Link:
Columnar Format

Description

Multiple issues were found while running SQLPPExecutionTest with columnar as the default storage format.

Filter and project pushdowns:

A NullPointerException could be thrown by PushdownUtil.getFieldName(...) when accessing a join expression. The type computer of the join doesn't include any typing information, hence the exception.
The same applies to other operators: we need to pick the right type computer (input vs. output) depending on the operator.
Change the pushdown scope when a UNION ALL operator is encountered to avoid (incorrectly) pushing SELECT conditions after UNION ALL.
Avoid re-registering record variables when computing the expected schema. Such variables should be marked as irreplaceable
Nested functions' arguments should not be assigned to their produced variables (if any).
UNION ALL is a special case: it contains LogicalVariable triplets (not variable expressions). When computing the defUse chains, the computer should account for the variables used by UNION ALL.
Disallow pushing SELECT conditions of CASE WHEN expressions
Place NoOpAccessor for PKs in columnar filters to avoid advancing (incorrectly) the PKs
The current way of providing FilterAccessorProvider to the filter's IScalarEvaluatorFactories has a race condition as IHyracksTaskContext can be shared. Instead, FilterAccessorProvider should be provided by a dedicated IEvaluatorContext (namely ColumnFilterEvaluatorContext).
Disable filters on fields with heterogeneous numerical values (e.g., double and bigint)
Avoid advancing ColumnarAssembler readers if the mega-leaf node is filtered out (otherwise, we can overrun the reader and hit a "no more values" exception, or we can read incorrect data)
ColumnLeafFrame should duplicate the page buffer (a shared buffer) to avoid a race condition when a dataset is being scanned twice at the same time.
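
The shared-buffer race in the last item can be illustrated with plain java.nio. The sketch below is not the actual ColumnLeafFrame code; PageReaderSketch is a hypothetical stand-in, and it only assumes that the page is exposed as a ByteBuffer. ByteBuffer.duplicate() shares the underlying bytes but gives each reader its own position and limit, so two scans of the same page do not move each other's cursor.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for a frame that reads values from a shared page buffer.
final class PageReaderSketch {
    private final ByteBuffer view;

    PageReaderSketch(ByteBuffer sharedPage) {
        // duplicate() shares the page content but keeps an independent
        // position, limit, and mark for this reader.
        this.view = sharedPage.duplicate();
    }

    // Relative read: advances only this reader's private position.
    int nextInt() {
        return view.getInt();
    }

    public static void main(String[] args) {
        ByteBuffer page = ByteBuffer.allocate(16);
        // Absolute puts fill the page without moving its position.
        page.putInt(0, 10).putInt(4, 20).putInt(8, 30).putInt(12, 40);

        PageReaderSketch scanA = new PageReaderSketch(page);
        PageReaderSketch scanB = new PageReaderSketch(page);

        scanA.nextInt();
        scanA.nextInt();

        // scanB is unaffected by scanA's progress and starts at the first value.
        System.out.println(scanB.nextInt());
        // The shared page's own position was never advanced.
        System.out.println(page.position());
    }
}
```

If both readers called getInt() directly on the shared buffer instead, each read would advance the same position, which is the kind of interference the fix avoids.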

Storage and record assembly:

Retain empty objects
Preserve the type of declared fields during record assembly (currently, we only produce bigint and doubles, which could be interpreted incorrectly in closed fields if smaller precision types are used)
Ensure PKs column indexes are [0 - N-1] (where N = the number of PKs), whether the PKs are in the root or nested in one or more objects
Ensure there's always a "delegate" when assembling objects, especially when closed and open fields are accessed at the same time. Otherwise, we can end up with empty objects.
Ensure PK readers created by PathExtractorVisitor have max def-level = 1 even if they're nested
Use correct items for array and multiset declared items when LazyVisitablePointable is used
Process actual types instead of union when using LazyVisitablePointable
Ensure key uniqueness on LOAD
Avoid accessing closed fields in empty objects (resulting from the column assembler)
Disallow LSM filters on columnar datasets
Disallow correlated-prefix merge-policy (optimized for LSM-filters) in columnar datasets
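
As a rough illustration of the "Ensure key uniqueness on LOAD" item, the sketch below checks a batch of keys for duplicates. It is not the actual fix: LoadKeyCheckSketch and firstDuplicate are hypothetical names, and the sketch assumes the keys reach the loader already sorted, so that equal keys are adjacent and one pass is enough.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; assumes keys arrive sorted, as in a sorted bulk load.
final class LoadKeyCheckSketch {

    // Returns the index of the first key equal to its predecessor, or -1 if all keys are unique.
    static int firstDuplicate(List<Long> sortedKeys) {
        for (int i = 1; i < sortedKeys.size(); i++) {
            if (sortedKeys.get(i).equals(sortedKeys.get(i - 1))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Long> keys = Arrays.asList(1L, 2L, 2L, 5L);
        int dup = firstDuplicate(keys);
        if (dup >= 0) {
            // A real loader would fail the LOAD here instead of printing.
            System.out.println("Duplicate key " + keys.get(dup) + " at position " + dup);
        }
    }
}
```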

Misc.

If the storage format is specified incorrectly (i.e., it is neither row nor column), a NullPointerException is thrown
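
One way to avoid this kind of NullPointerException is to validate the value up front and fail with a descriptive error. The sketch below is only an illustration under that assumption: StorageFormatSketch, Format, and parse are hypothetical names, not the project's API, and the accepted spellings (case-insensitive "row" and "column") are a guess.

```java
import java.util.Locale;

// Hypothetical validator for a user-supplied storage format string.
final class StorageFormatSketch {
    enum Format { ROW, COLUMN }

    // Maps the string to a Format, rejecting anything else (including null)
    // with an explicit error instead of letting a null propagate.
    static Format parse(String value) {
        if (value != null) {
            switch (value.trim().toLowerCase(Locale.ROOT)) {
                case "row":
                    return Format.ROW;
                case "column":
                    return Format.COLUMN;
                default:
                    break;
            }
        }
        throw new IllegalArgumentException(
                "Unknown storage format: " + value + " (expected row or column)");
    }

    public static void main(String[] args) {
        System.out.println(parse("column"));
        try {
            parse("parquet");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```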

People

Assignee: Wail Y. Alkowaileet

Reporter: Wail Y. Alkowaileet

Votes: 0

Watchers: 2

Dates

Created: 04/Dec/23 21:03

Updated: 05/Jan/24 21:26

Resolved: 05/Jan/24 21:26