  1. Apache AsterixDB
  2. ASTERIXDB-3324

Stabilize columnar datasets


Details

    Description

      Multiple issues were found while running SQLPPExecutionTest with columnar as the default storage format.

      Filter and project pushdowns:

      • A NullPointerException could be thrown by PushdownUtil.getFieldName(...) when accessing a join expression. The type computer of the join doesn't include any typing information; hence the thrown exception
      • The same applies to other operators: we need to pick the right type computer (input vs. output) depending on the operator
      • Change the pushdown scope when a UNION ALL operator is encountered to avoid (incorrectly) pushing SELECT conditions after UNION ALL
      • Avoid re-registering record variables when computing the expected schema. Such variables should be marked as irreplaceable
      • Nested functions' arguments should not be assigned to their produced variables (if any)
      • UNION ALL is a special case: it contains LogicalVariable triplets (not variable expressions). When computing the defUse chains, the computer should account for the variables used by the UNION ALL
      • Disallow pushing SELECT conditions of CASE WHEN expressions
      • Place a NoOpAccessor for PKs in columnar filters to avoid (incorrectly) advancing the PKs
      • The current way of providing FilterAccessorProvider to the filter's IScalarEvaluatorFactories has a race condition as IHyracksTaskContext can be shared. Instead, FilterAccessorProvider should be provided by a dedicated IEvaluatorContext (namely ColumnFilterEvaluatorContext).
      • Disable filters against fields with heterogeneous numeric values (e.g., double and bigint)
      • Avoid advancing ColumnarAssembler readers if the mega-leaf node is filtered out (otherwise, we can overrun the reader – no more values exception – or we can read incorrect data)
      • ColumnLeafFrame should duplicate the page buffer (a shared buffer) to avoid a race condition when a dataset is being scanned twice at the same time
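
      The shared page buffer item above can be illustrated with java.nio.ByteBuffer.duplicate(), which returns a view over the same bytes with its own position, limit, and mark. This is a minimal sketch, not the actual ColumnLeafFrame code: the class DuplicatedPageBufferSketch and the method readInts are hypothetical names used only for illustration.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for a leaf frame reader; not AsterixDB code.
final class DuplicatedPageBufferSketch {

    /** Reads 'count' ints starting at byte 'offset' through a private view of the shared page. */
    static int[] readInts(ByteBuffer sharedPage, int offset, int count) {
        // duplicate() shares the underlying bytes but keeps an independent
        // position/limit/mark, so this reader never moves the shared cursor.
        ByteBuffer view = sharedPage.duplicate();
        view.position(offset);
        int[] out = new int[count];
        for (int i = 0; i < count; i++) {
            out[i] = view.getInt();
        }
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer page = ByteBuffer.allocate(16);
        // Absolute puts: the shared buffer's position stays at 0.
        page.putInt(0, 10).putInt(4, 20).putInt(8, 30).putInt(12, 40);

        int[] first = readInts(page, 0, 2);   // one scan reads the first half
        int[] second = readInts(page, 8, 2);  // another scan reads the second half

        System.out.println(first[0] + " " + first[1] + " " + second[0] + " " + second[1]);
        System.out.println("shared position: " + page.position());
    }
}
```

      Because each scan advances only its own view, two scans over the same page cannot disturb each other's read position. The duplicate still shares the bytes, so this protects cursor state only, not concurrent writes.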

      Storage and record assembly:

      • Retain empty objects
      • Preserve the type of declared fields during record assembly (currently, we only produce bigint and double values, which could be interpreted incorrectly in closed fields if lower-precision types are used)
      • Ensure PK column indexes are 0 to N-1 (where N = the number of PKs), whether the PKs are in the root or nested in one or more objects
      • Ensure there's always a "delegate" when assembling objects, especially when closed and open fields are accessed at the same time. Otherwise, we can end up with empty objects.
      • Ensure PK readers created by PathExtractorVisitor have max def-level = 1 even if they're nested
      • Use correct items for array and multiset declared items when LazyVisitablePointable is used
      • Process actual types instead of union when using LazyVisitablePointable
      • Ensure key uniqueness on LOAD 
      • Avoid accessing closed fields in empty objects (resulting from the column assembler)
      • Disallow LSM filters on columnar datasets
      • Disallow correlated-prefix merge-policy (optimized for LSM-filters) in columnar datasets
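
      The PK column index requirement above (indexes 0 to N-1, regardless of nesting) can be sketched as follows. This is only an illustration under stated assumptions: columns are identified here by dotted path strings, and PrimaryKeyColumnIndexSketch and assignColumnIndexes are hypothetical names, not part of AsterixDB.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; AsterixDB's real schema structures are not modeled here.
final class PrimaryKeyColumnIndexSketch {

    /**
     * Assigns column indexes so that primary keys occupy 0..N-1 in declaration
     * order, whether a key is a root field ("region") or nested ("meta.id").
     * All remaining paths receive the following indexes in encounter order.
     */
    static Map<String, Integer> assignColumnIndexes(List<String> primaryKeyPaths, List<String> allPaths) {
        Map<String, Integer> indexes = new LinkedHashMap<>();
        for (String pk : primaryKeyPaths) {
            indexes.putIfAbsent(pk, indexes.size());
        }
        for (String path : allPaths) {
            // Keys already have an index; only non-key paths are added here.
            indexes.putIfAbsent(path, indexes.size());
        }
        return indexes;
    }

    public static void main(String[] args) {
        List<String> pks = Arrays.asList("meta.id", "region");
        List<String> all = Arrays.asList("name", "meta.id", "region", "meta.ts");
        // prints {meta.id=0, region=1, name=2, meta.ts=3}
        System.out.println(assignColumnIndexes(pks, all));
    }
}
```

      Registering the keys before any other path is what keeps them at the front, independent of where they appear in the record.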

      Misc.

      • If the storage format is specified incorrectly (i.e., it is neither row nor column), a NullPointerException is thrown
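
      One way to avoid a NullPointerException for an unknown storage format is to validate the value up front and fail with a descriptive error. The issue does not say how this was actually fixed, so the sketch below is an assumption: StorageFormatSketch, its Format enum, and parse are hypothetical names.

```java
// Hypothetical sketch; not the actual AsterixDB storage-format handling.
final class StorageFormatSketch {

    enum Format { ROW, COLUMN }

    /** Returns the matching format, or throws a descriptive error instead of failing later on null. */
    static Format parse(String value) {
        if (value != null) {
            for (Format format : Format.values()) {
                if (format.name().equalsIgnoreCase(value.trim())) {
                    return format;
                }
            }
        }
        throw new IllegalArgumentException(
                "Unsupported storage format '" + value + "'; expected 'row' or 'column'");
    }

    public static void main(String[] args) {
        System.out.println(parse("column")); // prints COLUMN
        try {
            parse("parquet");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```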

          People

            wyk Wail Y. Alkowaileet