Spark / SPARK-36978

InferConstraints rule should create IsNotNull constraints on the nested field instead of the root nested type


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: 3.3.0
    • Component/s: SQL
    • Labels: None

    Description

      The InferFiltersFromConstraints optimization rule generates IsNotNull constraints for null-intolerant predicates. Each IsNotNull constraint is generated on the attribute referenced inside the corresponding predicate.
      For example, a predicate a > 0 on an integer column a results in the constraint IsNotNull(a). A predicate on a nested int column structCol.b, where structCol is a struct column, instead results in the constraint IsNotNull(structCol).

      Generating the constraint on the root of the nested type is extremely conservative, as it could lead to materialization of the entire struct. The constraint should instead be generated on the nested field referenced by the predicate. In the above example, the constraint should be IsNotNull(structCol.b) instead of IsNotNull(structCol).
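The difference between the two behaviours can be sketched with a small toy model. This is not Spark's Catalyst code: the classes and the functions infer_root_level and infer_field_level are hypothetical names introduced only for illustration, and the model handles just a single binary comparison.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Attribute:
    """A top-level column such as `a` or `structCol`."""
    name: str


@dataclass(frozen=True)
class GetStructField:
    """Access to a nested field, e.g. structCol.b."""
    child: "Expr"
    field: str


@dataclass(frozen=True)
class Literal:
    value: object


@dataclass(frozen=True)
class GreaterThan:
    """A null-intolerant predicate: null input gives a null result."""
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class IsNotNull:
    child: "Expr"


Expr = Union[Attribute, GetStructField, Literal, GreaterThan]


def root_attribute(expr):
    """Strip nested field accesses until the top-level column is reached."""
    while isinstance(expr, GetStructField):
        expr = expr.child
    return expr


def infer_root_level(pred):
    """Behaviour described in this issue: constrain the root column."""
    return {
        IsNotNull(root_attribute(side))
        for side in (pred.left, pred.right)
        if isinstance(side, (Attribute, GetStructField))
    }


def infer_field_level(pred):
    """Proposed behaviour: constrain the exact expression referenced."""
    return {
        IsNotNull(side)
        for side in (pred.left, pred.right)
        if isinstance(side, (Attribute, GetStructField))
    }


nested = GreaterThan(GetStructField(Attribute("structCol"), "b"), Literal(0))
print(infer_root_level(nested))   # constraint on structCol as a whole
print(infer_field_level(nested))  # constraint on structCol.b only
```

For a flat column such as a, both functions produce the same IsNotNull(a) constraint; they differ only when the predicate references a nested field.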

      The new constraints also create opportunities for nested pruning. Currently, the IsNotNull(structCol) constraint precludes pruning of structCol because it references the whole struct. The constraint IsNotNull(structCol.b), by contrast, references a single field and could allow the unreferenced fields of structCol to be pruned.
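A minimal sketch of why the constraint's target matters for pruning, assuming a simplified model in which every reference is a path of names and a reference to a whole column forces all of its fields to be read. The function required_paths and the path representation are hypothetical and are not Spark's pruning implementation.

```python
def required_paths(references):
    """Return the minimal set of column paths that must be read.

    Each reference is a tuple of names: ("structCol",) is the whole struct,
    ("structCol", "b") is one nested field. A path is dropped when one of
    its proper prefixes is also referenced, because reading the prefix
    already covers it.
    """
    refs = set(references)
    return {
        path
        for path in refs
        if not any(path[:i] in refs for i in range(1, len(path)))
    }


# References made by a query that selects and filters on structCol.b.
query_refs = [("structCol", "b")]

# Root-level constraint IsNotNull(structCol) adds a whole-struct reference.
with_root_constraint = required_paths(query_refs + [("structCol",)])

# Field-level constraint IsNotNull(structCol.b) adds nothing new.
with_field_constraint = required_paths(query_refs + [("structCol", "b")])

print(with_root_constraint)   # the entire struct must be read
print(with_field_constraint)  # only structCol.b must be read
```

In this model the root-level constraint is the only reference that touches the whole struct, so replacing it with the field-level constraint is what lets the remaining fields be skipped.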

          People

            utkarsh39 Utkarsh Agarwal
            utkarsh39 Utkarsh Agarwal