Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-21351

Update nullability based on children's output in optimized logical plan

Attach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Minor
    • Resolution: Fixed
    • 2.2.2, 2.3.2
    • 3.0.0
    • SQL
    • None

    Description

      In the master, optimized plans do not respect the nullability that `Filter` might change when having `IsNotNull`.
      This generates unnecessary code for NULL checks. For example:

      scala> val df = Seq((Some(1), Some(2))).toDF("a", "b")
      scala> val bIsNotNull = df.where($"b" =!= 2).select($"b")
      scala> val targetQuery = bIsNotNull.distinct
      scala> val targetQuery.queryExecution.optimizedPlan.output(0).nullable
      res5: Boolean = true
      
      scala> targetQuery.debugCodegen
      Found 2 WholeStageCodegen subtrees.
      == Subtree 1 / 2 ==
      *HashAggregate(keys=[b#19], functions=[], output=[b#19])
      +- Exchange hashpartitioning(b#19, 200)
         +- *HashAggregate(keys=[b#19], functions=[], output=[b#19])
            +- *Project [_2#16 AS b#19]
               +- *Filter isnotnull(_2#16)
                  +- LocalTableScan [_1#15, _2#16]
      
      Generated code:
      ...
      /* 124 */   protected void processNext() throws java.io.IOException {
      ...
      /* 132 */     // output the result
      /* 133 */
      /* 134 */     while (agg_mapIter.next()) {
      /* 135 */       wholestagecodegen_numOutputRows.add(1);
      /* 136 */       UnsafeRow agg_aggKey = (UnsafeRow) agg_mapIter.getKey();
      /* 137 */       UnsafeRow agg_aggBuffer = (UnsafeRow) agg_mapIter.getValue();
      /* 138 */
      /* 139 */       boolean agg_isNull4 = agg_aggKey.isNullAt(0);
      /* 140 */       int agg_value4 = agg_isNull4 ? -1 : (agg_aggKey.getInt(0));
      /* 141 */       agg_rowWriter1.zeroOutNullBytes();
      /* 142 */
                      // We don't need this NULL check because NULL is filtered out in `$"b" =!=2`
      /* 143 */       if (agg_isNull4) {
      /* 144 */         agg_rowWriter1.setNullAt(0);
      /* 145 */       } else {
      /* 146 */         agg_rowWriter1.write(0, agg_value4);
      /* 147 */       }
      /* 148 */       append(agg_result1);
      /* 149 */
      /* 150 */       if (shouldStop()) return;
      /* 151 */     }
      /* 152 */
      /* 153 */     agg_mapIter.close();
      /* 154 */     if (agg_sorter == null) {
      /* 155 */       agg_hashMap.free();
      /* 156 */     }
      /* 157 */   }
      /* 158 */
      /* 159 */ }
      

      In the line 143, we don't need this NULL check because NULL is filtered out in `$"b" =!=2`.

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            maropu Takeshi Yamamuro
            maropu Takeshi Yamamuro
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment