SPARK-24865

Remove AnalysisBarrier



    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0, 2.3.1
    • Fix Version/s: 2.4.0
    • Component/s: SQL
    • Labels: None


      AnalysisBarrier was introduced in SPARK-20392 to improve analysis speed by not re-analyzing nodes that have already been analyzed.

      Before AnalysisBarrier, we already had some infrastructure in place: the analysis-specific functions resolveOperators and resolveExpressions. These functions do not recursively traverse down subplans that are already analyzed (tracked with a mutable boolean flag, _analyzed). The issue with the old system was that developers started using transformDown, which does a top-down traversal of the plan tree, because there was no top-down resolution function; as a result, analyzer performance became pretty bad.
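
      The difference between the two traversals can be sketched with a toy plan tree. This is a minimal illustration, not Spark's code: Plan, Relation, Project, withChildren and countVisits are hypothetical names, and only the idea of an analyzed flag that short-circuits resolveOperators is taken from the description above.

```scala
// Toy model for illustration only; these are not Spark's actual classes.
object AnalyzedFlagSketch {
  sealed trait Plan {
    // Mutable flag, standing in for the _analyzed flag described above.
    var analyzed: Boolean = false
    def children: Seq[Plan]
    def withChildren(newChildren: Seq[Plan]): Plan

    // Generic top-down rewrite: always descends into every child.
    def transformDown(rule: PartialFunction[Plan, Plan]): Plan = {
      val afterRule = rule.applyOrElse(this, identity[Plan])
      afterRule.withChildren(afterRule.children.map(_.transformDown(rule)))
    }

    // Analysis-specific bottom-up rewrite: returns early on analyzed subplans.
    def resolveOperators(rule: PartialFunction[Plan, Plan]): Plan =
      if (analyzed) this
      else {
        val rewritten = withChildren(children.map(_.resolveOperators(rule)))
        rule.applyOrElse(rewritten, identity[Plan])
      }
  }

  case class Relation(name: String) extends Plan {
    def children: Seq[Plan] = Nil
    def withChildren(newChildren: Seq[Plan]): Plan = this
  }

  case class Project(child: Plan) extends Plan {
    def children: Seq[Plan] = Seq(child)
    def withChildren(newChildren: Seq[Plan]): Plan = Project(newChildren.head)
  }

  // Counts how many nodes a rule is offered under the given traversal.
  def countVisits(traverse: (Plan, PartialFunction[Plan, Plan]) => Plan): Int = {
    val analyzedSubplan = Project(Relation("t"))
    analyzedSubplan.analyzed = true
    val plan = Project(analyzedSubplan)

    var visited = 0
    val countingRule: PartialFunction[Plan, Plan] = { case p => visited += 1; p }
    traverse(plan, countingRule)
    visited
  }

  def main(args: Array[String]): Unit = {
    println(countVisits(_.transformDown(_)))    // 3: every node is visited
    println(countVisits(_.resolveOperators(_))) // 1: the analyzed subplan is skipped
  }
}
```

      In this sketch the cost of transformDown grows with the size of the whole tree on every rule application, while resolveOperators only pays for the part that is not yet analyzed.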

      In order to fix the issue in SPARK-20392, AnalysisBarrier was introduced as a special node; for this node, transform/transformUp/transformDown don't traverse down. However, the introduction of this special node caused a lot more trouble than it solved. This implicit node breaks assumptions and code in a few places, and it's hard to know when an analysis barrier will exist and when it won't. Even a simple search for AnalysisBarrier in PR discussions shows that it is a source of bugs and additional complexity.
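
      A rough sketch of why such a node is awkward, again on a toy tree rather than Spark's classes. Barrier, projectsRelation and upperCase are hypothetical names, and the broken pattern match is only one assumed example of the kind of code that stops working when a barrier is silently present.

```scala
// Toy model for illustration only; these are not Spark's actual classes.
object BarrierSketch {
  sealed trait Plan {
    def children: Seq[Plan]
    def withChildren(newChildren: Seq[Plan]): Plan

    def transformDown(rule: PartialFunction[Plan, Plan]): Plan = {
      val afterRule = rule.applyOrElse(this, identity[Plan])
      afterRule.withChildren(afterRule.children.map(_.transformDown(rule)))
    }
  }

  case class Relation(name: String) extends Plan {
    def children: Seq[Plan] = Nil
    def withChildren(newChildren: Seq[Plan]): Plan = this
  }

  case class Project(child: Plan) extends Plan {
    def children: Seq[Plan] = Seq(child)
    def withChildren(newChildren: Seq[Plan]): Plan = Project(newChildren.head)
  }

  // Stand-in for the barrier: it wraps an analyzed subplan but exposes
  // no children, so every transform stops here.
  case class Barrier(hidden: Plan) extends Plan {
    def children: Seq[Plan] = Nil
    def withChildren(newChildren: Seq[Plan]): Plan = this
  }

  // Hypothetical helper written without the barrier in mind.
  def projectsRelation(plan: Plan): Boolean = plan match {
    case Project(Relation(_)) => true
    case _                    => false
  }

  val upperCase: PartialFunction[Plan, Plan] = {
    case Relation(n) => Relation(n.toUpperCase)
  }

  def main(args: Array[String]): Unit = {
    val plain = Project(Relation("t"))
    val wrapped = Project(Barrier(Relation("t")))

    println(plain.transformDown(upperCase))   // Project(Relation(T))
    println(wrapped.transformDown(upperCase)) // Project(Barrier(Relation(t)))
    println(projectsRelation(plain))          // true
    println(projectsRelation(wrapped))        // false
  }
}
```

      The barrier does stop the traversal, but any code that inspects the tree shape now has to know whether a barrier might be sitting between two nodes.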

      Instead, I think a much simpler fix to the original issue is to introduce resolveOperatorsDown and change all places in the analyzer that call transformDown to use it. We can also ban accidental uses of the various transform* methods by using a linter (which can only lint specific packages), or, in test mode, by inspecting the stack trace and failing explicitly if transform* is called in the analyzer.
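
      One way the proposal could look, as a sketch on a toy tree. This is not the change that was merged: testMode, calledFromAnalyzer, ToyAnalyzer and the plan classes are hypothetical, and matching on a class name in the stack trace is just one assumed way to implement the test-mode check.

```scala
// Toy model for illustration only; these are not Spark's actual classes.
object ResolveDownSketch {
  // Hypothetical switch; a real build would take this from its test configuration.
  @volatile var testMode: Boolean = false

  // Hypothetical guard: true if any frame on the current stack belongs to ToyAnalyzer.
  private def calledFromAnalyzer(): Boolean =
    Thread.currentThread().getStackTrace.exists(_.getClassName.contains("ToyAnalyzer"))

  sealed trait Plan {
    var analyzed: Boolean = false
    def children: Seq[Plan]
    def withChildren(newChildren: Seq[Plan]): Plan

    // Generic top-down rewrite; in test mode it fails when used by the analyzer.
    def transformDown(rule: PartialFunction[Plan, Plan]): Plan = {
      if (testMode && calledFromAnalyzer()) {
        throw new IllegalStateException(
          "transformDown called in the analyzer; use resolveOperatorsDown")
      }
      val afterRule = rule.applyOrElse(this, identity[Plan])
      afterRule.withChildren(afterRule.children.map(_.transformDown(rule)))
    }

    // Top-down rewrite that skips subplans already marked as analyzed.
    def resolveOperatorsDown(rule: PartialFunction[Plan, Plan]): Plan =
      if (analyzed) this
      else {
        val afterRule = rule.applyOrElse(this, identity[Plan])
        afterRule.withChildren(afterRule.children.map(_.resolveOperatorsDown(rule)))
      }
  }

  case class Relation(name: String) extends Plan {
    def children: Seq[Plan] = Nil
    def withChildren(newChildren: Seq[Plan]): Plan = this
  }

  case class Project(child: Plan) extends Plan {
    def children: Seq[Plan] = Seq(child)
    def withChildren(newChildren: Seq[Plan]): Plan = Project(newChildren.head)
  }

  case class Join(left: Plan, right: Plan) extends Plan {
    def children: Seq[Plan] = Seq(left, right)
    def withChildren(newChildren: Seq[Plan]): Plan = Join(newChildren(0), newChildren(1))
  }

  // Stand-in for analyzer code that applies a rule to a plan.
  object ToyAnalyzer {
    private val upperCase: PartialFunction[Plan, Plan] = {
      case Relation(n) => Relation(n.toUpperCase)
    }
    def withResolveDown(plan: Plan): Plan = plan.resolveOperatorsDown(upperCase)
    def withTransformDown(plan: Plan): Plan = plan.transformDown(upperCase)
  }

  def main(args: Array[String]): Unit = {
    val done = Project(Relation("t"))
    done.analyzed = true
    val plan = Join(done, Relation("u"))

    // Only the unanalyzed side is rewritten.
    println(ToyAnalyzer.withResolveDown(plan)) // Join(Project(Relation(t)),Relation(U))

    testMode = true
    try ToyAnalyzer.withTransformDown(plan)
    catch { case e: IllegalStateException => println(e.getMessage) }
  }
}
```

      Walking the stack trace on every call is expensive, which is why the sketch gates it behind a test-only switch; a linter restricted to the analyzer packages would catch the same mistake without any runtime cost.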


              rxin Reynold Xin