Spark / SPARK-38118

Func(wrong data type) in HAVING clause should throw data mismatch error



    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: 3.3.0
    • Component/s: SQL
    • Labels: None


      with t as (select true c)
      select t.c
      from t
      group by t.c
      having mean(t.c) > 0

      This query throws "Column 't.c' does not exist. Did you mean one of the following? [t.c]"

      However, mean(boolean) is not a supported function signature, so the error should instead be "cannot resolve 'mean(t.c)' due to data type mismatch: function average requires numeric or interval types, not boolean".
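The expected data-type-mismatch message comes from the aggregate's input type check. A minimal sketch (these are hypothetical stand-in types, not Spark's actual classes) of the kind of check Average performs, rejecting non-numeric child types with a descriptive message:

```scala
object AverageTypeCheckSketch {
  sealed trait DataType
  case object BooleanType extends DataType
  case object DoubleType extends DataType

  sealed trait TypeCheckResult { def isSuccess: Boolean }
  case object TypeCheckSuccess extends TypeCheckResult { val isSuccess = true }
  final case class TypeCheckFailure(message: String) extends TypeCheckResult {
    val isSuccess = false
  }

  // Hypothetical helper mirroring the spirit of Average.checkInputDataTypes:
  // only numeric (here, Double) inputs pass; anything else fails with the
  // "requires numeric or interval types" style of message the issue expects.
  def checkAverageInput(childType: DataType): TypeCheckResult = childType match {
    case DoubleType => TypeCheckSuccess
    case other =>
      TypeCheckFailure(s"function average requires numeric or interval types, not $other")
  }
}
```

With this sketch, `checkAverageInput(BooleanType)` produces a failure whose message names the offending type, which is the error the analyzer should have surfaced instead of "Column 't.c' does not exist".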


      This is because:

      1. mean(boolean) in the HAVING clause was not marked as resolved in the ResolveFunctions rule.
      2. Thus in ResolveAggregationFunctions, the TempResolvedColumn wrapper in mean(TempResolvedColumn(t.c)) cannot be removed (only a resolved aggregate can remove its TempResolvedColumn).
      3. Thus when a later batch of rules is applied, the TempResolvedColumn is reverted and the expression becomes mean(`t.c`), so mean loses the information about t.c.
      4. Thus at the last step, the analyzer can only report that t.c was not found.


      mean(boolean) in HAVING is not marked as resolved in the ResolveFunctions rule because:

      1. It uses Expression's default `resolved` field:
        lazy val resolved: Boolean = childrenResolved && checkInputDataTypes().isSuccess
      2. During analysis, mean(boolean) is mean(TempResolvedColumn(boolean)), so childrenResolved is true.
      3. However, checkInputDataTypes() fails (Average.scala#L55).
      4. Thus Average's `resolved` ends up false, which leads to the wrong error message.
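The interplay of `childrenResolved` and `checkInputDataTypes()` can be seen in a toy model (heavily simplified stand-ins, not Spark's real Expression API): the TempResolvedColumn wrapper is itself resolved, so the aggregate's children look resolved, yet the aggregate stays unresolved because its type check fails.

```scala
object ResolvedFieldSketch {
  trait Expr {
    def children: Seq[Expr]
    def checkInputDataTypes(): Boolean
    def childrenResolved: Boolean = children.forall(_.resolved)
    // Mirrors Expression's default population: resolved only when every child
    // is resolved AND the input data types check out.
    lazy val resolved: Boolean = childrenResolved && checkInputDataTypes()
  }

  // Stand-in for the TempResolvedColumn wrapper: a leaf that is fully resolved.
  final case class TempResolvedColumn(name: String) extends Expr {
    val children: Seq[Expr] = Seq.empty
    def checkInputDataTypes(): Boolean = true
  }

  // Stand-in for mean/Average: the childIsNumeric flag models whether the
  // child's data type would pass Average's input type check.
  final case class Mean(child: Expr, childIsNumeric: Boolean) extends Expr {
    val children: Seq[Expr] = Seq(child)
    def checkInputDataTypes(): Boolean = childIsNumeric
  }
}
```

For `Mean(TempResolvedColumn("t.c"), childIsNumeric = false)`, `childrenResolved` is true but `resolved` is false, which is exactly the state that later lets the wrapper be reverted and the column information lost.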






            Assignee: amaliujia Rui Wang
            Reporter: amaliujia Rui Wang