Finding the widest common type for the arguments of a variadic function (such as IN or COALESCE) is order-dependent when the argument types are a mix of DateType/TimestampType, StringType, and NumericType. Some argument orders fail with an AnalysisException, while others succeed with a common type of StringType.
The examples below reproduce the error and assume a schema of:
[c1: date, c2: string, c3: int]
The following succeeds:
SELECT coalesce(c1, c2, c3) FROM table
While the following produces an exception:
SELECT coalesce(c1, c3, c2) FROM table
The order of the arguments affects the behavior because the widest common type appears to be found by repeatedly looking at two types at a time: the widest common type found so far and the type of the next argument. In the failing query, the first pair (date and int) presumably has no common type, so analysis fails before the string argument is ever considered. My initial thought for a fix is that the search would have to change so that it looks at all of the arguments before deciding what the widest common type should be.
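To make the suspected order dependence concrete, here is a toy model in Python. It is not Spark's code: the pairwise rule in wider_of_two is an assumption chosen only because it matches the behavior reported above (string combines with anything, date and int do not combine), and all function names are hypothetical.

```python
from functools import reduce
from typing import Optional, Sequence

# Toy type names standing in for Spark's DateType, StringType, IntegerType.
DATE, STRING, INT = "date", "string", "int"


def wider_of_two(a: Optional[str], b: str) -> Optional[str]:
    """Assumed pairwise rule: equal types match, string absorbs any other
    type, and every other mixed pair has no common type (None)."""
    if a is None:
        return None  # an earlier pair already failed, so stay failed
    if a == b:
        return a
    if STRING in (a, b):
        return STRING
    return None


def fold_pairwise(types: Sequence[str]) -> Optional[str]:
    """Order-dependent: combine the running result with the next argument."""
    return reduce(wider_of_two, types[1:], types[0])


def widest_all_at_once(types: Sequence[str]) -> Optional[str]:
    """Order-independent sketch: inspect the whole set of types first."""
    distinct = set(types)
    if len(distinct) == 1:
        return next(iter(distinct))
    return STRING if STRING in distinct else None
```

Under these assumptions, fold_pairwise([DATE, STRING, INT]) returns "string" but fold_pairwise([DATE, INT, STRING]) returns None, mirroring the two queries above, while widest_all_at_once returns "string" for both orders. Whether StringType is the right answer for this mix, rather than an error in every order, is a separate decision.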
As my boss is out of the office for the rest of the day, I will give a pull request a shot, but since I am not very familiar with Scala or Spark's coding style guidelines, a pull request is not promised. For that attempt, I will assume that mixing DateType/TimestampType, StringType, and NumericType arguments in an IN expression or a COALESCE function (or any other function/expression where this combination of argument types can occur) is valid. It would also be quite reasonable to treat this combination of argument types as invalid, so if that is what is decided, that is fine too.
If I were a betting man, I'd say the fix would be made in the following file: TypeCoercion.scala