In PySpark's DataFrame API, we have
# `and`, `or`, `not` cannot be overloaded in Python,
# so use bitwise operators as boolean operators
__and__ = _bin_op('and')
__or__ = _bin_op('or')
__invert__ = _func_op('not')
__rand__ = _bin_op("and")
__ror__ = _bin_op("or")
Right now, users can still use operators like `and`, which can cause very confusing behavior. We should throw an error when users try to use them and tell them the right way to build the expression.
For example,
df = sqlContext.range(1, 10)

df.id > 5 or df.id < 10
Out[30]: Column<(id > 5)>

df.id > 5 and df.id < 10
Out[31]: Column<(id < 10)>
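The confusing results arise because `and`/`or`/`not` cannot be overloaded; Python instead truth-tests each operand via `__bool__`, so `a and b` silently returns one of the two columns. A minimal sketch (a standalone toy class, not PySpark's actual `Column`) of how raising from `__bool__` turns that silent misuse into a clear error:

```python
# Toy Column-like class (hypothetical, for illustration only).
# `and`/`or`/`not` all go through __bool__, which cannot return a
# Column, so we raise there and point users at &, |, ~ instead.

class Column:
    def __init__(self, expr):
        self.expr = expr

    # Bitwise operators stand in for boolean ones, as in PySpark.
    def __and__(self, other):
        return Column(f"({self.expr} AND {other.expr})")

    def __or__(self, other):
        return Column(f"({self.expr} OR {other.expr})")

    def __invert__(self):
        return Column(f"(NOT {self.expr})")

    def __gt__(self, other):
        return Column(f"({self.expr} > {other})")

    def __lt__(self, other):
        return Column(f"({self.expr} < {other})")

    # Called by `and`, `or`, `not`, and `if`; raise instead of
    # silently truth-testing the column.
    def __bool__(self):
        raise ValueError(
            "Cannot convert column into bool: use '&' for 'and', "
            "'|' for 'or', '~' for 'not' when building boolean "
            "expressions."
        )

    def __repr__(self):
        return f"Column<{self.expr}>"


col = Column("id")
print((col > 5) & (col < 10))   # Column<((id > 5) AND (id < 10))>
try:
    result = (col > 5) and (col < 10)
except ValueError as e:
    print(e)                    # the helpful error message
```

With this in place, `df.id > 5 and df.id < 10` fails loudly with guidance, instead of quietly evaluating to the right-hand column.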
duplicates SPARK-8568 Prevent accidental use of "and" and "or" to build invalid expressions in Python (Resolved)