Description
In PySpark's DataFrame API (the `Column` class), we have:
    # `and`, `or`, `not` cannot be overloaded in Python,
    # so use bitwise operators as boolean operators
    __and__ = _bin_op('and')
    __or__ = _bin_op('or')
    __invert__ = _func_op('not')
    __rand__ = _bin_op("and")
    __ror__ = _bin_op("or")
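To illustrate the pattern in these lines, here is a minimal, self-contained sketch using a hypothetical `Expr` class (not PySpark's `Column`; the `_bin_op`/`_func_op` helpers are replaced by a plain `_bin` method). It only builds expression strings. Note that `&` binds more tightly than `>` and `<` in Python, so each comparison needs its own parentheses.

```python
class Expr:
    """Hypothetical stand-in for a Column: it only records an expression string."""

    def __init__(self, text):
        self.text = text

    def _bin(self, op, other):
        # Render the right-hand side: another Expr or a literal value.
        other_text = other.text if isinstance(other, Expr) else repr(other)
        return Expr(f"({self.text} {op} {other_text})")

    def __gt__(self, other):
        return self._bin(">", other)

    def __lt__(self, other):
        return self._bin("<", other)

    # `&`, `|`, `~` can be overloaded, so they stand in for and/or/not.
    def __and__(self, other):
        return self._bin("AND", other)

    def __or__(self, other):
        return self._bin("OR", other)

    def __invert__(self):
        return Expr(f"(NOT {self.text})")

    def __repr__(self):
        return f"Expr<{self.text}>"


col = Expr("id")
print((col > 5) & (col < 10))  # Expr<((id > 5) AND (id < 10))>
print(~(col > 5))              # Expr<(NOT (id > 5))>
```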
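Right now, users can still use Python's `and`, `or`, and `not` keywords on columns, which causes very confusing behavior: Python raises no error and silently returns one of the two operands. We need to throw an error when users try to use them and tell them the right way to do it: use `&`, `|`, and `~` instead.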
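A minimal sketch of the proposed check, using a hypothetical `Expr` class rather than PySpark's `Column`: because `and`, `or`, `not`, and `if` all convert their operand with `bool()`, raising from `__bool__` turns the silent misuse into an explicit error. The error text is illustrative only, and on Python 2 the hook is named `__nonzero__` rather than `__bool__`.

```python
class Expr:
    """Hypothetical Column stand-in that refuses implicit bool conversion."""

    def __init__(self, text):
        self.text = text

    def __and__(self, other):
        return Expr(f"({self.text} AND {other.text})")

    def __or__(self, other):
        return Expr(f"({self.text} OR {other.text})")

    def __invert__(self):
        return Expr(f"(NOT {self.text})")

    def __bool__(self):
        # Called by `and`, `or`, `not`, and `if`; fail loudly and point at the fix.
        raise ValueError(
            "Cannot convert expression into bool: please use '&' for 'and', "
            "'|' for 'or', '~' for 'not' when building boolean expressions."
        )

    def __repr__(self):
        return f"Expr<{self.text}>"


a, b = Expr("(id > 5)"), Expr("(id < 10)")
print(a & b)  # Expr<((id > 5) AND (id < 10))>

try:
    a and b  # bool(a) is evaluated first and raises
except ValueError as err:
    message = str(err)
    print(message)
```

The bitwise forms keep working because `&`, `|`, and `~` never call `__bool__`.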
For example,
    df = sqlContext.range(1, 10)
    df.id > 5 or df.id < 10
    Out[30]: Column<(id > 5)>
    df.id > 5 and df.id < 10
    Out[31]: Column<(id < 10)>
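These outputs follow from Python's own short-circuit rules rather than anything Spark does: `x or y` returns `x` when `bool(x)` is true, and `x and y` returns `y` in that case. An object that defines neither `__bool__` nor `__len__` is always truthy, so `or` keeps only the first comparison and `and` keeps only the second. A minimal reproduction without Spark, assuming `Column` had default truthiness at the time and using a hypothetical `Expr` class:

```python
class Expr:
    """Hypothetical expression object with default truthiness (no __bool__)."""

    def __init__(self, text):
        self.text = text

    def __repr__(self):
        return f"Expr<{self.text}>"


left = Expr("(id > 5)")
right = Expr("(id < 10)")

# Without __bool__/__len__ the object is always truthy, so:
# `x or y` short-circuits and returns x; `x and y` returns y.
print(left or right)   # Expr<(id > 5)>
print(left and right)  # Expr<(id < 10)>
```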
Issue Links
- duplicates: SPARK-8568 Prevent accidental use of "and" and "or" to build invalid expressions in Python (Resolved)