  1. Spark
  2. SPARK-6116 DataFrame API improvement umbrella ticket (Spark 1.5)
  3. SPARK-8573

For PySpark's DataFrame API, we need to throw exceptions when users try to use and/or/not

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 1.3.0
    • Fix Version/s: 1.4.1, 1.5.0
    • Component/s: PySpark, SQL
    • Labels:
      None

      Description

      In PySpark's DataFrame API, we have

      # `and`, `or`, `not` cannot be overloaded in Python,
      # so use bitwise operators as boolean operators
      __and__ = _bin_op('and')
      __or__ = _bin_op('or')
      __invert__ = _func_op('not')
      __rand__ = _bin_op("and")
      __ror__ = _bin_op("or")
      

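      Right now, users can still write the keywords and, or, and not on columns. Python accepts them without complaint, but the result silently drops one of the conditions, which is very confusing. We need to throw an error when users try to use these keywords, and the error should tell them to use the bitwise operators (&, |, ~) instead.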
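
The silent behavior comes from how Python evaluates `and` and `or`: it checks the truthiness of the left operand and then returns one of the two operands unchanged, without calling any overloadable method on them. A minimal sketch of this, using a hypothetical `Plain` class (an illustration only, not a PySpark class) that, like any object without `__bool__`, is always truthy:

```python
class Plain(object):
    """Hypothetical stand-in: defines no __bool__, so it is always truthy."""

    def __init__(self, name):
        self.name = name


left, right = Plain("id > 5"), Plain("id < 10")

# `or` stops at the first truthy operand, so the right side is dropped.
or_result = (left or right)

# `and` moves past a truthy left operand and returns the right one.
and_result = (left and right)
```

Neither expression combines the two operands; each just picks one of them.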

      For example,

      df = sqlContext.range(1, 10)
      df.id > 5 or df.id < 10
      Out[30]: Column<(id > 5)>
      df.id > 5 and df.id < 10
      Out[31]: Column<(id < 10)>
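
One way to get the requested error is to make the truthiness check itself fail, since that is the only hook Python calls for `and`, `or`, and `not`. The sketch below assumes a hypothetical `Expr` class and a made-up error message; it illustrates the approach and is not PySpark's actual `Column` implementation:

```python
class Expr(object):
    """Hypothetical column-like expression (not PySpark's Column)."""

    def __init__(self, text):
        self.text = text

    # Bitwise operators build combined expressions, as in the ticket.
    def __and__(self, other):
        return Expr("(%s AND %s)" % (self.text, other.text))

    def __or__(self, other):
        return Expr("(%s OR %s)" % (self.text, other.text))

    def __invert__(self):
        return Expr("(NOT %s)" % self.text)

    # `and`, `or`, and `not` all ask for truthiness, so refuse to answer.
    def __bool__(self):
        raise ValueError(
            "Cannot convert expression into bool: "
            "use '&' for 'and', '|' for 'or', '~' for 'not'."
        )

    __nonzero__ = __bool__  # Python 2 spelling of the same hook

    def __repr__(self):
        return "Expr<%s>" % self.text


a = Expr("id > 5")
b = Expr("id < 10")

combined = a & b  # supported: builds one expression from both sides

try:
    a and b       # unsupported: triggers the truthiness check
    raised = False
except ValueError:
    raised = True
```

With this in place, `a & b`, `a | b`, and `~a` keep working, while `a and b`, `a or b`, and `not a` fail immediately with a message that points to the right operators.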
      

              People

              • Assignee: Davies Liu (davies)
              • Reporter: Yin Huai (yhuai)
