SPARK-41794 (sub-task of SPARK-41285: Test basework and improvement of test coverage in Spark Connect)

Reenable ANSI mode in pyspark.sql.tests.connect.test_connect_column


Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.4.0
    • Fix Version/s: None
    • Component/s: Connect, Tests

    Description
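
      With ANSI mode on (spark.sql.ansi.enabled=true), the following tests in pyspark.sql.tests.connect.test_connect_column fail: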

      ======================================================================
      ERROR [0.901s]: test_column_accessor (pyspark.sql.tests.connect.test_connect_column.SparkConnectTests)
      ----------------------------------------------------------------------
      Traceback (most recent call last):
        File "/.../spark/python/pyspark/sql/tests/connect/test_connect_column.py", line 744, in test_column_accessor
          cdf.select(CF.col("z")[0], cdf.z[10], CF.col("z")[-10]).toPandas(),
        File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 949, in toPandas
          return self._session.client.to_pandas(query)
        File "/.../spark/python/pyspark/sql/connect/client.py", line 413, in to_pandas
          return self._execute_and_fetch(req)
        File "/.../spark/python/pyspark/sql/connect/client.py", line 573, in _execute_and_fetch
          self._handle_error(rpc_error)
        File "/.../spark/python/pyspark/sql/connect/client.py", line 623, in _handle_error
          raise SparkConnectException(status.message, info.reason) from None
      pyspark.sql.connect.client.SparkConnectException: (org.apache.spark.SparkArrayIndexOutOfBoundsException) [INVALID_ARRAY_INDEX] The index 10 is out of bounds. The array has 3 elements. Use the SQL function `get()` to tolerate accessing element at invalid index and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
      
      ======================================================================
      ERROR [0.245s]: test_column_arithmetic_ops (pyspark.sql.tests.connect.test_connect_column.SparkConnectTests)
      ----------------------------------------------------------------------
      Traceback (most recent call last):
        File "/.../spark/python/pyspark/sql/tests/connect/test_connect_column.py", line 799, in test_column_arithmetic_ops
          cdf.select(cdf.a % cdf["b"], cdf["a"] % 2, 12 % cdf.c).toPandas(),
        File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 949, in toPandas
          return self._session.client.to_pandas(query)
        File "/.../spark/python/pyspark/sql/connect/client.py", line 413, in to_pandas
          return self._execute_and_fetch(req)
        File "/.../spark/python/pyspark/sql/connect/client.py", line 573, in _execute_and_fetch
          self._handle_error(rpc_error)
        File "/.../spark/python/pyspark/sql/connect/client.py", line 623, in _handle_error
          raise SparkConnectException(status.message, info.reason) from None
      pyspark.sql.connect.client.SparkConnectException: (org.apache.spark.SparkArithmeticException) [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
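
      Both failures are expected ANSI-mode behavior rather than Spark Connect bugs. Below is a minimal sketch of the first failing pattern and the get()-based workaround named in the error message; the data and session setup are illustrative, not the actual test fixture, and get() is available as a SQL function since Spark 3.4:

      from pyspark.sql import SparkSession
      from pyspark.sql import functions as F

      spark = SparkSession.builder.getOrCreate()
      spark.conf.set("spark.sql.ansi.enabled", "true")

      # Hypothetical three-element array column, mirroring the shape in the test.
      df = spark.createDataFrame([([1, 2, 3],)], ["z"])

      # Under ANSI mode this raises SparkArrayIndexOutOfBoundsException
      # [INVALID_ARRAY_INDEX], as in the traceback above:
      # df.select(F.col("z")[10]).collect()

      # The SQL function get() tolerates the out-of-range index and returns NULL:
      df.select(F.expr("get(z, 10)")).show()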
      
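      The second failure is the same story for arithmetic: under ANSI mode, a modulo or division by zero raises DIVIDE_BY_ZERO instead of yielding NULL. A sketch of the two escape hatches the error message offers, try_divide and disabling ANSI mode (again with made-up data):

      from pyspark.sql import SparkSession
      from pyspark.sql import functions as F

      spark = SparkSession.builder.getOrCreate()
      spark.conf.set("spark.sql.ansi.enabled", "true")

      # Hypothetical data in which the divisor column c contains a zero.
      df = spark.createDataFrame([(12, 0)], ["a", "c"])

      # Under ANSI mode this raises SparkArithmeticException [DIVIDE_BY_ZERO]:
      # df.select(df.a % df.c).collect()

      # try_divide returns NULL for a zero divisor instead of raising:
      df.select(F.expr("try_divide(a, c)")).show()

      # Or bypass the ANSI checks altogether, as the error message suggests:
      spark.conf.set("spark.sql.ansi.enabled", "false")
      df.select(df.a % df.c).show()  # yields NULL instead of an error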
      


          People

            Assignee: podongfeng (Ruifeng Zheng)
            Reporter: gurwls223 (Hyukjin Kwon)

