Spark / SPARK-41282 Feature parity: Column API in Spark Connect / SPARK-41772

Enable pyspark.sql.connect.column.Column.withField doctest


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.4.0
    • Component/s: Connect
    • Labels: None

    Description

      The doctest fails as follows:

      File "/.../spark/python/pyspark/sql/connect/column.py", line 391, in pyspark.sql.connect.column.Column.withField
      Failed example:
          df.withColumn('a', df['a'].withField('b', lit(3))).select('a.b').show()
      Exception raised:
          Traceback (most recent call last):
            File "/.../miniconda3/envs/python3.9/lib/python3.9/doctest.py", line 1336, in __run
              exec(compile(example.source, filename, "single",
            File "<doctest pyspark.sql.connect.column.Column.withField[3]>", line 1, in <module>
              df.withColumn('a', df['a'].withField('b', lit(3))).select('a.b').show()
            File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 538, in show
              print(self._show_string(n, truncate, vertical))
            File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 424, in _show_string
              pdf = DataFrame.withPlan(
            File "/.../python/pyspark/sql/connect/dataframe.py", line 910, in toPandas
              return self._session.client.to_pandas(query)
            File "/.../python/pyspark/sql/connect/client.py", line 413, in to_pandas
              return self._execute_and_fetch(req)
            File "/.../spark/python/pyspark/sql/connect/client.py", line 573, in _execute_and_fetch
              self._handle_error(rpc_error)
            File "/.../spark/python/pyspark/sql/connect/client.py", line 627, in _handle_error
              raise SparkConnectException(str(rpc_error)) from None
          pyspark.sql.connect.client.SparkConnectException: <_MultiThreadedRendezvous of RPC that terminated with:
          	status = StatusCode.UNKNOWN
          	details = "Expression with ID: 0 is not supported"
          	debug_error_string = "UNKNOWN:Error received from peer ipv6:%5B::1%5D:15002 {grpc_message:"Expression with ID: 0 is not supported", grpc_status:2, created_time:"2022-12-29T21:25:46.707558+09:00"}"
          >
      **********************************************************************
      File "/Users/hyukjin.kwon/workspace/forked/spark/python/pyspark/sql/connect/column.py", line 397, in pyspark.sql.connect.column.Column.withField
      Failed example:
          df.withColumn('a', df['a'].withField('d', lit(4))).select('a.d').show()
      Exception raised:
          Traceback (most recent call last):
            File "/.../miniconda3/envs/python3.9/lib/python3.9/doctest.py", line 1336, in __run
              exec(compile(example.source, filename, "single",
            File "<doctest pyspark.sql.connect.column.Column.withField[4]>", line 1, in <module>
              df.withColumn('a', df['a'].withField('d', lit(4))).select('a.d').show()
            File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 538, in show
              print(self._show_string(n, truncate, vertical))
            File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 424, in _show_string
              pdf = DataFrame.withPlan(
            File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 910, in toPandas
              return self._session.client.to_pandas(query)
            File "/.../spark/python/pyspark/sql/connect/client.py", line 413, in to_pandas
              return self._execute_and_fetch(req)
            File "/.../spark/python/pyspark/sql/connect/client.py", line 573, in _execute_and_fetch
              self._handle_error(rpc_error)
            File "/.../spark/python/pyspark/sql/connect/client.py", line 627, in _handle_error
              raise SparkConnectException(str(rpc_error)) from None
          pyspark.sql.connect.client.SparkConnectException: <_MultiThreadedRendezvous of RPC that terminated with:
          	status = StatusCode.UNKNOWN
          	details = "Expression with ID: 0 is not supported"
          	debug_error_string = "UNKNOWN:Error received from peer ipv6:%5B::1%5D:15002 {created_time:"2022-12-29T21:25:46.71644+09:00", grpc_status:2, grpc_message:"Expression with ID: 0 is not supported"}"
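      For context, the two failing examples cover both branches of withField: replacing an existing struct field (b) and adding a field that does not exist yet (d). The sketch below is a hypothetical pure-Python model of the results the doctest expects. It assumes two things: the input struct is a = {b: 1, c: 2}, which is the row used by the non-Connect PySpark withField docstring, and a new field is appended at the end of the struct. It is not Spark Connect code and does not address the server-side error above.

```python
from typing import Any, Dict


def with_field(struct: Dict[str, Any], name: str, value: Any) -> Dict[str, Any]:
    """Hypothetical model of Column.withField on a single struct value.

    An existing field is replaced in place and keeps its position. A missing
    field is appended at the end, which is an assumption of this sketch. The
    input dict is left unchanged.
    """
    updated = dict(struct)  # shallow copy; dicts preserve insertion order
    updated[name] = value   # replaces an existing key in place, or appends a new one
    return updated


# Hypothetical stand-in for the doctest's struct column 'a'
a = {"b": 1, "c": 2}

replaced = with_field(a, "b", 3)  # mirrors df['a'].withField('b', lit(3))
added = with_field(a, "d", 4)     # mirrors df['a'].withField('d', lit(4))

print(replaced["b"])  # the value select('a.b') is expected to show
print(added["d"])     # the value select('a.d') is expected to show
```

      Both calls leave the original struct untouched, in the same way that withField returns a new Column expression and does not modify the existing one.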
      

          People

            Assignee: Ruifeng Zheng (podongfeng)
            Reporter: Hyukjin Kwon (gurwls223)
            Votes: 0
            Watchers: 3
