Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.4.0
    • Component/s: Connect
    • Labels: None

    Description

      The doctest in pyspark.sql.connect.group.GroupedData.agg fails with the following error:

      Failed example:
          df.groupBy(df.name).agg({"*": "count"}).sort("name").show()
      Exception raised:
          Traceback (most recent call last):
            File "/.../miniconda3/envs/python3.9/lib/python3.9/doctest.py", line 1336, in __run
              exec(compile(example.source, filename, "single",
            File "<doctest pyspark.sql.connect.group.GroupedData.agg[4]>", line 1, in <module>
              df.groupBy(df.name).agg({"*": "count"}).sort("name").show()
            File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 538, in show
              print(self._show_string(n, truncate, vertical))
            File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 424, in _show_string
              pdf = DataFrame.withPlan(
            File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 895, in toPandas
              return self._session.client._to_pandas(query)
            File "/.../spark/python/pyspark/sql/connect/client.py", line 333, in _to_pandas
              return self._execute_and_fetch(req)
            File "/.../spark/python/pyspark/sql/connect/client.py", line 421, in _execute_and_fetch
              for b in self._stub.ExecutePlan(req, metadata=self._builder.metadata()):
            File "/.../miniconda3/envs/python3.9/lib/python3.9/site-packages/grpc/_channel.py", line 426, in __next__
              return self._next()
            File "/.../miniconda3/envs/python3.9/lib/python3.9/site-packages/grpc/_channel.py", line 826, in _next
              raise self
          grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
          	status = StatusCode.UNKNOWN
          	details = "[UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name `*` cannot be resolved. Did you mean one of the following? [`age`, `name`];
          'Sort ['name DESC NULLS LAST], true
          +- 'Aggregate [name#26], [name#26, unresolvedalias('count('*), None)]
             +- Project [0#21L AS age#25L, 1#22 AS name#26]
                +- LocalRelation [0#21L, 1#22]
          "
          	debug_error_string = "UNKNOWN:Error received from peer ipv6:%5B::1%5D:15002 {created_time:"2022-12-28T20:55:38.30791+09:00", grpc_status:2, grpc_message:"[UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name `*` cannot be resolved. Did you mean one of the following? [`age`, `name`];\n\'Sort [\'name DESC NULLS LAST], true\n+- \'Aggregate [name#26], [name#26, unresolvedalias(\'count(\'*), None)]\n   +- Project [0#21L AS age#25L, 1#22 AS name#26]\n      +- LocalRelation [0#21L, 1#22]\n"}"
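
The error message and the plan fragment `'count('*)` suggest that the dictionary key `"*"` was sent to the server as a column literally named `*`, which the analyzer then could not resolve. The following is a minimal sketch of that translation step, not the actual Spark Connect code. The classes `ColumnRef`, `Star`, `FunctionCall` and the helper `dict_to_agg_exprs` are hypothetical stand-ins introduced only for illustration; the sketch shows one way the `"*"` key could be mapped to a star expression instead of a column reference.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class ColumnRef:
    """Hypothetical stand-in for an unresolved column reference."""
    name: str


@dataclass(frozen=True)
class Star:
    """Hypothetical stand-in for an unresolved star ("all columns")."""


@dataclass(frozen=True)
class FunctionCall:
    """Hypothetical stand-in for an unresolved function call."""
    function: str
    argument: Union[ColumnRef, Star]


def dict_to_agg_exprs(exprs: Dict[str, str]) -> List[FunctionCall]:
    """Translate a {column: function} dict into function-call expressions.

    The key "*" becomes Star() rather than ColumnRef("*"), so nothing
    downstream is asked to resolve a column literally named `*`.
    """
    result = []
    for column, function in exprs.items():
        argument = Star() if column == "*" else ColumnRef(column)
        result.append(FunctionCall(function, argument))
    return result


# Mirrors the shape of the failing call: agg({"*": "count"})
aggs = dict_to_agg_exprs({"*": "count", "age": "max"})
```

Whether the real fix special-cases `"*"` on the client or on the server is not stated in this ticket; the sketch only illustrates the distinction between a star and a column named `*`.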
      

      We should re-enable this doctest once the issue is fixed in Spark Connect.
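
The ticket does not say how the doctest was disabled. One common mechanism in Python is the standard `# doctest: +SKIP` directive, sketched below under that assumption. The docstring here is a hypothetical one modelled on the failing example, and `df` is deliberately left undefined to show that a skipped example is never executed.

```python
import doctest

# Hypothetical docstring: the first example runs, the second is skipped.
# The SKIP directive keeps the example visible in the documentation
# without executing it, so the undefined name `df` causes no error.
DOCSTRING = '''
>>> 1 + 1
2
>>> df.groupBy(df.name).agg({"*": "count"}).sort("name").show()  # doctest: +SKIP
'''

# Parse the docstring into a DocTest and run it with an empty namespace.
test = doctest.DocTestParser().get_doctest(DOCSTRING, {}, "agg_example", None, 0)
results = doctest.DocTestRunner(verbose=False).run(test)
```

Re-enabling the example would then amount to removing the directive once the underlying call succeeds.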

          People

            Assignee: podongfeng Ruifeng Zheng
            Reporter: gurwls223 Hyukjin Kwon