Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Fix Version/s: 4.0.0
Description

sf.lit currently rejects one-dimensional NumPy arrays with a string dtype, because _from_numpy_type has no mapping for NumPy string types. The traceback below shows the failure:
In [5]: spark.range(1).select(sf.lit(np.array(["a", "b"], np.str_))).schema
---------------------------------------------------------------------------
PySparkTypeError Traceback (most recent call last)
Cell In[5], line 1
----> 1 spark.range(1).select(sf.lit(np.array(["a", "b"], np.str_))).schema
File ~/Dev/spark/python/pyspark/sql/utils.py:272, in try_remote_functions.<locals>.wrapped(*args, **kwargs)
269 if is_remote() and "PYSPARK_NO_NAMESPACE_SHARE" not in os.environ:
270 from pyspark.sql.connect import functions
--> 272 return getattr(functions, f.__name__)(*args, **kwargs)
273 else:
274 return f(*args, **kwargs)
File ~/Dev/spark/python/pyspark/sql/connect/functions/builtin.py:271, in lit(col)
269 elif isinstance(col, np.ndarray) and col.ndim == 1:
270 if _from_numpy_type(col.dtype) is None:
--> 271 raise PySparkTypeError(
272 errorClass="UNSUPPORTED_NUMPY_ARRAY_SCALAR",
273 messageParameters={"dtype": col.dtype.name},
274 )
276 # NumpyArrayConverter for Py4J can not support ndarray with int8 values.
277 # Actually this is not a problem for Connect, but here still convert it
278 # to int16 for compatibility.
279 if col.dtype == np.int8:
PySparkTypeError: [UNSUPPORTED_NUMPY_ARRAY_SCALAR] The type of array scalar 'str32' is not supported.
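Until the dtype mapping covers NumPy string arrays, one possible workaround is to build the array literal from plain Python strings rather than passing the ndarray itself. This is only a sketch (the SparkSession setup and the "letters" alias are illustrative, not part of the report):

import numpy as np
from pyspark.sql import SparkSession
from pyspark.sql import functions as sf

spark = SparkSession.builder.getOrCreate()

arr = np.array(["a", "b"], np.str_)

# Convert the NumPy string array to plain Python strings and build the
# array literal element by element, bypassing the dtype check that
# rejects 'str32' ndarrays.
letters = sf.array(*[sf.lit(x) for x in arr.tolist()])

spark.range(1).select(letters.alias("letters")).printSchema()

This yields an array<string> column, which is what passing the string ndarray directly to sf.lit would be expected to produce once supported.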