Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- 4.0.0
Description
Here is a small reproducer:
import pandas as pd
from pyspark.sql import SparkSession
import pyarrow.parquet as pq
import pickle

df = pd.DataFrame(
    {
        "A": [
            ["aa", "bb "],
            ["c"],
            ["d", "ee", "", "f"],
            ["ggg", "H"],
            [""],
        ]
    }
)

spark = SparkSession.builder.appName("GenSparkData").getOrCreate()
spark_df = spark.createDataFrame(df)
spark_df.write.parquet("list_str.pq", "overwrite")

ds = pq.ParquetDataset("list_str.pq")
assert pickle.loads(pickle.dumps(ds.schema)) == ds.schema  # PASSES
assert pickle.loads(pickle.dumps(ds.schema.to_arrow_schema())) == ds.schema.to_arrow_schema()  # FAILS
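Both assertions in the reproducer test the same invariant: an object should compare equal to its own pickled-and-unpickled copy. A minimal sketch of that check as a standalone helper follows. The name `pickle_roundtrips` and the `sample` dictionary are hypothetical, introduced here only for illustration, and the sketch uses plain Python containers rather than Spark or pyarrow objects.

```python
import pickle


def pickle_roundtrips(obj):
    """Return True if obj compares equal to its pickled-and-unpickled copy."""
    return pickle.loads(pickle.dumps(obj)) == obj


# Plain Python data shaped like the reproducer's list-of-strings column
# (an illustrative stand-in, not a pyarrow schema).
sample = {"A": [["aa", "bb "], ["c"], ["d", "ee", "", "f"], ["ggg", "H"], [""]]}
print(pickle_roundtrips(sample))
```

Applied to the reproducer, `pickle_roundtrips(ds.schema)` would correspond to the assertion marked PASSES, and `pickle_roundtrips(ds.schema.to_arrow_schema())` to the one marked FAILS.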