SPARK-47543

Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation.

Details

    Description

      Currently, PyArrow infers a pandas column of `dict` values as StructType instead of MapType, so Spark cannot handle the schema properly:

      >>> import pandas as pd
      >>> import pyarrow as pa
      >>> pdf = pd.DataFrame({"str_col": ['second'], "dict_col": [{'first': 0.7, 'second': 0.3}]})
      >>> pa.Schema.from_pandas(pdf)
      str_col: string
      dict_col: struct<first: double, second: double>
        child 0, first: double
        child 1, second: double
      

      Spark cannot handle this case because it uses PyArrow for schema creation.
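
The inference the title asks for can be illustrated with a minimal, standard-library-only sketch. This is not Spark's or PyArrow's actual implementation: the helper names `infer_type_name` and `infer_schema` and the type-name strings are assumptions made for illustration. It only shows the idea of mapping a Python `dict` value to a map type (with key and value types inferred from its entries) instead of a struct with one field per key.

```python
from typing import Any


def infer_type_name(value: Any) -> str:
    """Hypothetical helper: return a Spark-like type name for a Python value."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        # Treat a dict as a map, not as a struct with one field per key.
        if not value:
            raise ValueError("cannot infer key/value types from an empty dict")
        key_types = {infer_type_name(k) for k in value.keys()}
        value_types = {infer_type_name(v) for v in value.values()}
        if len(key_types) != 1 or len(value_types) != 1:
            raise ValueError("mixed key or value types are not handled in this sketch")
        return f"map<{key_types.pop()},{value_types.pop()}>"
    raise TypeError(f"unsupported type: {type(value).__name__}")


def infer_schema(row: dict) -> dict:
    """Hypothetical helper: infer a column-name -> type-name mapping from one row."""
    return {name: infer_type_name(value) for name, value in row.items()}


# Same data as the example in the description, expressed as a single row.
row = {"str_col": "second", "dict_col": {"first": 0.7, "second": 0.3}}
schema = infer_schema(row)
print(schema)  # {'str_col': 'string', 'dict_col': 'map<string,double>'}
```

With this rule, `dict_col` is described as `map<string,double>` rather than `struct<first: double, second: double>`, so rows whose dictionaries have different keys would still share one column type.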

          People

            itholic Haejoon Lee