
Details

    • Sub-task
    • Status: Resolved
    • Major
    • Resolution: Resolved
    • 3.4.0
    • None
    • Connect
    • None

    Description

      Currently, the Connect API createDataFrame does not support creating a DataFrame with a map-type column.
      For example,

          >>> df = spark.createDataFrame(
          ...     [(1, ["foo", "bar"], {"x": 1.0}), (2, [], {}), (3, None, None)],
          ...     ("id", "an_array", "a_map")
          ... )
      

      The code above is meant to create a DataFrame whose column 'a_map' is of map type.
      However, pyarrow recognizes

      {"x": 1.0}

      as a struct, not a map.
      pyarrow supports maps in the format [('x', 1.0)]
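
      The difference between the two forms can be sketched in plain Python. This is only an illustration, not the fix that was applied in Spark: the helper name dict_to_map_entries is hypothetical, and the pyarrow part runs only if pyarrow happens to be installed.

```python
from typing import Any, List, Optional, Tuple


def dict_to_map_entries(value: Optional[dict]) -> Optional[List[Tuple[Any, Any]]]:
    """Hypothetical helper: turn a dict into a list of (key, value) tuples.

    None is passed through unchanged so that a null map stays null.
    """
    if value is None:
        return None
    return list(value.items())


# Same rows as in the example above; the third field is the intended map column.
rows = [(1, ["foo", "bar"], {"x": 1.0}), (2, [], {}), (3, None, None)]
map_column = [dict_to_map_entries(row[2]) for row in rows]
print(map_column)  # [[('x', 1.0)], [], None]

try:
    import pyarrow as pa
except ImportError:
    pa = None  # the conversion above does not depend on pyarrow

if pa is not None:
    # Without an explicit type, pyarrow infers a struct from a dict.
    inferred = pa.array([{"x": 1.0}])
    print(inferred.type)  # struct<x: double>

    # With an explicit map type, the list-of-tuples form is accepted.
    as_map = pa.array(map_column, type=pa.map_(pa.string(), pa.float64()))
    print(as_map.type)
```

      The key and value types passed to pa.map_ (string and float64) are assumptions chosen to match the sample data.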

      Because the DataFrame's schema is incorrect, subsequent operations on it are affected as well.
      For example:

          from pyspark.sql.functions import posexplode_outer
          df.select("id", "a_map", posexplode_outer("an_array")).show()
      

      Expected:

          +---+----------+----+----+
          | id|     a_map| pos| col|
          +---+----------+----+----+
          |  1|{x -> 1.0}|   0| foo|
          |  1|{x -> 1.0}|   1| bar|
          |  2|        {}|null|null|
          |  3|      null|null|null|
          +---+----------+----+----+
      

      Got:

          +---+------+----+----+
          | id| a_map| pos| col|
          +---+------+----+----+
          |  1| {1.0}|   0| foo|
          |  1| {1.0}|   1| bar|
          |  2|{null}|null|null|
          |  3|  null|null|null|
          +---+------+----+----+
      

          People

            Unassigned Unassigned
            beliefer Jiaan Geng