Spark / SPARK-38819

Run Pandas on Spark with Pandas 1.4.x


Details

    • Type: Umbrella
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • 3.4.0
    • None
    • None

    Description

      This is an umbrella issue to track problems that appear when pandas is upgraded to 1.4.x.


      With fail-fast disabled in the test run, 19 tests failed:

      https://github.com/Yikun/spark/pull/88/checks?check_run_id=5873627048


      Sub-Tasks

        1. GroupByTest failed due to axis length mismatch (Sub-task, Resolved, Apache Spark)
        2. Raise IndexError when insert loc is out of bounds (Sub-task, Resolved, Yikun Jiang)
        3. Series name should be preserved in series.mode() (Sub-task, Resolved, Yikun Jiang)
        4. iloc setitem failed due to "Cannot convert * into bool" (Sub-task, Resolved, Yikun Jiang)
        5. test_nsmallest failed due to pandas 1.4.0-1.4.2 bug (Sub-task, Resolved, Yikun Jiang)
        6. Refresh dtype when astype("category") (Sub-task, Resolved, Yikun Jiang)
        7. Support groupby positional indexing (Sub-task, Resolved, Yikun Jiang)
        8. test_multi_index_dtypes failed due to index mismatch (Sub-task, Resolved, Yikun Jiang)
        9. test_categories_setter failed due to pandas bug (Sub-task, Resolved, Yikun Jiang)
        10. groupby.apply doc test failed when SPARK_CONF_ARROW_ENABLED is disabled (Sub-task, Resolved, Yikun Jiang)
        11. Respect ps.concat sort parameter to follow pandas behavior (Sub-task, Resolved, Yikun Jiang)
        12. Replace "NaN" with real "None" value in indexes in doctest (Sub-task, Resolved, Yikun Jiang)
        13. Add migration guide for PS behavior changes (Sub-task, Resolved, Yikun Jiang)
        14. Generate a new DataFrame instead of operating in place in setitem (Sub-task, Resolved, Yikun Jiang)
        15. '_SubTest' object has no attribute 'elapsed_time' (Sub-task, Resolved, Yikun Jiang)
        16. Respect `Series.concat` sort parameter to follow 1.4.3 behavior (Sub-task, Resolved, Yikun Jiang)
        17. Upgrade pandas to 1.4.3 (Sub-task, Resolved, Yikun Jiang)
        18. Rename `required_same_anchor` (Sub-task, Resolved, Yikun Jiang)
        19. Remove `# doctest: +SKIP` of SPARK-38947/SPARK-39326 when infra bumps pandas to 1.4+ (Sub-task, Closed, Unassigned)
        20. ImportError when building the pyspark.pandas "Supported APIs" document if the pandas version is low (Sub-task, Resolved, Yikun Jiang)
        21. Upgrade pandas to 1.4.4 (Sub-task, Resolved, Yikun Jiang)
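
        Sub-task 5 ties a test failure to a pandas bug present in 1.4.0 through 1.4.2. The Python sketch below shows one possible way to skip a test only on that version range. The helper names (parse_version, hits_nsmallest_bug, skip_if_nsmallest_bug) are hypothetical; this is not the code merged for the sub-task.

```python
import re
import unittest


def parse_version(version: str) -> tuple:
    """Turn '1.4.2' or '1.4.0rc0' into a comparable tuple such as (1, 4, 2).

    Simplification (assumption): only the leading major.minor[.micro]
    numbers are kept; suffixes like 'rc0' or '+123.gabc' are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    # A missing micro component (e.g. "1.4") is treated as 0.
    major, minor, micro = match.groups(default="0")
    return (int(major), int(minor), int(micro))


def hits_nsmallest_bug(pandas_version: str) -> bool:
    """True for pandas 1.4.0 through 1.4.2 inclusive, the range named in sub-task 5."""
    return (1, 4, 0) <= parse_version(pandas_version) <= (1, 4, 2)


def skip_if_nsmallest_bug(pandas_version: str):
    """Hypothetical decorator: skip the test only on the affected pandas range."""
    return unittest.skipIf(
        hits_nsmallest_bug(pandas_version),
        "known pandas 1.4.0-1.4.2 bug",
    )
```

        In a test module the argument would be the installed version, e.g. `@skip_if_nsmallest_bug(pd.__version__)` on the test method, so the test still runs on 1.4.3 and later.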


          People

            Assignee: Yikun Jiang (yikunkero)
            Reporter: Yikun Jiang (yikunkero)
            Votes: 0
            Watchers: 4
