Spark / SPARK-35337

pandas API on Spark: Separate basic operations into data type based structures

Details

    • Type: Umbrella
    • Status: Resolved
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: 3.2.0
    • Fix Version/s: None
    • Component/s: PySpark
    • Labels: None

    Description

      Currently, each basic operation is defined in a single function shared by all data types, which makes it difficult to extend or change the behavior for a particular data type. For example, the binary operation Series + Series behaves differently depending on the data type: it adds numerical operands, concatenates string operands, and so on. These differences are implemented with if-else branches inside the function, which is messy and makes the logic difficult to maintain or reuse.

      We should provide an infrastructure to manage the differences in these operations.

      Please refer to pandas APIs on Spark: Separate basic operations into data type based structures for details.
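
      As a rough illustration of the problem and the proposed direction, the sketch below contrasts the two styles in plain Python, using lists as stand-ins for Series. The class names echo the sub-task titles (DataTypeOps and so on), but the method bodies, the helper names (add_with_branches, ops_for, series_add), and the error messages are assumptions made for this example and are not the actual pandas-on-Spark implementation.

```python
from typing import Any, List


def add_with_branches(left: List[Any], right: List[Any]) -> List[Any]:
    """Single-function style: each data type is one more if-else branch."""
    values = left + right
    if all(isinstance(v, str) for v in values):
        # String operands: concatenate element-wise.
        return ["".join((l, r)) for l, r in zip(left, right)]
    elif all(isinstance(v, (int, float)) for v in values):
        # Numerical operands: add element-wise.
        return [l + r for l, r in zip(left, right)]
    else:
        raise TypeError("Addition can not be applied to given types.")


class DataTypeOps:
    """Hypothetical base class: an operation is unsupported unless overridden."""

    pretty_name = "given types"

    def add(self, left: List[Any], right: List[Any]) -> List[Any]:
        raise TypeError(f"Addition can not be applied to {self.pretty_name}.")


class NumericOps(DataTypeOps):
    pretty_name = "numerics"

    def add(self, left: List[Any], right: List[Any]) -> List[Any]:
        return [l + r for l, r in zip(left, right)]


class StringOps(DataTypeOps):
    pretty_name = "strings"

    def add(self, left: List[Any], right: List[Any]) -> List[Any]:
        return ["".join((l, r)) for l, r in zip(left, right)]


def ops_for(values: List[Any]) -> DataTypeOps:
    """Pick the ops object from the element type (illustrative rule only)."""
    first = values[0] if values else None
    if isinstance(first, str):
        return StringOps()
    if isinstance(first, (int, float)):
        return NumericOps()
    return DataTypeOps()


def series_add(left: List[Any], right: List[Any]) -> List[Any]:
    """Data-type-based style: the operator only delegates to the ops object."""
    return ops_for(left).add(left, right)


numeric_sum = series_add([1, 2], [10, 20])
string_sum = series_add(["a", "b"], ["x", "y"])
```

      In this sketch, supporting a new data type means adding one subclass and one line in ops_for, and the branching function no longer has to change.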

        Issue Links

          1. Separate arithmetic operations into data type based structures (Sub-task, Resolved, Xinrong Meng)
          2. Support arithmetic operations against bool IndexOpsMixin (Sub-task, Resolved, Xinrong Meng)
          3. Introduce BinaryOps for BinaryType (Sub-task, Resolved, Xinrong Meng)
          4. Introduce ArrayOps, MapOps and StructOps (Sub-task, Resolved, Xinrong Meng)
          5. Introduce BooleanExtensionOps (Sub-task, Resolved, Xinrong Meng)
          6. Make the conversion from/to pandas data-type-based for non-ExtensionDtypes (Sub-task, Resolved, Xinrong Meng)
          7. Introduce a way to compare series of array for older pandas (Sub-task, Resolved, Xinrong Meng)
          8. Complete arithmetic operators involving bool literals, Series, and Index (Sub-task, Resolved, Xinrong Meng)
          9. Make astype data-type-based (Sub-task, Resolved, Xinrong Meng)
          10. Make the conversion to pandas data-type-based for ExtensionDtypes (Sub-task, Resolved, Xinrong Meng)
          11. Support arithmetic operators (+, *) among bool Series/Index (Sub-task, Resolved, Unassigned)
          12. Introduce DecimalOps (Sub-task, Resolved, Yikun Jiang)
          13. Support creating a Column of numpy literal value in pandas-on-Spark (Sub-task, Resolved, Xinrong Meng)
          14. Make unary and comparison operators data-type-based (Sub-task, Resolved, Xinrong Meng)
          15. Improve unit tests for data-type-based basic operations (Sub-task, Resolved, Xinrong Meng)
          16. Standardize TypeError messages for unsupported basic operations (Sub-task, Resolved, Xinrong Meng)
          17. Add BaseTest for DataTypeOps (Sub-task, Resolved, Yikun Jiang)
          18. Manage InternalField in DataTypeOps.isnull (Sub-task, Resolved, Takuya Ueshin)
          19. Make astype data-type-based for DecimalOps (Sub-task, Resolved, Yikun Jiang)
          20. Assume result's index to be disordered in tests with operations on different Series (Sub-task, Resolved, Xinrong Meng)
          21. Consolidate tests for data-type-based operations of decimal Series (Sub-task, Resolved, Yikun Jiang)
          22. Manage InternalField more in DataTypeOps. (Sub-task, Resolved, Takuya Ueshin)

            People

              Assignee: Xinrong Meng
              Reporter: Xinrong Meng
              Votes: 0
              Watchers: 3

              Dates

                Created:
                Updated:
                Resolved: