Spark / SPARK-29317

Avoid inheritance hierarchy in pandas CoGroup arrow runner and its plan


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: PySpark, SQL
    • Labels: None

    Description

      SPARK-27463 included some refactoring that introduced two common abstract base classes:

      1. BaseArrowPythonRunner

      Before:

      └── BasePythonRunner
          ├── ArrowPythonRunner
          ├── CoGroupedArrowPythonRunner
          ├── PythonRunner
          └── PythonUDFRunner
      

      After:

      BasePythonRunner
      ├── BaseArrowPythonRunner
      │   ├── ArrowPythonRunner
      │   └── CoGroupedArrowPythonRunner
      ├── PythonRunner
      └── PythonUDFRunner
      

      The problem is that the R code path was written to mirror the Python side, and its hierarchy still has the flat shape that the Python runners had before:

      └── BaseRRunner
          ├── ArrowRRunner
          └── RRunner
      

      For now, I would like to keep the Python hierarchy matched with the R one and decouple the remaining shared code without an intermediate base class. Ideally, we should deduplicate both code paths later; the internal implementations are intentionally similar.
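      One way to picture the decoupling is sketched below. This is a hypothetical, language-neutral illustration written in Python rather than Spark's Scala: the class names are borrowed from the trees above, while `ArrowOutputMixin`, `read_output`, and `run` are assumed names invented for the example and are not Spark APIs. The point is only that the Arrow runners can stay direct children of the base runner while still sharing their output-decoding logic.

```python
# Hypothetical sketch: class names echo the issue, but every body here is
# illustrative and is NOT Spark's actual implementation.


class BasePythonRunner:
    """Common base; each runner says how worker output is decoded."""

    def read_output(self, payload):
        raise NotImplementedError

    def run(self, payload):
        return self.read_output(payload)


class ArrowOutputMixin:
    """Shared Arrow-style decoding, offered as a mixin, not a base runner."""

    def read_output(self, payload):
        # Stand-in for decoding Arrow record batches.
        return [("arrow", item) for item in payload]


class ArrowPythonRunner(ArrowOutputMixin, BasePythonRunner):
    pass


class CoGroupedArrowPythonRunner(ArrowOutputMixin, BasePythonRunner):
    def run(self, payload):
        # Co-grouped input: two sides go in, one decoded stream comes out.
        left, right = payload
        return self.read_output(left + right)


class PythonUDFRunner(BasePythonRunner):
    def read_output(self, payload):
        # Stand-in for a non-Arrow (pickled) output format.
        return [("pickle", item) for item in payload]


if __name__ == "__main__":
    print(ArrowPythonRunner().run([1, 2]))
    print(CoGroupedArrowPythonRunner().run(([1], [2])))
```

      In this sketch every runner lists `BasePythonRunner` as a direct base, so the shape stays comparable to the `BaseRRunner` tree, and the shared decoding could later be pointed at a common R/Python helper without reshaping the hierarchy.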

      2. BasePandasGroupExec

      Before:

      ├── FlatMapGroupsInPandasExec
      └── FlatMapCoGroupsInPandasExec
      

      After:

      └── BasePandasGroupExec
          ├── FlatMapGroupsInPandasExec
          └── FlatMapCoGroupsInPandasExec
      

      The problem is that the R code path (with Arrow optimization, in particular) duplicates some code from Pandas UDFs:

      FlatMapGroupsInRWithArrowExec <> FlatMapGroupsInPandasExec
      MapPartitionsInRWithArrowExec <> ArrowEvalPythonExec

      To prepare for deduplication here as well, it would be better not to change the hierarchy on the Python side alone, and to decouple the shared code instead.
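      As a rough illustration of decoupling without a shared parent, the sketch below moves the common grouping step into a standalone helper that both plan nodes call. It is a hypothetical Python model, not Spark code: `group_rows`, `execute`, `func`, and `key_index` are assumed names, and rows are modeled as plain tuples.

```python
# Hypothetical sketch: the two exec classes share a helper function instead
# of a common BasePandasGroupExec parent. Not Spark's actual implementation.


def group_rows(rows, key_index):
    """Shared helper: bucket rows by key, keeping first-seen key order."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key_index], []).append(row)
    return groups


class FlatMapGroupsInPandasExec:
    def __init__(self, func, key_index=0):
        self.func = func
        self.key_index = key_index

    def execute(self, rows):
        out = []
        for key, group in group_rows(rows, self.key_index).items():
            out.extend(self.func(key, group))
        return out


class FlatMapCoGroupsInPandasExec:
    def __init__(self, func, key_index=0):
        self.func = func
        self.key_index = key_index

    def execute(self, left_rows, right_rows):
        left = group_rows(left_rows, self.key_index)
        right = group_rows(right_rows, self.key_index)
        # Visit every key seen on either side, left-side keys first.
        keys = list(left) + [k for k in right if k not in left]
        out = []
        for key in keys:
            out.extend(self.func(key, left.get(key, []), right.get(key, [])))
        return out


if __name__ == "__main__":
    grouped = FlatMapGroupsInPandasExec(lambda k, g: [(k, len(g))])
    print(grouped.execute([("a", 1), ("b", 2), ("a", 3)]))
```

      Because the helper is not tied to either class, an R-side counterpart such as FlatMapGroupsInRWithArrowExec could in principle call the same function, which is the kind of later deduplication the description has in mind.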

            People

              Assignee: Unassigned
              Reporter: Hyukjin Kwon (gurwls223)