Project: Spark
Issue: SPARK-36707

Support to specify index type and name in pandas API on Spark

Details

    • Type: Umbrella
    • Status: Resolved
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: 3.3.0
    • Fix Version/s: 3.3.0
    • Component/s: PySpark

    Description

      See https://koalas.readthedocs.io/en/latest/user_guide/typehints.html.

      In pandas API on Spark, there is currently no way to specify the index type and name of the output when you apply an arbitrary function with a return type hint. This forces pandas API on Spark to attach the default index:

      >>> import pandas as pd
      >>> import pyspark.pandas as ps
      >>> def transform(pdf) -> pd.DataFrame["id": int, "A": int]:
      ...     pdf['A'] = pdf.id + 1
      ...     return pdf
      ...
      >>> ps.range(5).pandas_on_spark.apply_batch(transform)
      
         id   A
      0   0   1
      1   1   2
      2   2   3
      3   3   4
      4   4   5
      

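      We should provide a way to specify the index type and name in the return type hint, so that the index of the function's output can be kept instead of being replaced by the default index.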
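      The request can be sketched in plain Python. This is a hypothetical illustration, not pandas or PySpark code: `Frame`, `FrameHint`, `FieldSpec` and `_field` are invented names. The accepted shapes are assumptions loosely modeled on the tuple/list form described in the pandas API on Spark type-hints guide: an index spec followed by a list of column specs, where a list in the index position means a multi-index. The sketch only shows how `__class_getitem__` can receive enough information to tell an explicit index apart from the column-only form used in the example above.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple

# Hypothetical sketch: none of these names are pandas or PySpark API.


@dataclass(frozen=True)
class FieldSpec:
    name: Optional[str]  # None when only a type is given, e.g. `int`
    dtype: type


@dataclass(frozen=True)
class FrameHint:
    index: Tuple[FieldSpec, ...]    # empty -> no index spec -> default index
    columns: Tuple[FieldSpec, ...]


def _field(spec: Any) -> FieldSpec:
    """Normalize `int`, `("name", int)` or `"name": int` into a FieldSpec."""
    if isinstance(spec, slice):          # "name": int arrives as slice("name", int, None)
        return FieldSpec(spec.start, spec.stop)
    if isinstance(spec, tuple) and len(spec) == 2:
        return FieldSpec(spec[0], spec[1])
    if isinstance(spec, type):
        return FieldSpec(None, spec)
    raise TypeError(f"unsupported field spec: {spec!r}")


class Frame:
    """Hypothetical stand-in for a subscriptable return type."""

    def __class_getitem__(cls, params: Any) -> FrameHint:
        # Frame[index_spec, [column_spec, ...]]: a list in second position
        # marks the first element as the index (a list there = multi-index).
        if isinstance(params, tuple) and len(params) == 2 and isinstance(params[1], list):
            idx, cols = params
            idx_specs = idx if isinstance(idx, list) else [idx]
            return FrameHint(tuple(map(_field, idx_specs)), tuple(map(_field, cols)))
        # Otherwise only columns are described, as in the issue's example.
        if not isinstance(params, tuple):
            params = (params,)
        elif len(params) == 2 and isinstance(params[0], str):
            params = (params,)           # Frame[("id", int)] is one named column
        return FrameHint((), tuple(map(_field, params)))


def transform(pdf) -> Frame[("idx", int), [("id", int), ("A", int)]]:
    return pdf


hint = transform.__annotations__["return"]
print([(f.name, f.dtype.__name__) for f in hint.index])    # [('idx', 'int')]
print([(f.name, f.dtype.__name__) for f in hint.columns])  # [('id', 'int'), ('A', 'int')]
print(Frame["id": int, "A": int].index)                    # ()
```

      Using a list for the column part keeps `Frame[int, int]` (two unnamed columns) distinct from `Frame[int, [int, int]]` (an `int` index plus two columns). An empty `index` is where a real implementation would fall back to the default index.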

          People

            Assignee: Hyukjin Kwon (gurwls223)
            Reporter: Hyukjin Kwon (gurwls223)
            Votes: 0
            Watchers: 1