  1. Spark
  2. SPARK-32082

Project Zen: Improving Python usability

    Details

    • Type: Epic
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 3.1.0
    • Fix Version/s: None
    • Component/s: PySpark
    • Labels:
      None
    • Epic Name:
      Project Zen

      Description

      The importance of Python and PySpark has grown radically in the last few years. PySpark now sees more than 1.3 million downloads every week, counting PyPI alone. Nevertheless, PySpark is still not very Pythonic. For example, it exposes many JVM error messages to users, and the API documentation is poorly written.

      This epic ticket aims to improve the usability of PySpark and make it more Pythonic. More explicitly, this JIRA targets the four areas below, each listed with examples:

      • Being Pythonic
        • Pandas UDF enhancements and type hints
        • Avoid dynamic function definitions, for example in functions.py, which prevent IDEs from detecting the functions.
      • Better and easier usability in PySpark
        • User-facing error messages and warnings
        • Documentation
        • User guide
        • Better examples and API documentation, e.g. in the style of Koalas and pandas
      • Better interoperability with other Python libraries
        • Visualization and plotting
        • Potentially better interface by leveraging Arrow
        • Compatibility with other libraries such as NumPy universal functions or pandas possibly by leveraging Koalas
      • PyPI Installation
        • PySpark with Hadoop 3 support on PyPI
        • Better error handling
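
The "avoid dynamic function definitions" item above can be illustrated with a minimal sketch in plain Python. This is not the actual contents of functions.py; the module name `functions_sketch`, the helper `_create_function`, and the table `_DYNAMIC_FUNCTIONS` are hypothetical names chosen for illustration. It contrasts functions attached to a module at runtime, which static tools generally cannot see, with an explicit `def` that carries a signature, type hints, and a docstring.

```python
import types

# Hypothetical stand-in for a module such as functions.py.
dynamic = types.ModuleType("functions_sketch")

# Dynamic style: a table of names drives function creation at import time.
# No `def upper` or `def lower` appears in the source, so an IDE or type
# checker reading the file has nothing to index or autocomplete.
_DYNAMIC_FUNCTIONS = {
    "upper": "Converts a string to upper case.",
    "lower": "Converts a string to lower case.",
}


def _create_function(name, doc):
    """Build a wrapper that calls the str method called `name`."""
    def _(value):
        return getattr(value, name)()
    _.__name__ = name
    _.__doc__ = doc
    return _


for _name, _doc in _DYNAMIC_FUNCTIONS.items():
    setattr(dynamic, _name, _create_function(_name, _doc))


# Static style: an ordinary definition that tools can discover, with
# type hints and a docstring attached directly to the function.
def title(value: str) -> str:
    """Converts a string to title case."""
    return value.title()


print(dynamic.upper("spark"))   # works at runtime, invisible to static tools
print(title("project zen"))     # visible to IDEs, linters, and type checkers
```

Both styles behave the same at runtime; the difference is only in what editors and type checkers can discover without executing the code, which is the usability concern this item raises.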

            People

            • Assignee:
              hyukjin.kwon Hyukjin Kwon
              Reporter:
              hyukjin.kwon Hyukjin Kwon
            • Votes:
              0
              Watchers:
              20
