  1. Spark
  2. SPARK-47854

[PYTHON] Avoid shadowing Python built-ins in Python function variable naming


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.3.4, 3.4.1, 3.5.0, 3.5.1
    • Fix Version/s: None
    • Component/s: PySpark
    • Labels: None

    Description

      Given that Spark 4.0.0 is upcoming, I wonder if we should at least consider renaming certain function parameters in Python. Otherwise, we may need to wait another 4 years to do so.

      Example

      https://github.com/apache/spark/blob/e6b7950f553cff5adc02b8b5195e79cffff3c97c/python/pyspark/sql/functions/builtin.py#L12768

      There are 8 uses of `len` and 35 uses of `str` as variable names; both are Python built-ins. Shadowing `str` is somewhat dangerous because, inside such a function, `str` no longer refers to the built-in type, so the following `isinstance` check would be nonsensical:

      def foo(str: "ColumnOrName", bar: "ColumnOrName"):
          # str is now a parameter here, so it cannot be used as the built-in type
          bar = lit(bar) if isinstance(bar, str) else bar
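To make the failure concrete, here is a minimal sketch in plain Python, with no PySpark required. The function names `shadowed` and `renamed` are hypothetical and used only for illustration; they are not part of `pyspark.sql.functions`.

```python
def shadowed(str, bar):
    # Inside this function, `str` is the first argument, not the built-in type,
    # so isinstance() receives a plain value as its second argument.
    return isinstance(bar, str)


def renamed(src, bar):
    # With a non-shadowing parameter name, `str` still means the built-in type.
    return isinstance(bar, str)


try:
    shadowed("x", "y")
    error = None
except TypeError as exc:
    # isinstance() rejects a second argument that is not a type.
    error = exc

print(type(error).__name__)
print(renamed("x", "y"))
```

The shadowed version fails only at call time, which is part of what makes this kind of naming easy to overlook.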
      

       

      Obviously, this would be a breaking change for user code that calls the function with keyword arguments. If we rename `str` to `src` or `col`, existing code that passes the argument by keyword would break:

      # breaks:
      foo(str="x", bar="y")
      
      # okay:
      foo("x", bar="y")
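The breakage can be reproduced with a small stand-alone sketch. `foo_old` and `foo_new` are hypothetical stand-ins for a function before and after the rename; `src` is just one of the candidate names mentioned above.

```python
def foo_old(str, bar):
    # Signature before the rename: the first parameter shadows the built-in.
    return (str, bar)


def foo_new(src, bar):
    # Signature after the rename: same behavior, different parameter name.
    return (src, bar)


# Positional callers are unaffected by the rename.
positional = foo_new("x", bar="y")

# Keyword callers that still use the old name fail with a TypeError.
try:
    foo_new(str="x", bar="y")
    keyword_error = None
except TypeError as exc:
    keyword_error = exc

print(positional)
print(type(keyword_error).__name__)
```

Only call sites that pass the renamed parameter by keyword are affected; positional calls and keyword use of unrenamed parameters keep working.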

      Is this change a possibility for 4.0? Or is the break for keyword-argument callers considered too costly compared to the benefit?
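To gauge the scope of such a rename, one rough way to find affected functions is to scan a module's source with the standard `ast` module. The sketch below is an assumption about how this could be done, not how the counts quoted above were obtained. It only counts function parameters (not local variables), and the helper name `count_shadowing_params` and the sample source are hypothetical.

```python
import ast
import builtins
from collections import Counter

# Every name exposed by the builtins module (len, str, type, ...).
BUILTIN_NAMES = set(dir(builtins))


def count_shadowing_params(source):
    """Count function parameters whose names shadow a Python built-in."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = node.args
            params = args.posonlyargs + args.args + args.kwonlyargs
            if args.vararg is not None:
                params.append(args.vararg)
            if args.kwarg is not None:
                params.append(args.kwarg)
            for param in params:
                if param.arg in BUILTIN_NAMES:
                    counts[param.arg] += 1
    return counts


# Hypothetical sample input; in practice the source of builtin.py would be read from disk.
sample = '''
def length(str, len=None):
    pass

def substring(str, pos, len):
    pass

def ok(col):
    pass
'''

result = count_shadowing_params(sample)
print(result)
```

Running this over the real module would give a list of candidate parameters, which could then be reviewed one by one for keyword-argument compatibility.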

       

       

       


          People

            Assignee: Unassigned
            Reporter: Liu Cao (liucao)
