Spark / SPARK-21327

ArrayConstructor should handle an array of typecode 'l' as long rather than int in Python 2.


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.3.0
    • Component/s: PySpark, SQL
    • Labels:
      None

      Description

      Currently, when a Python object is converted into a Java object in Python 2, ArrayConstructor handles an array of typecode 'l' as int. Any value larger than Integer.MAX_VALUE or smaller than Integer.MIN_VALUE therefore overflows.

      import array
      from pyspark.sql import Row

      # `spark` is the SparkSession, as provided by the pyspark shell.
      data = [Row(l=array.array('l', [-9223372036854775808, 0, 9223372036854775807]))]
      df = spark.createDataFrame(data)
      df.show(truncate=False)
      
      +----------+
      |l         |
      +----------+
      |[0, 0, -1]|
      +----------+
      

      This should be:

      +----------------------------------------------+
      |l                                             |
      +----------------------------------------------+
      |[-9223372036854775808, 0, 9223372036854775807]|
      +----------------------------------------------+
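
      The observed [0, 0, -1] is consistent with each 64-bit value being narrowed to its low 32 bits, as happens when a Java long is cast to int. The following is a minimal sketch in plain Python (no Spark required) that reproduces the same numbers. The helper name narrow_to_int32 is hypothetical and only imitates that narrowing; it is not the code inside ArrayConstructor.

```python
import struct


def narrow_to_int32(value):
    """Keep only the low 32 bits of an integer, interpreted as a signed 32-bit int.

    Hypothetical helper that imitates narrowing a Java long to an int.
    """
    low_bits = value & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit pattern as a signed 32-bit integer.
    return struct.unpack("<i", struct.pack("<I", low_bits))[0]


values = [-9223372036854775808, 0, 9223372036854775807]
narrowed = [narrow_to_int32(v) for v in values]
print(narrowed)  # [0, 0, -1]
```

      Values that already fit in 32 bits pass through unchanged, which may be why the problem only shows up beyond the Integer.MIN_VALUE and Integer.MAX_VALUE bounds.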
      

            People

            • Assignee: Takuya Ueshin (ueshin)
            • Reporter: Takuya Ueshin (ueshin)
