SPARK-38286: Union's maxRows and maxRowsPerPartition may overflow


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.3, 3.1.2, 3.2.1, 3.3.0
    • Fix Version/s: 3.0.4, 3.1.3, 3.2.2, 3.3.0
    • Component/s: SQL
    • Labels: None

    Description

      scala> val df1 = spark.range(0, Long.MaxValue, 1, 1)
      df1: org.apache.spark.sql.Dataset[Long] = [id: bigint]
      
      scala> val df2 = spark.range(0, 100, 1, 10)
      df2: org.apache.spark.sql.Dataset[Long] = [id: bigint]
      
      scala> val union = df1.union(df2)
      union: org.apache.spark.sql.Dataset[Long] = [id: bigint]
      
      scala> union.queryExecution.logical.maxRowsPerPartition
      res19: Option[Long] = Some(-9223372036854775799)
      
      scala> union.queryExecution.logical.maxRows
      res20: Option[Long] = Some(-9223372036854775709)
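
The negative values above are consistent with plain `Long` addition wrapping around: `Long.MaxValue + 100` gives -9223372036854775709, and `Long.MaxValue + 10` (df1's single partition plus one of df2's ten 10-row partitions) gives -9223372036854775799. The following is a minimal standalone sketch of that arithmetic and of one possible overflow-aware alternative. It does not depend on Spark, the names `MaxRowsSketch`, `naiveSum` and `safeSum` are hypothetical, and it is not necessarily how the fix was implemented.

```scala
// Standalone sketch; no Spark dependency. All names are hypothetical.
object MaxRowsSketch {
  // Naive aggregation: plain Long addition silently wraps on overflow.
  // Returns None if any child has no known bound.
  def naiveSum(childBounds: Seq[Option[Long]]): Option[Long] =
    if (childBounds.forall(_.isDefined)) Some(childBounds.flatten.sum) else None

  // Overflow-aware aggregation: accumulate in BigInt, then report
  // "unknown" (None) when the exact total does not fit in a Long.
  def safeSum(childBounds: Seq[Option[Long]]): Option[Long] =
    childBounds
      .foldLeft(Option(BigInt(0))) {
        case (Some(acc), Some(n)) => Some(acc + n)
        case _                    => None // a child without a bound
      }
      .filter(_.isValidLong)
      .map(_.toLong)
}
```

With the bounds from the session, `MaxRowsSketch.naiveSum(Seq(Some(Long.MaxValue), Some(100L)))` reproduces `Some(-9223372036854775709)`, while `safeSum` on the same input yields `None`. Returning `None` treats an overflowing bound as unknown rather than reporting a misleading negative row count.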
       

          People

            Assignee: Ruifeng Zheng (podongfeng)
            Reporter: Ruifeng Zheng (podongfeng)
