SPARK-18003

RDD zipWithIndex generates wrong results when one partition contains more than 2147483647 records.

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.2, 2.1.0
    • Component/s: Spark Core

    Description

      RDD zipWithIndex generates wrong results when one partition contains more than Int.MaxValue records.

      When an RDD contains a partition with more than 2147483647 records, the indices assigned within that partition overflow.
      For example, if partition 0 has more than 2147483647 records, the indices become:
      0, 1, ..., 2147483647, -2147483648, -2147483647, -2147483646, ...

      Operations such as repartition or coalesce can produce partitions this large, so this bug should be fixed.
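
      The wrap-around in the index sequence above is what a 32-bit counter does when it is incremented past 2147483647. The following is a minimal sketch of that behaviour only, not Spark's code: the class and method names are hypothetical, and it assumes the per-partition position was tracked in a 32-bit integer, which is what the reported sequence suggests. A 64-bit counter is shown alongside for comparison.

```java
public class ZipWithIndexOverflow {

    // Hypothetical helper: increments a 32-bit counter `steps` times from `start`
    // and widens the result to long, as an Int-based index would be widened.
    static long advanceIntCounter(int start, int steps) {
        int i = start;
        for (int k = 0; k < steps; k++) {
            i++; // silently wraps to Integer.MIN_VALUE after Integer.MAX_VALUE
        }
        return i;
    }

    // Hypothetical helper: the same loop with a 64-bit counter, which does not wrap here.
    static long advanceLongCounter(long start, int steps) {
        long i = start;
        for (int k = 0; k < steps; k++) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        // Start just below the limit instead of iterating over two billion records.
        int start = Integer.MAX_VALUE - 1; // 2147483646
        for (int steps = 0; steps <= 3; steps++) {
            System.out.println(
                advanceIntCounter(start, steps) + " vs " + advanceLongCounter(start, steps));
        }
        // Prints:
        // 2147483646 vs 2147483646
        // 2147483647 vs 2147483647
        // -2147483648 vs 2147483648
        // -2147483647 vs 2147483649
    }
}
```

      The left column reproduces the tail of the sequence in the report; the right column is the index a caller of zipWithIndex would expect, since the method returns Long indices.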

          People

            Assignee: Weichen Xu (weichenxu123)
            Reporter: Weichen Xu (weichenxu123)
            Votes: 0
            Watchers: 3

              Time Tracking

                Original Estimate: 24h
                Remaining Estimate: 24h
                Time Spent: Not Specified