  1. Spark
  2. SPARK-3356

Document when RDD elements' ordering within partitions is nondeterministic

Details

    • Type: Documentation
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2.0
    • Component/s: Documentation
    • Labels: None

    Description

      As reported in SPARK-3098, for example, users of zipWithIndex, zipWithUniqueId, etc. (and maybe even things like mapPartitions) find it confusing that the order of elements in each partition after a shuffle operation is nondeterministic (unless the operation was sortByKey). We should explain this in the docs for the zip and partition-wise operations.
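
The kind of example the docs could carry is sketched below. It is only an illustration, assuming a local SparkContext; the object name, helper name, app name and sample data are made up here and do not come from the issue. The sketch sorts after the shuffle so that the indices assigned by zipWithIndex no longer depend on the order in which shuffled records happen to arrive.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object StableZipWithIndex {
  // Hypothetical helper: sort first, then index. After sortBy, both the
  // contents of each partition and the order within it are fixed by the data.
  def stableIndex(rdd: RDD[String]): RDD[(String, Long)] =
    rdd.sortBy(w => w).zipWithIndex()

  def main(args: Array[String]): Unit = {
    // Local context for illustration only.
    val conf = new SparkConf().setMaster("local[2]").setAppName("stable-zip")
    val sc = new SparkContext(conf)

    val words = sc.parallelize(Seq("d", "a", "c", "b"), 2)

    // repartition() shuffles. The order of elements inside each resulting
    // partition is not guaranteed, so these indices may differ between runs
    // or when a lost partition is recomputed.
    val unstable = words.repartition(2).zipWithIndex()

    // Same shuffle, but sorted before indexing: the indices are reproducible.
    val stable = stableIndex(words.repartition(2))
    stable.collect().foreach(println) // prints (a,0) (b,1) (c,2) (d,3), one per line

    sc.stop()
  }
}
```

The sort costs an extra shuffle, so it is worth paying only when downstream code actually relies on the indices being reproducible.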

      Another subtle issue is that the order of values for each key in groupBy / join / etc. can be nondeterministic; we need to explain that too.
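
For the per-key case, a similar sketch may help. Again this is an assumption-laden illustration rather than text from the issue: the object name, the `groupSorted` helper and the sample pairs are hypothetical, and it assumes a Spark version where pair-RDD operations such as groupByKey are available without extra imports. It sorts each group explicitly instead of relying on the order in which groupByKey yields values.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SortedGroupValues {
  // groupByKey makes no promise about the order of values within a key,
  // so impose an order explicitly when later logic depends on it.
  def groupSorted(pairs: RDD[(String, Int)]): RDD[(String, Seq[Int])] =
    pairs.groupByKey().mapValues(_.toSeq.sorted)

  def main(args: Array[String]): Unit = {
    // Local context for illustration only.
    val conf = new SparkConf().setMaster("local[2]").setAppName("sorted-groups")
    val sc = new SparkContext(conf)

    val pairs = sc.parallelize(Seq(("k", 3), ("k", 1), ("j", 2), ("k", 2)), 2)

    // Each key's values come back sorted regardless of shuffle arrival order;
    // the order of the keys themselves in collect() is still not specified.
    groupSorted(pairs).collect().foreach(println)

    sc.stop()
  }
}
```

Sorting inside mapValues materializes each group in memory, which groupByKey already does, so this adds only the cost of the per-group sort.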

            People

              Assignee: Sean R. Owen (srowen)
              Reporter: Matei Alexandru Zaharia (matei)
              Votes: 0
              Watchers: 3
