SPARK-16207

order guarantees for DataFrames

    Details

    • Type: Documentation
    • Status: Resolved
    • Priority: Minor
    • Resolution: Incomplete
    • Affects Version/s: 1.6.1
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels:

      Description

      There's no clear explanation in the documentation about what guarantees are available for the preservation of order in DataFrames. Different blogs, SO answers, and posts on course websites suggest different things. It would be good to provide clarity on this.

      Examples of questions on which I could not find clarification:
      1) Does groupby() preserve order?
      2) Does take() preserve order?
      3) Is a DataFrame guaranteed to have its rows in the same order as the lines of the text file it was read from? (Or of the JSON file, etc.)

            People

            • Assignee: Unassigned
            • Reporter: Max Moroz (mmoroz)
            • Votes: 2
            • Watchers: 5
