Spark / SPARK-16207

order guarantees for DataFrames


Details

    • Type: Documentation
    • Status: Resolved
    • Priority: Minor
    • Resolution: Incomplete
    • Affects Version/s: 1.6.1
    • Fix Version/s: None
    • Component/s: Spark Core

    Description

      The documentation does not clearly explain what guarantees, if any, DataFrames provide about preserving row order. Blog posts, Stack Overflow answers, and course websites suggest conflicting things. It would be good to provide clarity on this.

      Examples of questions I could not find answers to:
      1) Does groupby() preserve order?
      2) Does take() preserve order?
      3) Is a DataFrame guaranteed to keep its rows in the same order as the lines of the text file it was read from (or of the JSON file, etc.)?


          People

            Assignee: Unassigned
            Reporter: Max Moroz
            Votes: 2
            Watchers: 4
