  1. Spark
  2. SPARK-17833

'monotonicallyIncreasingId()' should be deterministic

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      Right now, it's (IMHO) too easy to shoot yourself in the foot with 'monotonicallyIncreasingId()': it's natural to expect the generated numbers to act as a 'stable' primary key and then to use that key in joins and so on, but the values are not guaranteed to be the same each time the column is re-evaluated.
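
      The hazard can be sketched without a cluster. The snippet below is a simplified pure-Python model, not Spark code. It assumes the layout described in the Spark API documentation for this function (partition index in the upper 31 bits, record number within the partition in the lower 33 bits). The helper 'monotonic_ids' and the two row layouts are hypothetical, introduced only for illustration.

```python
# Simplified model of the documented ID layout (an assumption of this sketch):
# partition index in the upper 31 bits, row position in the lower 33 bits.
def monotonic_ids(partitions):
    """Map each row to (partition_index << 33) + position_in_partition."""
    ids = {}
    for partition_index, rows in enumerate(partitions):
        for position, row in enumerate(rows):
            ids[row] = (partition_index << 33) + position
    return ids


# The same three rows, laid out differently across two evaluations of the
# same logical table (hypothetical layouts, chosen only for illustration).
first_run = monotonic_ids([["a", "b"], ["c"]])
second_run = monotonic_ids([["a"], ["b", "c"]])

print(first_run)
print(second_run)
```

      In this model row "b" receives 1 in the first evaluation and 8589934592 in the second, so a join on the generated column would pair rows differently depending on which evaluation each side saw.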

      Is there any reason why this function can't be made deterministic? Or, could a deterministic analogue of this function be added (e.g. 'withPrimaryKey(columnName = ...)')?

      A workaround is to cache / persist the table immediately after calling 'monotonicallyIncreasingId()'; it may also be that the documentation should spell that out loud and clear.

            People

              Assignee: Unassigned
              Reporter: Kevin Ushey (kevinushey)
              Votes: 0
              Watchers: 6
