Spark / SPARK-42691

Implement Dataset.semanticHash

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.4.1
    • Component/s: Connect
    • Labels: None

    Description

      Implement Dataset.semanticHash:

      /**
      * Returns a `hashCode` of the logical query plan against this [[Dataset]].
      *
      * @note Unlike the standard `hashCode`, the hash is calculated against the query plan
      * simplified by tolerating the cosmetic differences such as attribute names.
      * @since 3.4.0
      */
      @DeveloperApi
      def semanticHash(): Int

      The hash has to be computed on the Spark Connect server. Please extend the
      AnalyzePlanRequest and AnalyzePlanResponse messages to support this.

      Also make sure this works in PySpark.
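To illustrate what "tolerating cosmetic differences such as attribute names" means, here is a minimal, self-contained sketch in plain Python. It is not Spark's implementation and does not use the Spark Connect protocol. The plan encoding (nested tuples with `attr:`-prefixed strings) and the names `canonicalize` and `toy_semantic_hash` are hypothetical, chosen only for this example. The idea shown is to rename attributes to positional placeholders before hashing, so that two plans differing only in attribute names hash to the same value.

```python
import hashlib


def canonicalize(plan):
    """Rename attributes to placeholders in order of first appearance.

    Assumption for this sketch: a plan is a nested tuple, and any string
    starting with "attr:" is an attribute reference.
    """
    mapping = {}

    def rename(node):
        if isinstance(node, tuple):
            return tuple(rename(child) for child in node)
        if isinstance(node, str) and node.startswith("attr:"):
            if node not in mapping:
                mapping[node] = f"attr:#{len(mapping)}"
            return mapping[node]
        return node

    return rename(plan)


def toy_semantic_hash(plan) -> int:
    """Hash the canonicalized plan and fold it into a signed 32-bit integer."""
    digest = hashlib.sha256(repr(canonicalize(plan)).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big", signed=True)


# Two plans that differ only in the attribute name ("x" vs "y").
plan_a = ("Project", ("Add", "attr:x", 1),
          ("Filter", ("GreaterThan", "attr:x", 0), ("Range", 0, 10)))
plan_b = ("Project", ("Add", "attr:y", 1),
          ("Filter", ("GreaterThan", "attr:y", 0), ("Range", 0, 10)))
# A plan that differs in a literal, which is not a cosmetic difference.
plan_c = ("Project", ("Add", "attr:x", 2),
          ("Filter", ("GreaterThan", "attr:x", 0), ("Range", 0, 10)))

print(toy_semantic_hash(plan_a) == toy_semantic_hash(plan_b))
print(canonicalize(plan_a) == canonicalize(plan_c))
```

In this sketch `plan_a` and `plan_b` canonicalize to the same structure and therefore share a hash, while `plan_c` canonicalizes differently. A real client would obtain the value from the server through the analyze-plan round trip described above, not compute it locally.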

            People

              Assignee: Unassigned
              Reporter: Herman van Hövell (hvanhovell)
              Votes: 0
              Watchers: 4
