Details
- New Feature
- Status: Resolved
- Major
- Resolution: Fixed
- 3.4.0
- None
Description
Implement Dataset.semanticHash:
/**
 * Returns a `hashCode` of the logical query plan against this [[Dataset]].
 *
 * @note Unlike the standard `hashCode`, the hash is calculated against the query plan
 * simplified by tolerating the cosmetic differences such as attribute names.
 * @since 3.4.0
 */
@DeveloperApi
def semanticHash(): Int
The hash has to be computed on the Spark Connect server, because the logical plan is
analyzed there. Please extend the AnalyzePlanRequest and AnalyzePlanResponse messages
to carry this.
Also make sure this works in PySpark.
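To illustrate what "tolerating the cosmetic differences such as attribute names" can mean, here is a minimal, standard-library-only sketch in Python. It is a toy model and not Spark's implementation: the tuple-based plan encoding and the names `canonicalize` and `semantic_hash` are assumptions made up for this example. The idea shown is that column references are replaced by ordinal positions and aliases are dropped before hashing, so two plans that differ only in alias names hash to the same value.

```python
import hashlib

# Toy plan encoding (hypothetical, for illustration only):
#   ("scan", table, [column, ...])
#   ("project", child, [(source_column, alias), ...])
#   ("filter", child, column, op, value)

def canonicalize(plan):
    """Return (canonical_form, output_column_names) for a toy plan."""
    kind = plan[0]
    if kind == "scan":
        _, table, columns = plan
        # Source column names are part of the semantics, so they are kept.
        return ("scan", table, tuple(columns)), list(columns)
    if kind == "project":
        _, child, exprs = plan
        child_form, child_out = canonicalize(child)
        # Refer to inputs by position and drop the (cosmetic) alias.
        ordinals = tuple(child_out.index(src) for src, _alias in exprs)
        return ("project", child_form, ordinals), [alias for _src, alias in exprs]
    if kind == "filter":
        _, child, column, op, value = plan
        child_form, child_out = canonicalize(child)
        return ("filter", child_form, child_out.index(column), op, value), child_out
    raise ValueError(f"unknown plan node: {kind!r}")

def semantic_hash(plan) -> int:
    """Hash the canonical form into a signed 32-bit integer (like a JVM Int)."""
    form, _ = canonicalize(plan)
    digest = hashlib.sha256(repr(form).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big", signed=True)

scan = ("scan", "t", ["id", "v"])
# Same query, different alias name ("a" vs "b").
plan_a = ("filter", ("project", scan, [("id", "a")]), "a", ">", 5)
plan_b = ("filter", ("project", scan, [("id", "b")]), "b", ">", 5)
# Different predicate value, so a genuinely different query.
plan_c = ("filter", ("project", scan, [("id", "a")]), "a", ">", 6)

print(semantic_hash(plan_a) == semantic_hash(plan_b))
```

In this sketch `plan_a` and `plan_b` share one canonical form and therefore one hash, while `plan_c` has a different canonical form. In Spark itself the normalization is done on the analyzed logical plan, which is why the request above asks for the computation to happen on the server.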
Issue Links
- fixes: SPARK-41922 Implement DataFrame `semanticHash` (Resolved)
- links to