[SPARK-24698] In Pyspark's ML, an Identifiable's UID has 20 random characters rather than the 12 mentioned in the documentation. - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Trivial
Resolution: Fixed
Affects Version/s: 2.3.1
Fix Version/s: 2.4.0
Component/s: ML
Labels:
- easyfix

Description

Hi.

In PySpark, an Identifiable object is assigned a random ID to help distinguish instances from each other. This ID is made by concatenating the name of the class with part of a UUID generated by Python's built-in uuid module.

The docstring of the method (_randomUID()) that generates this ID says that 12 random characters are used from the Python UUID, but the code actually skips the first 12 characters. The hex representation of the UUID is 32 characters, so the last 20 characters are used.

Code can be found here, and also copied here for your viewing pleasure:

@classmethod
def _randomUID(cls):
    """
    Generate a unique unicode id for the object. The default implementation
    concatenates the class name, "_", and 12 random hex chars.
    """
    return unicode(cls.__name__ + "_" + uuid.uuid4().hex[12:])
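
To make the arithmetic concrete, here is a minimal standalone sketch of the two slices. It does not use PySpark; the helper name random_uid and its n_chars parameter are hypothetical and only for illustration. Taking the last 12 characters is one way to make the code agree with the docstring, not necessarily the change that was merged.

```python
import uuid

# uuid4().hex is always 32 hexadecimal characters.
hex_id = uuid.uuid4().hex
assert len(hex_id) == 32

# The slice from the report: skips the first 12 characters, keeps 32 - 12 = 20.
suffix_reported = hex_id[12:]

# A slice that matches the docstring: keeps only the last 12 characters.
suffix_documented = hex_id[-12:]

print(len(suffix_reported))    # 20
print(len(suffix_documented))  # 12


def random_uid(class_name, n_chars=12):
    """Hypothetical helper: class name, "_", and n_chars random hex chars."""
    return class_name + "_" + uuid.uuid4().hex[-n_chars:]


print(random_uid("LogisticRegression"))
```

Either slice still gives an ID that is unique in practice, so the mismatch is between the documentation and the code rather than a functional failure.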

Issue Links

links to

[Github] Pull Request #21675 (mcteo)

People

Assignee: Thomas Dunne

Reporter: Thomas Dunne

Votes: 0

Watchers: 2

Dates

Created: 29/Jun/18 21:57

Updated: 12/Dec/22 18:10

Resolved: 05/Jul/18 02:07