Beam / BEAM-10920

Investigate Python hash libraries

Details

    • Type: Improvement
    • Status: Open
    • Priority: P3
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: dependencies, sdk-py-core
    • Labels: None

    Description

      stats.ApproximateUnique has an optional dependency on mmh3 [1], which is roughly 9x faster than md5. If that dependency is problematic for users, we should look into alternatives.
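      To check the speed difference on a given machine, a rough timing sketch like the following could be used. It is not taken from Beam; the function name time_hashes, the payload, and the iteration count are illustrative assumptions, and mmh3 is only timed when it is installed.

```python
import hashlib
import timeit


def time_hashes(payload=b'apache-beam-element', number=20000):
    """Return seconds taken by each available hash for `number` calls.

    `time_hashes` is a hypothetical helper for this comparison only.
    """
    candidates = {'md5': lambda: hashlib.md5(payload).digest()}
    try:
        import mmh3  # optional dependency; skipped when not installed
        candidates['mmh3'] = lambda: mmh3.hash(payload)
    except ImportError:
        pass
    return {
        name: timeit.timeit(fn, number=number)
        for name, fn in candidates.items()
    }


if __name__ == '__main__':
    # Print the fastest candidate first.
    for name, seconds in sorted(time_hashes().items(), key=lambda kv: kv[1]):
        print(f'{name}: {seconds:.4f}s')
```

      The ratio will vary with payload size and library version, so any figure it prints should be treated as indicative only.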

      Other options: sklearn.utils.murmurhash3_32

        [1] https://github.com/hajimes/mmh3, https://pypi.org/project/mmh3/2.0/
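      One way the options above could be combined is a fallback chain: try mmh3, then sklearn.utils.murmurhash3_32, then md5 from the standard library. The sketch below is an assumption about how that might look, not Beam's actual implementation; pick_hash_fn is a hypothetical name, and each branch normalizes its result to an unsigned 32-bit integer. Note that the md5 branch produces different values from the two MurmurHash3 branches, so results are only comparable within one backend.

```python
import hashlib


def pick_hash_fn():
    """Return (backend_name, fn), where fn maps bytes to an unsigned 32-bit int.

    Hypothetical helper: prefers mmh3, then scikit-learn, then md5.
    """
    try:
        import mmh3
        # mmh3.hash returns a signed 32-bit int by default; mask to unsigned.
        return 'mmh3', lambda data: mmh3.hash(data) & 0xFFFFFFFF
    except ImportError:
        pass
    try:
        from sklearn.utils import murmurhash3_32
        return 'sklearn', lambda data: int(
            murmurhash3_32(data, seed=0, positive=True))
    except ImportError:
        pass
    # Standard-library fallback: first 4 bytes of the md5 digest.
    return 'md5', lambda data: int.from_bytes(
        hashlib.md5(data).digest()[:4], 'little')


if __name__ == '__main__':
    backend, hash_fn = pick_hash_fn()
    print(backend, hash_fn(b'beam'))
```

      Pulling in scikit-learn only for a hash function is a heavy dependency, so this ordering is one possible trade-off rather than a recommendation.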

       

      cc: tvalentyn

          People

            Assignee: Unassigned
            Reporter: Monica Song (monicadsong)
            Votes: 0
            Watchers: 3
