Spark / SPARK-20953

Add hash map metrics to aggregate and join


    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.3.0
    • Component/s: SQL
    • Labels:
      None

      Description

      It would be useful to identify hash map collision issues early on.

      We should add an average hash map probe metric to the aggregate operator and the hash join operator, and report it. If the average probe count exceeds a specific (configurable) threshold, we should log an error at runtime.
      The primary classes to look at are UnsafeFixedWidthAggregationMap, HashAggregateExec, HashedRelation, and HashJoin.
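
      To make the proposed metric concrete, here is a minimal sketch in Java of what "average probes per lookup" could mean for an open-addressing hash map with linear probing. It is not Spark's implementation and does not use any Spark API: the class ProbeCountingMap, its methods, and the threshold parameter are all hypothetical names chosen for illustration, and the logging goes through java.util.logging rather than whatever logger Spark would use.

```java
/**
 * Hypothetical sketch (not Spark's code): an open-addressing set of long keys
 * with linear probing that records how many slots each lookup inspects.
 */
final class ProbeCountingMap {
    // Fully qualified so the sketch needs no import lines.
    private static final java.util.logging.Logger LOG =
        java.util.logging.Logger.getLogger("ProbeCountingMap");

    private final long[] keys;
    private final boolean[] used;
    private final int mask;
    private long numLookups = 0; // key lookups performed so far
    private long numProbes = 0;  // slots inspected across all lookups

    ProbeCountingMap(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a positive power of two");
        }
        keys = new long[capacity];
        used = new boolean[capacity];
        mask = capacity - 1;
    }

    /** Inserts the key if absent. Each slot inspected counts as one probe. */
    void put(long key) {
        numLookups++;
        int pos = Long.hashCode(key) & mask;
        for (int step = 0; step <= mask; step++) {
            numProbes++;
            if (!used[pos]) {
                used[pos] = true;
                keys[pos] = key;
                return;
            }
            if (keys[pos] == key) {
                return; // already present
            }
            pos = (pos + 1) & mask; // collision: try the next slot
        }
        throw new IllegalStateException("map is full");
    }

    /** Average number of slots inspected per lookup; 0.0 before any lookup. */
    double avgProbesPerLookup() {
        return numLookups == 0 ? 0.0 : (double) numProbes / numLookups;
    }

    /** Logs an error and returns true if the average exceeds the (assumed configurable) threshold. */
    boolean checkThreshold(double threshold) {
        double avg = avgProbesPerLookup();
        if (avg > threshold) {
            LOG.severe("Average hash map probes per lookup " + avg
                + " exceeds threshold " + threshold);
            return true;
        }
        return false;
    }
}
```

      With a capacity of 8, the keys 0, 8, and 16 all hash to slot 0, so the three inserts inspect 1, 2, and 3 slots respectively and the average is 2.0 probes per lookup. A value close to 1.0 would indicate few collisions; a value well above it is the kind of signal the ticket proposes to surface.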


            People

            • Assignee:
              Unassigned
            • Reporter:
              Reynold Xin (rxin)
            • Votes:
              0
            • Watchers:
              5
