Spark / SPARK-31412 New Adaptive Query Execution in Spark SQL / SPARK-29954

Collect runtime row count statistics in the map stage

    Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: Shuffle, Spark Core
    • Labels:
      None

    Description

    We need row count information to estimate data skew more accurately when the data contains many duplicated rows. This PR collects row count information in the map stage.
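A minimal Python sketch of the idea, under stated assumptions: each map task is assumed to report, per reduce partition, both its output size in bytes and a row count. `MapTaskStats`, `aggregate` and `skewed_partitions` are hypothetical names for illustration, not Spark APIs. The skew check borrows the shape of the rule used by AQE's skew-join optimization (a partition is skewed if it exceeds both a factor times the median and an absolute threshold, cf. `spark.sql.adaptive.skewJoin.skewedPartitionFactor` and `spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes`), applied here to row counts as an assumption. The toy input is invented: it models a partition whose many duplicated rows compress well, so it looks ordinary by bytes but not by rows.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Tuple


@dataclass
class MapTaskStats:
    """Hypothetical per-map-task statistics (not Spark's MapStatus API)."""
    bytes_per_partition: List[int]
    rows_per_partition: List[int]


def aggregate(stats: List[MapTaskStats]) -> Tuple[List[int], List[int]]:
    """Sum bytes and rows for each reduce partition across all map tasks."""
    num_partitions = len(stats[0].bytes_per_partition)
    total_bytes = [0] * num_partitions
    total_rows = [0] * num_partitions
    for s in stats:
        for p in range(num_partitions):
            total_bytes[p] += s.bytes_per_partition[p]
            total_rows[p] += s.rows_per_partition[p]
    return total_bytes, total_rows


def skewed_partitions(sizes: List[int], factor: float, threshold: int) -> List[int]:
    """Indices whose size exceeds both factor * median and an absolute threshold."""
    med = median(sizes)
    return [p for p, s in enumerate(sizes) if s > factor * med and s > threshold]


# Toy input (invented): partition 2 holds many duplicated rows that compress
# well, so its byte size looks ordinary while its row count does not.
tasks = [
    MapTaskStats([100, 110, 120, 105], [1_000, 1_100, 20_000, 1_050]),
    MapTaskStats([90, 100, 130, 95], [900, 1_000, 25_000, 950]),
]
total_bytes, total_rows = aggregate(tasks)
print(skewed_partitions(total_bytes, factor=5, threshold=0))  # []
print(skewed_partitions(total_rows, factor=5, threshold=0))   # [2]
```

With byte sizes alone, no partition crosses five times the median; with row counts, partition 2 does. This is the kind of case the description targets, though how Spark would actually record and consume row counts is not specified here.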

    People

    • Assignee: Unassigned
    • Reporter: Jk_Self Ke Jia
    • Votes: 0
    • Watchers: 4

    Dates

    • Created:
    • Updated: