Spark / SPARK-30713

Respect mapOutputSize in memory in adaptive execution

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      Currently, Spark adaptive execution uses the MapOutputStatistics information to adjust the plan dynamically, but the map output size it reports does not account for the compression factor, so it can understate the size of the data in memory. As a result, a plan that was originally a `SortMergeJoin` can be changed to a `BroadcastHashJoin` after adaptive adjustment, and that `BroadcastHashJoin` might cause OOMs because of the inaccurate estimate.
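
To illustrate the failure mode, here is a minimal sketch in plain Python. It is not Spark's code: `choose_join`, the sizes, and the compression factor of 5 are hypothetical, and the 10 MiB threshold is assumed to match the default of `spark.sql.autoBroadcastJoinThreshold`.

```python
# Hypothetical model of a size-based join choice; not Spark's implementation.
BROADCAST_THRESHOLD = 10 * 1024 * 1024  # assumed 10 MiB broadcast threshold


def choose_join(estimated_bytes, threshold=BROADCAST_THRESHOLD):
    """Pick a broadcast join when the estimated size fits under the threshold."""
    if estimated_bytes <= threshold:
        return "BroadcastHashJoin"
    return "SortMergeJoin"


# Assumed numbers: the shuffle statistics report the compressed size.
compressed_map_output = 8 * 1024 * 1024
compression_factor = 5  # assumed ratio of in-memory size to compressed size
in_memory_size = compressed_map_output * compression_factor

print(choose_join(compressed_map_output))  # decision from the compressed size
print(choose_join(in_memory_size))         # decision from the in-memory size
```

With these assumed numbers the compressed size fits under the threshold while the in-memory size does not, so the two estimates lead to different join strategies.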

      Also, if the shuffle implementation is local shuffle (as in Intel's Spark adaptive execution implementation), this can in some cases cause a `Too large frame` exception.

      So I propose that adaptive execution either respect the compression factor or use the `dataSize` metric of `ShuffleExchangeExec`.
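
The two alternatives in this proposal could be sketched as follows. This is plain Python with hypothetical names (`estimate_in_memory_bytes`, `data_size_metric`, `compression_factor`), not an existing Spark API, and preferring the `dataSize` metric over the scaled estimate is an assumption made for illustration only.

```python
# Hypothetical sketch of the two proposed estimates; not an existing Spark API.
def estimate_in_memory_bytes(map_output_bytes,
                             compression_factor=None,
                             data_size_metric=None):
    """Return a size estimate to use for adaptive planning decisions.

    map_output_bytes   -- compressed shuffle output size from the statistics
    compression_factor -- assumed ratio of in-memory size to compressed size
    data_size_metric   -- uncompressed size, if such a metric is available
    """
    if data_size_metric is not None:
        # Alternative 2: use the measured uncompressed size directly.
        return data_size_metric
    if compression_factor is not None:
        # Alternative 1: scale the compressed size by the compression factor.
        return int(map_output_bytes * compression_factor)
    # Fallback: the compressed size, unchanged.
    return map_output_bytes


compressed = 8 * 1024 * 1024
print(estimate_in_memory_bytes(compressed, compression_factor=5))
print(estimate_in_memory_bytes(compressed, data_size_metric=36 * 1024 * 1024))
```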

            People

              Assignee: Unassigned
              Reporter: liupengcheng
              Votes: 0
              Watchers: 3
