Details
Bug
Status: In Progress
Major
Resolution: Unresolved
2.3.1
None
None
Description
When we run SQL with a "Full Outer Join" on large tables that have data skew, it is quite easy to hit an OOM. We initially thought we had hit https://issues.apache.org/jira/browse/SPARK-13450, but after looking at the fix in https://github.com/apache/spark/pull/16909, we found that the PR does not handle the "Full Outer Join" case.
The root cause of the OOM is that many rows share the same key, and all of them are buffered in memory at once.
See the code below:
private def findMatchingRows(matchingKey: InternalRow): Unit = {
  leftMatches.clear()
  rightMatches.clear()
  leftIndex = 0
  rightIndex = 0

  // Buffer every left row whose key equals matchingKey.
  while (leftRowKey != null && keyOrdering.compare(leftRowKey, matchingKey) == 0) {
    leftMatches += leftRow.copy()
    advancedLeft()
  }
  // Buffer every right row whose key equals matchingKey.
  while (rightRowKey != null && keyOrdering.compare(rightRowKey, matchingKey) == 0) {
    rightMatches += rightRow.copy()
    advancedRight()
  }
  // ...
}
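It seems that nothing limits the number of rows added to leftMatches and rightMatches, so a heavily skewed key can make both buffers grow until the executor runs out of memory.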
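To illustrate what bounding these buffers could look like, here is a minimal sketch of a buffer that keeps only a fixed number of rows on the heap and spills the rest to a temporary file. Everything in it is hypothetical: the class name SpillableMatchBuffer, the inMemoryThreshold parameter, and the use of Long values in place of InternalRow are assumptions made to keep the example self-contained. It is not Spark's implementation, and it ignores the index-based access and matched-row tracking that the full outer join scanner would also need.

```scala
import java.io.{BufferedInputStream, BufferedOutputStream, DataInputStream,
  DataOutputStream, File, FileInputStream, FileOutputStream}
import scala.collection.mutable.ArrayBuffer

// Hypothetical buffer for rows that share one join key. At most
// `inMemoryThreshold` rows stay on the heap; later rows go to a temp file.
// Rows are modeled as Long values (an assumption, not Spark's InternalRow).
final class SpillableMatchBuffer(inMemoryThreshold: Int) {
  private val inMemory = new ArrayBuffer[Long]()
  private var spillFile: Option[File] = None
  private var spillOut: Option[DataOutputStream] = None
  private var spilledCount = 0L

  def append(row: Long): Unit = {
    if (inMemory.length < inMemoryThreshold) {
      inMemory += row
    } else {
      // Lazily create the spill file on the first row past the threshold.
      val out = spillOut.getOrElse {
        val file = File.createTempFile("match-buffer", ".spill")
        file.deleteOnExit()
        val stream = new DataOutputStream(
          new BufferedOutputStream(new FileOutputStream(file)))
        spillFile = Some(file)
        spillOut = Some(stream)
        stream
      }
      out.writeLong(row)
      spilledCount += 1
    }
  }

  def inMemorySize: Int = inMemory.length

  def size: Long = inMemory.length + spilledCount

  // Returns in-memory rows first, then spilled rows in insertion order.
  def iterator: Iterator[Long] = {
    spillOut.foreach(_.flush())
    val spilled: Iterator[Long] = spillFile match {
      case Some(file) =>
        val in = new DataInputStream(
          new BufferedInputStream(new FileInputStream(file)))
        new Iterator[Long] {
          private var remaining = spilledCount
          def hasNext: Boolean = {
            if (remaining == 0) in.close() // release the file handle when done
            remaining > 0
          }
          def next(): Long = { remaining -= 1; in.readLong() }
        }
      case None => Iterator.empty
    }
    inMemory.iterator ++ spilled
  }

  // Drops all rows and removes the spill file, ready for the next key.
  def clear(): Unit = {
    inMemory.clear()
    spillOut.foreach(_.close())
    spillFile.foreach(_.delete())
    spillOut = None
    spillFile = None
    spilledCount = 0L
  }
}
```

With a buffer like this, the heap cost of a skewed key is capped by the threshold rather than by the number of rows with that key, at the price of disk I/O when the matches are iterated.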