Currently the hash join implementation relies on recursively repartitioning the build side until each partition fits entirely in memory. This works well in many cases, but it can fail when there are a large number of rows with duplicate keys that do not fit in the available memory: rows with identical keys hash to the same partition at every repartitioning level, so that partition never shrinks.
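The failure mode can be illustrated with a small sketch (not the actual implementation; the `partition` helper and fanout are hypothetical). Because every row with the same join key produces the same hash, repartitioning at any level sends all duplicates to one partition, which therefore never gets smaller:

```python
def partition(rows, level, fanout=4):
    """Split rows into `fanout` partitions by hashing the join key.

    A different seed (`level`) is mixed into the hash at each recursion
    level, so a partition that was unlucky at one level can normally be
    split further at the next.
    """
    parts = [[] for _ in range(fanout)]
    for key, payload in rows:
        h = hash((level, key))  # level acts as a per-level hash seed
        parts[h % fanout].append((key, payload))
    return parts

# 1000 rows that all share the same join key: no matter how deep we
# recurse, every row hashes identically at each level, so one partition
# keeps all 1000 rows and repartitioning never reduces its size.
rows = [(42, i) for i in range(1000)]
for level in range(3):
    parts = partition(rows, level)
    assert max(len(p) for p in parts) == 1000  # never shrinks
    rows = max(parts, key=len)  # recurse into the oversized partition
```

This is why the error reports a "Repartitioning level": the engine has already recursed several times without making progress.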
This results in an error like: "Cannot perform hash join at node with id 6. Repartitioning did not reduce the size of a spilled partition. Repartitioning level 6. Number of rows 275352"
A special case of this is a Null-aware anti join with many NULLs on the build side.
This error often occurs because of a suboptimal query or plan that has a lot of duplicate values on one side of the join. Changing the join operator to spill would let many of these queries run to completion, but very slowly, since it would need to do a quadratic pairwise comparison of both sides of the join.
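A minimal sketch of what such a spilling fallback would look like (hypothetical; `spilling_nested_loop_join` and the block layout are assumptions, not the engine's API). Each spilled block is read back and compared against every block of the other side, which bounds memory use but makes the cost quadratic:

```python
def spilling_nested_loop_join(build_blocks, probe_blocks):
    """Fallback join for a partition that cannot fit in memory.

    Each side is assumed to have been spilled to disk as a list of
    blocks of (key, payload) rows. Streaming one block from each side
    at a time keeps memory bounded, but every build row is compared
    against every probe row, so the cost is quadratic.
    """
    out = []
    for bblock in build_blocks:        # each block read back from disk
        for pblock in probe_blocks:
            for brow in bblock:
                for prow in pblock:
                    if brow[0] == prow[0]:  # compare join keys
                        out.append((brow, prow))
    return out

build = [[(1, "a"), (1, "b")], [(2, "c")]]   # two spilled build blocks
probe = [[(1, "x")], [(2, "y"), (3, "z")]]   # two spilled probe blocks
result = spilling_nested_loop_join(build, probe)
# matches every build row with every probe row sharing a key
assert sorted(result) == [((1, "a"), (1, "x")),
                          ((1, "b"), (1, "x")),
                          ((2, "c"), (2, "y"))]
```

This runs to completion even when one side is dominated by duplicates, at the cost of the pairwise comparison described above.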