Description
SparkSQLOperationManager uses RDD.toLocalIterator to collect the result set one partition at a time. This is useful to avoid OOM when the result is large, but introduces extra job scheduling costs as each partition is collected with a separate job. Users may want to disable this when the result set is expected to be small.
UPDATE Incremental collection hurts performance because tasks of the last stage of the RDD DAG generated from the SQL query plan are executed sequentially. Thus we decided to disable it by default.
Attachments
Issue Links
- is duplicated by
-
SPARK-2591 Add config property to disable incremental collection used in Thrift server
- Resolved
- links to