[SPARK-18857] SparkSQL ThriftServer hangs while extracting huge data volumes in incremental collect mode - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.0.2
Fix Version/s: 2.0.3, 2.1.1, 2.2.0
Component/s: SQL
Labels: None

Description

We are running a SQL query on our Spark cluster and extracting around 200 million records through the SparkSQL ThriftServer interface. The query works fine on Spark 1.6.3; on Spark 2.0.2, however, the Thrift server hangs after fetching data from a few partitions (we are using incremental collect mode with 400 partitions). As per the documentation, the maximum memory taken up by the Thrift server should be what is required by the largest data partition. But we observed that the Thrift server does not release the memory of old partitions when GC occurs, even though it has moved on to fetching data from the next partition. This is not the case with version 1.6.3.

On further investigation we found that SparkExecuteStatementOperation.scala was modified for "SPARK-16563 [SQL] fix spark sql thrift server FetchResults bug", and the result set iterator was duplicated to keep a reference to the first set.

+ val (itra, itrb) = iter.duplicate
+ iterHeader = itra
+ iter = itrb

We suspect that this prevents the memory from being reclaimed on GC. To confirm this, we created an iterator in our test class and fetched the data twice: once without duplicating it, and once after creating a duplicate. The first run completed fine and fetched the entire data set, while in the second run the driver hung after fetching data from a few partitions.
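
The suspected mechanism can be illustrated outside Spark with plain Scala collections. The following is a minimal sketch, not the ThriftServer code: `DuplicateBuffering`, `run`, `header`, and `rows` are hypothetical names, and a small in-memory iterator stands in for the result set. It relies on the behaviour described in the Scaladoc for `Iterator.duplicate`, which may keep elements consumed by one copy until the other copy has consumed them too.

```scala
object DuplicateBuffering {
  /**
   * Drains one copy of a duplicated iterator, then replays the other copy.
   * Returns (elements pulled from the source after the drain,
   *          elements replayed by the untouched copy,
   *          elements pulled from the source after the replay).
   */
  def run(n: Int): (Int, List[Int], Int) = {
    var pulled = 0 // how many elements the underlying source has produced
    val source = Iterator.tabulate(n) { i => pulled += 1; i }

    // Same shape as the change quoted above: one copy is kept as a
    // "header" reference, the other is the one actually consumed.
    val (header, rows) = source.duplicate

    rows.foreach(_ => ()) // consume every row through one copy only
    val pulledAfterDrain = pulled

    // The untouched copy can still produce every element without asking
    // the source again, so all n elements were retained in between.
    val replayed = header.toList
    (pulledAfterDrain, replayed, pulled)
  }

  def main(args: Array[String]): Unit = {
    val (afterDrain, replayed, afterReplay) = run(5)
    println(s"pulled after drain:  $afterDrain")
    println(s"replayed elements:   $replayed")
    println(s"pulled after replay: $afterReplay")
  }
}
```

The source counter does not advance during the replay, which means every element consumed through `rows` was still referenced on behalf of `header`. If the same thing happens with a result set of 200 million rows and a header copy that is never advanced, the rows of already-fetched partitions would stay reachable and could not be collected, which would be consistent with the GC behaviour reported above.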

Attachments

GC-spark-1.6.3 (9 kB), attached 14/Dec/16 08:42 by vishal agrawal
GC-spark-2.0.2 (7 kB), attached 14/Dec/16 08:42 by vishal agrawal

Issue Links

relates to: SPARK-16563 Repeat calling Spark SQL thrift server fetchResults return empty for ExecuteStatement operation (Resolved)

links to: [Github] Pull Request #16440 (dongjoon-hyun)

People

Assignee: Dongjoon Hyun

Reporter: vishal agrawal

Votes: 1

Watchers: 4

Dates

Created: 14/Dec/16 08:37

Updated: 12/Jan/17 10:46

Resolved: 10/Jan/17 13:28