Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: Impala 1.2.4
Fix Version/s: None
Component/s: None
Description
A user reported that running INSERT INTO parquet_table SELECT * FROM <other table> caused Impala to run out of memory.
https://groups.google.com/a/cloudera.org/forum/#!topic/impala-user/T-aVMuwfKZs
It appears there might be a configuration issue that caused all reads to be done remotely, but even so, Impala shouldn't use 50GB of memory to scan just a few hundred MB of data.
HDFS_SCAN_NODE (id=0)
  Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:1/39.32 MB 1:1/36.93 MB 2:2/77.70 MB 3:1/38.68 MB
   - AverageHdfsReadThreadConcurrency: 0.0
   - AverageScannerThreadConcurrency: 5.0
   - BytesRead: 202496708
   - BytesReadLocal: 0
   - BytesReadShortCircuit: 0
   - DecompressionTime: 9780203
   - NumColumns: 0
   - NumDisksAccessed: 0
   - NumScannerThreadsStarted: 5
   - PeakMemoryUsage: 2273965352
   - PerReadThreadRawHdfsThroughput: 1439216203
   - RowsRead: 243712
   - RowsReturned: 166912
   - RowsReturnedRate: 471541
   - ScanRangesComplete: 0
   - ScannerThreadsInvoluntaryContextSwitches: 0
   - ScannerThreadsTotalWallClockTime: 0
   - MaterializeTupleTime(*): 0
   - ScannerThreadsSysTime: 0
   - ScannerThreadsUserTime: 0
   - ScannerThreadsVoluntaryContextSwitches: 0
   - TotalRawHdfsReadTime(*): 140699297
   - TotalReadThroughput: 3894167
   - TotalTime: 353971091
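
For reference, a minimal sketch of the reported statement shape. The table names and schema below are placeholders (the report does not include them), and the MEM_LIMIT query option is shown only as a stopgap so a runaway insert fails early rather than exhausting the host; it does not address the over-allocation itself. Exact option syntax and the Parquet format keyword vary by Impala version.

  -- Hypothetical destination table; the report does not include the user's schema.
  -- (Older releases such as Impala 1.2.x use the keyword PARQUETFILE for this format.)
  CREATE TABLE parquet_table (id INT, payload STRING) STORED AS PARQUETFILE;

  -- Optional stopgap: cap per-node memory for the session (value in bytes),
  -- so the insert errors out instead of consuming all memory on the host.
  SET MEM_LIMIT=2147483648;

  -- The reported pattern: full-table copy into a Parquet table.
  INSERT INTO parquet_table SELECT * FROM other_table;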