Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: Impala 1.2.4
Fix Version/s: None
Component/s: None
Description
A user reported that running INSERT INTO parquet_table SELECT * FROM <other table> caused Impala to run out of memory.
https://groups.google.com/a/cloudera.org/forum/#!topic/impala-user/T-aVMuwfKZs
It appears there might be a configuration issue that caused all reads to be done remotely, but even so, Impala shouldn't use 50GB of memory to scan just a few hundred MB of data.
HDFS_SCAN_NODE (id=0)
  Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:1/39.32 MB 1:1/36.93 MB 2:2/77.70 MB 3:1/38.68 MB
   - AverageHdfsReadThreadConcurrency: 0.0
   - AverageScannerThreadConcurrency: 5.0
   - BytesRead: 202496708
   - BytesReadLocal: 0
   - BytesReadShortCircuit: 0
   - DecompressionTime: 9780203
   - NumColumns: 0
   - NumDisksAccessed: 0
   - NumScannerThreadsStarted: 5
   - PeakMemoryUsage: 2273965352
   - PerReadThreadRawHdfsThroughput: 1439216203
   - RowsRead: 243712
   - RowsReturned: 166912
   - RowsReturnedRate: 471541
   - ScanRangesComplete: 0
   - ScannerThreadsInvoluntaryContextSwitches: 0
   - ScannerThreadsTotalWallClockTime: 0
   - MaterializeTupleTime(*): 0
   - ScannerThreadsSysTime: 0
   - ScannerThreadsUserTime: 0
   - ScannerThreadsVoluntaryContextSwitches: 0
   - TotalRawHdfsReadTime(*): 140699297
   - TotalReadThroughput: 3894167
   - TotalTime: 353971091
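
For reference, a minimal sketch of the reported statement shape. The table names and schema below are placeholders (the report does not include them), and the MEM_LIMIT query option is shown only as a stopgap so a runaway insert fails early rather than exhausting the host; it does not address the over-allocation itself. Exact option syntax and the Parquet format keyword vary by Impala version.

  -- Hypothetical destination table; the report does not include the user's schema.
  -- (Older releases such as Impala 1.2.x use the keyword PARQUETFILE for this format.)
  CREATE TABLE parquet_table (id INT, payload STRING) STORED AS PARQUETFILE;

  -- Optional stopgap: cap per-node memory for the session (value in bytes),
  -- so the insert errors out instead of consuming all memory on the host.
  SET MEM_LIMIT=2147483648;

  -- The reported pattern: full-table copy into a Parquet table.
  INSERT INTO parquet_table SELECT * FROM other_table;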