Details
-
Bug
-
Status: Resolved
-
Blocker
-
Resolution: Fixed
-
Impala 4.0.0
Description
Saw a data loading failure caused by an OutOfMemoryError in a test run with erasure coding. The affected query inserts data into the store_sales table and fails with:
Getting log thread is interrupted, since query is done! ERROR : Status: Failed ERROR : Vertex failed, vertexName=Reducer 2, vertexId=vertex_1590450092775_0009_3_01, diagnostics=[Task failed, taskId=task_1590450092775_0009_3_01_000001, diagnostics=[TaskAttempt 0 failed, info=[Container container_1590450092775_0009_01_000003 finished with diagnostics set to [Container failed, exitCode=-104. [2020-05-25 16:49:18.814]Container [pid=14180,containerID=container_1590450092775_0009_01_000003] is running 44290048B beyond the 'PHYSICAL' memory limit. Current usage: 1.0 GB of 1 GB physical memory used; 3.3 GB of 2.1 GB virtual memory used. Killing container. Dump of the process-tree for container_1590450092775_0009_01_000003 : |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE |- 14180 14176 14180 14180 (bash) 0 0 115851264 352 /bin/bash -c /usr/java/jdk1.8.0_144/bin/java -Xmx819m -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/data0/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1590450092775_0009/container_1590450092775_0009_01_000003 -Dtez.root.logger=INFO,CLA -Djava.io.tmpdir=/data0/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/testdata/cluster/cdh7/node-1/var/lib/hadoop-yarn/cache/jenkins/nm-local-dir/usercache/jenkins/appcache/application_1590450092775_0009/container_1590450092775_0009_01_000003/tmp org.apache.tez.runtime.task.TezChild localhost 43422 container_1590450092775_0009_01_000003 application_1590450092775_0009 1 
1>/data0/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1590450092775_0009/container_1590450092775_0009_01_000003/stdout 2>/data0/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1590450092775_0009/container_1590450092775_0009_01_000003/stderr |- 14191 14180 14180 14180 (java) 3167 127 3468886016 272605 /usr/java/jdk1.8.0_144/bin/java -Xmx819m -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/data0/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1590450092775_0009/container_1590450092775_0009_01_000003 -Dtez.root.logger=INFO,CLA -Djava.io.tmpdir=/data0/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/testdata/cluster/cdh7/node-1/var/lib/hadoop-yarn/cache/jenkins/nm-local-dir/usercache/jenkins/appcache/application_1590450092775_0009/container_1590450092775_0009_01_000003/tmp org.apache.tez.runtime.task.TezChild localhost 43422 container_1590450092775_0009_01_000003 application_1590450092775_0009 1 [2020-05-25 16:49:18.884]Container killed on request. Exit code is 143 [2020-05-25 16:49:18.887]Container exited with a non-zero exit code 143. 
]], TaskAttempt 1 failed, info=[Error: Error while running task ( failure ) : java.lang.OutOfMemoryError: Java heap space at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) at org.apache.hadoop.io.ElasticByteBufferPool.getBuffer(ElasticByteBufferPool.java:96) at org.apache.hadoop.hdfs.DFSStripedOutputStream$CellBuffers.<init>(DFSStripedOutputStream.java:223) at org.apache.hadoop.hdfs.DFSStripedOutputStream.<init>(DFSStripedOutputStream.java:315) at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:310) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1218) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1197) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1135) at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:546) at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:543) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:557) at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:484) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1153) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1039) at org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat.getHiveRecordWriter(HiveIgnoreKeyTextOutputFormat.java:81) at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:294) at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:279) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:840) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:786) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createNewPaths(FileSinkOperator.java:1170) at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynOutPaths(FileSinkOperator.java:1294) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1011) at org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:111) at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969) at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:490) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:392) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:249) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:318) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267) , errorMessage=Cannot recover from this error:java.lang.OutOfMemoryError: Java heap space at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) at org.apache.hadoop.io.ElasticByteBufferPool.getBuffer(ElasticByteBufferPool.java:96) at org.apache.hadoop.hdfs.DFSStripedOutputStream$CellBuffers.<init>(DFSStripedOutputStream.java:223) at org.apache.hadoop.hdfs.DFSStripedOutputStream.<init>(DFSStripedOutputStream.java:315) at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:310) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1218) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1197) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1135) at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:546) at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:543) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at 
org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:557) at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:484) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1153) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1039) at org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat.getHiveRecordWriter(HiveIgnoreKeyTextOutputFormat.java:81) at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:294) at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:279) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:840) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:786) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createNewPaths(FileSinkOperator.java:1170) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynOutPaths(FileSinkOperator.java:1294) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1011) at org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:111) at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969) at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:490) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:392) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:249) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:318) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267) ]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:1, Vertex vertex_1590450092775_0009_3_01 
[Reducer 2] killed/failed due to:OWN_TASK_FAILURE] ERROR : DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0 ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Reducer 2, vertexId=vertex_1590450092775_0009_3_01, diagnostics=[Task failed, taskId=task_1590450092775_0009_3_01_000001, diagnostics=[TaskAttempt 0 failed, info=[Container container_1590450092775_0009_01_000003 finished with diagnostics set to [Container failed, exitCode=-104. [2020-05-25 16:49:18.814]Container [pid=14180,containerID=container_1590450092775_0009_01_000003] is running 44290048B beyond the 'PHYSICAL' memory limit. Current usage: 1.0 GB of 1 GB physical memory used; 3.3 GB of 2.1 GB virtual memory used. Killing container. Dump of the process-tree for container_1590450092775_0009_01_000003 : |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE |- 14180 14176 14180 14180 (bash) 0 0 115851264 352 /bin/bash -c /usr/java/jdk1.8.0_144/bin/java -Xmx819m -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/data0/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1590450092775_0009/container_1590450092775_0009_01_000003 -Dtez.root.logger=INFO,CLA -Djava.io.tmpdir=/data0/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/testdata/cluster/cdh7/node-1/var/lib/hadoop-yarn/cache/jenkins/nm-local-dir/usercache/jenkins/appcache/application_1590450092775_0009/container_1590450092775_0009_01_000003/tmp org.apache.tez.runtime.task.TezChild localhost 43422 container_1590450092775_0009_01_000003 application_1590450092775_0009 1 
1>/data0/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1590450092775_0009/container_1590450092775_0009_01_000003/stdout 2>/data0/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1590450092775_0009/container_1590450092775_0009_01_000003/stderr |- 14191 14180 14180 14180 (java) 3167 127 3468886016 272605 /usr/java/jdk1.8.0_144/bin/java -Xmx819m -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/data0/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1590450092775_0009/container_1590450092775_0009_01_000003 -Dtez.root.logger=INFO,CLA -Djava.io.tmpdir=/data0/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/testdata/cluster/cdh7/node-1/var/lib/hadoop-yarn/cache/jenkins/nm-local-dir/usercache/jenkins/appcache/application_1590450092775_0009/container_1590450092775_0009_01_000003/tmp org.apache.tez.runtime.task.TezChild localhost 43422 container_1590450092775_0009_01_000003 application_1590450092775_0009 1 [2020-05-25 16:49:18.884]Container killed on request. Exit code is 143 [2020-05-25 16:49:18.887]Container exited with a non-zero exit code 143. 
]], TaskAttempt 1 failed, info=[Error: Error while running task ( failure ) : java.lang.OutOfMemoryError: Java heap space at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) at org.apache.hadoop.io.ElasticByteBufferPool.getBuffer(ElasticByteBufferPool.java:96) at org.apache.hadoop.hdfs.DFSStripedOutputStream$CellBuffers.<init>(DFSStripedOutputStream.java:223) at org.apache.hadoop.hdfs.DFSStripedOutputStream.<init>(DFSStripedOutputStream.java:315) at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:310) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1218) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1197) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1135) at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:546) at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:543) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:557) at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:484) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1153) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1039) at org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat.getHiveRecordWriter(HiveIgnoreKeyTextOutputFormat.java:81) at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:294) at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:279) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:840) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:786) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createNewPaths(FileSinkOperator.java:1170) at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynOutPaths(FileSinkOperator.java:1294) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1011) at org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:111) at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969) at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:490) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:392) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:249) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:318) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267) , errorMessage=Cannot recover from this error:java.lang.OutOfMemoryError: Java heap space at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) at org.apache.hadoop.io.ElasticByteBufferPool.getBuffer(ElasticByteBufferPool.java:96) at org.apache.hadoop.hdfs.DFSStripedOutputStream$CellBuffers.<init>(DFSStripedOutputStream.java:223) at org.apache.hadoop.hdfs.DFSStripedOutputStream.<init>(DFSStripedOutputStream.java:315) at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:310) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1218) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1197) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1135) at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:546) at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:543) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at 
org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:557) at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:484) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1153) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1039) at org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat.getHiveRecordWriter(HiveIgnoreKeyTextOutputFormat.java:81) at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:294) at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:279) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:840) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:786) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createNewPaths(FileSinkOperator.java:1170) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynOutPaths(FileSinkOperator.java:1294) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1011) at org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:111) at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969) at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:490) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:392) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:249) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:318) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267) ]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:1, Vertex vertex_1590450092775_0009_3_01 
[Reducer 2] killed/failed due to:OWN_TASK_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
We may need to allocate a larger heap size for Tez containers.
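As a rough way to reason about the heap size, the sketch below estimates how many erasure-coded output streams could fit in a Tez container's heap. It rests on several assumptions that are not confirmed in this report: the 0.8 heap fraction is inferred from the log (-Xmx819m in a 1 GB container), each striped output stream is assumed to hold one cell-sized buffer per data and parity block (the stack trace fails while allocating in DFSStripedOutputStream$CellBuffers), and the RS-3-2 policy with 1 MiB cells is only an illustrative choice. All function names are hypothetical.

```python
MIB = 1024 * 1024


def heap_mib(container_mib, heap_fraction=0.8):
    """Heap size in MiB for a container, assuming a fixed heap fraction."""
    return int(container_mib * heap_fraction)


def striped_buffer_bytes(data_units, parity_units, cell_size_bytes):
    """Assumed buffer footprint of one striped output stream:
    one cell-sized buffer per data block and per parity block."""
    return (data_units + parity_units) * cell_size_bytes


def max_open_writers(container_mib, data_units, parity_units,
                     cell_size_bytes, heap_fraction=0.8):
    """Upper bound on concurrently open striped writers.
    Ignores every other heap consumer, so the real limit is lower."""
    heap_bytes = heap_mib(container_mib, heap_fraction) * MIB
    per_writer = striped_buffer_bytes(data_units, parity_units, cell_size_bytes)
    return heap_bytes // per_writer


if __name__ == "__main__":
    # 1 GB container as in the log; RS-3-2 with 1 MiB cells is an assumption.
    print(heap_mib(1024))                      # 819, matching -Xmx819m
    print(max_open_writers(1024, 3, 2, MIB))   # 163
    print(max_open_writers(2048, 3, 2, MIB))   # 327
```

Under these assumptions, an insert that opens one file per dynamic partition can exhaust a 819 MB heap well before a few hundred partitions are open, since the reducer also needs heap for its own record processing. Doubling the container size roughly doubles the bound.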
Attachments
Issue Links
- is duplicated by
-
IMPALA-9806 Multiple data load failures on HDFS errors for erasure coding builds
- Resolved
- relates to
-
IMPALA-9777 Reduce the diskspace requirements of loading the text version of tpcds.store_sales
- Resolved