[FLINK-32532] exit code 137 (i.e. OutOfMemoryError) in flink-s3-fs-hadoop module - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Duplicate
Affects Version/s: 1.16.3
Fix Version/s: None
Component/s: Connectors / Hadoop Compatibility
Labels:
- test-stability

Description

This build https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=50840&view=logs&j=4eda0b4a-bd0d-521a-0916-8285b9be9bb5&t=2ff6d5fa-53a6-53ac-bff7-fa524ea361a9&l=16093

fails with the following output:

Jul 03 15:33:35 [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 15.267 s - in org.apache.flink.fs.s3hadoop.HadoopS3FileSystemITCase
Jul 03 15:33:45 [ERROR] Picked up JAVA_TOOL_OPTIONS: -XX:+HeapDumpOnOutOfMemoryError
##[error]Exit code 137 returned from process: file name '/bin/docker', arguments 'exec -i -u 1000  -w /home/agent01_azpcontainer 3e9ac5dd969222db5673644f5c729d323f624390f9dbc3238a1c99b1b3c4679b /__a/externals/node/bin/node /__w/_temp/containerHandlerInvoker.js'.
Finishing: Test - connect_1

Issue Links

- duplicates: FLINK-18356 flink-table-planner Exit code 137 returned from process (Resolved)

Activity

Matthias Pohl added a comment - 06/Jul/23 07:21 - edited

The failure you describe happened on agent AlibabaCI005-agent01 on Jul 03 at 15:33:45. I checked the CI builds you reported in FLINK-18356. There is a CI failure with exit code 137 (you reported it in this comment) in the flink-table module on AlibabaCI005-agent04 (i.e. the same VM) on Jul 03 at 15:32:38.

The 137 OOM errors cause all JVM processes on the same machine to crash. We have seen this in the past, and each time a failing CI build in flink-table was involved. That led us to conclude that FLINK-18356 is the most likely cause of the OOM. Therefore, you might want to close this Jira issue as a duplicate of FLINK-18356 (it is important to link the Jira issues so that we can trace problems back in case the OOM is not caused by FLINK-18356 alone).
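A minimal sketch, assuming the usual POSIX shell and Docker convention that an exit status above 128 means the process was terminated by signal number (status - 128): under that convention 137 decodes to signal 9, SIGKILL, which is the signal the Linux kernel's OOM killer sends. The helper name `decode_exit_code` is hypothetical and only illustrates the arithmetic.

```python
import signal


def decode_exit_code(code: int) -> str:
    """Interpret a shell-style exit status (hypothetical helper).

    Assumes the POSIX shell convention: statuses above 128 mean
    "terminated by signal (status - 128)".
    """
    if code > 128:
        sig = code - 128
        try:
            # Map the number to its symbolic name, e.g. 9 -> "SIGKILL" on Linux.
            name = signal.Signals(sig).name
        except ValueError:
            name = f"unknown signal {sig}"
        return f"killed by {name}"
    return f"exited normally with status {code}"


if __name__ == "__main__":
    # 137 - 128 = 9, i.e. SIGKILL on Linux
    print(decode_exit_code(137))
```

The decoded SIGKILL is only consistent with an OOM kill; the exit code alone does not say which process on the machine exhausted the memory.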

Sergey Nuyanzin added a comment - 06/Jul/23 09:09

Thanks for looking into this.
Yes, you are right; I will close it in favor of FLINK-18356.

People

Assignee: Unassigned

Reporter: Sergey Nuyanzin

Votes: 0

Watchers: 2

Dates

Created: 04/Jul/23 11:21

Updated: 06/Jul/23 09:09

Resolved: 06/Jul/23 09:09
