Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 6.0.0, 6.0.1
Environment
- `uname -a`:
Linux datalab2 5.4.0-91-generic #102-Ubuntu SMP Fri Nov 5 16:31:28 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
- `mamba list | grep -i "pyarrow\|tensorflow\|^python"`
pyarrow 6.0.0 py39hff6fa39_1_cpu conda-forge
python 3.9.7 hb7a2778_3_cpython conda-forge
python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge
python-flatbuffers 1.12 pyhd8ed1ab_1 conda-forge
python-irodsclient 1.0.0 pyhd8ed1ab_0 conda-forge
python-rocksdb 0.7.0 py39h7fcd5f3_4 conda-forge
python_abi 3.9 2_cp39 conda-forge
tensorflow 2.6.2 cuda112py39h9333c2f_0 conda-forge
tensorflow-base 2.6.2 cuda112py39h7de589b_0 conda-forge
tensorflow-estimator 2.6.2 cuda112py39h9333c2f_0 conda-forge
tensorflow-gpu 2.6.2 cuda112py39h0bbbad9_0 conda-forge
Description
Hi, I randomly get the following error when I first run inference with a TensorFlow model and then write the result to a `.parquet` file:
Fatal error condition occurred in /home/conda/feedstock_root/build_artifacts/aws-c-io_1633633131324/work/source/event_loop.c:72: aws_thread_launch(&cleanup_thread, s_event_loop_destroy_async_thread_fn, el_group, &thread_options) == AWS_OP_SUCCESS
Exiting Application
################################################################################
Stack trace:
################################################################################
/home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../../././libaws-c-common.so.1(aws_backtrace_print+0x59) [0x7ffb14235f19]
/home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../../././libaws-c-common.so.1(aws_fatal_assert+0x48) [0x7ffb14227098]
/home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../.././././libaws-c-io.so.1.0.0(+0x10a43) [0x7ffb1406ea43]
/home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../../././libaws-c-common.so.1(aws_ref_count_release+0x1d) [0x7ffb14237fad]
/home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../.././././libaws-c-io.so.1.0.0(+0xe35a) [0x7ffb1406c35a]
/home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../../././libaws-c-common.so.1(aws_ref_count_release+0x1d) [0x7ffb14237fad]
/home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../../././libaws-crt-cpp.so(_ZN3Aws3Crt2Io15ClientBootstrapD1Ev+0x3a) [0x7ffb142a2f5a]
/home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../.././libaws-cpp-sdk-core.so(+0x5f570) [0x7ffb147fd570]
/lib/x86_64-linux-gnu/libc.so.6(+0x49a27) [0x7ffb17f7da27]
/lib/x86_64-linux-gnu/libc.so.6(on_exit+0) [0x7ffb17f7dbe0]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfa) [0x7ffb17f5b0ba]
/home/<user>/miniconda3/envs/spliceai_env/bin/python3.9(+0x20aa51) [0x562576609a51]
/bin/bash: line 1: 2341494 Aborted (core dumped)
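Reading the trace from the bottom up, the abort appears to happen while the process is already exiting: libc runs its exit handlers (the `on_exit` and `__libc_start_main` frames), the AWS SDK destroys its `Aws::Crt::Io::ClientBootstrap` (the mangled `_ZN3Aws3Crt2Io15ClientBootstrapD1Ev` frame), and aws-c-io then fails to launch its cleanup thread. If that reading is right, the job itself may have finished before the crash. The following is a minimal sketch of a hypothetical wrapper (the helper `run_job`, the script `job.py` and the file `predictions.parquet` are illustrative names, not part of pyarrow or TensorFlow) that runs the job in a child process and distinguishes an abort during teardown from an ordinary failure. It relies on the POSIX convention that `subprocess` reports death by a signal as a negative return code, and it does not validate the Parquet file, so the output should still be re-read before it is trusted.

```python
import os
import signal
import subprocess
import sys


def run_job(cmd, output_path):
    """Run `cmd` in a child process and classify how it ended.

    Hypothetical helper, not part of pyarrow. Returns:
      "ok"              - the child exited with code 0
      "aborted_at_exit" - the child was killed by SIGABRT but `output_path`
                          exists (consistent with a crash during teardown;
                          the file's contents are NOT checked here)
      "failed"          - any other outcome
    """
    result = subprocess.run(cmd)
    if result.returncode == 0:
        return "ok"
    # On POSIX, a child terminated by signal N has returncode == -N.
    if result.returncode == -signal.SIGABRT and os.path.exists(output_path):
        return "aborted_at_exit"
    return "failed"


if __name__ == "__main__":
    # Hypothetical job script and output file; substitute the real ones.
    print(run_job([sys.executable, "job.py"], "predictions.parquet"))
```

The child-process design keeps the wrapper itself out of the crash: the abort happens in the job's own interpreter, so the parent can still inspect the exit status and decide whether to retry or to verify the file.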
My colleague ran into the same issue on CentOS 8 while running the same job with the same environment on SLURM, so I suspect it is caused by some interaction between TensorFlow and pyarrow.
I also found a GitHub issue where multiple people ran into the same problem:
https://github.com/huggingface/datasets/issues/3310
Resolving this bug is very important to my lab, as we can no longer work with Parquet. Unfortunately, we do not have the knowledge to fix it ourselves.
Issue Links
- is related to: ARROW-17501 [C++] Fatal error condition occurred in aws_thread_launch (Resolved)