I'm hitting an intermittent core dump when converting user data from Google Protobuf format to an Apache Parquet file via the Apache Arrow C++ library. The problem reproduces reliably with ASAN enabled for specific user data. The ASAN call stack is exactly the same as the core dump call stack (posted in the attached file; compiled with apache-arrow-4.0.1 built without jemalloc).
I did some initial investigation:
- The directly constructed Arrow table triggers the issue. Cloning it in different ways yielded different results, even though all of the clones compare equal via `table.Equals(other)` and every table passed `ValidateFull()`.
  - Serializing and then deserializing the table was safe.
  - `CombineChunks` didn't help.
  - Cloning with `TableBatchReader` didn't help.
  - `CombineChunks` or `TableBatchReader` cloning applied to the deserialized table was still safe.
- The problem occurs in several different environments, so I don't think it is related to glibc:
  - Debian 8 + gcc 4.9.2
  - Debian 9 + gcc 6.3.0
  - Debian 11 + gcc 10.2.1
  - Ubuntu 20.04 LTS + clang 12.0.1
The issue can be reproduced with https://github.com/hcoona/arrow/commit/8fa6cdb0c756c17ea3edc43b7b73c717823bda85