  1. Apache Arrow
  2. ARROW-13676

[C++] Coredump writing Arrow table to Parquet file

Details

    Description

      I'm hitting an intermittent core dump when converting user data from Google Protobuf format to an Apache Parquet file via the Apache Arrow C++ library. With ASAN enabled, the problem reproduces reliably for specific user data. The call stack reported by ASAN is exactly the same as the core dump call stack (see the attached file; compiled against apache-arrow-4.0.1 built without jemalloc).

      I did some initial investigation:

      1. The directly constructed Arrow table triggers the issue. Cloning it in different ways yields different results, even though all of the clones compare equal via the `table.Equals(other)` method and all of the tables pass `ValidateFull()`.
        1. Serializing and then deserializing the table is safe (no crash).
        2. `CombineChunks` didn't help.
        3. Cloning with `TableBatchReader` didn't help.
        4. Applying `CombineChunks` or `TableBatchReader` cloning to the deserialized table is still safe.
      2. The problem occurs in several different environments, so I think the issue is not related to glibc:
        1. Debian 8 + gcc 4.9.2
        2. Debian 9 + gcc 6.3.0
        3. Debian 11 + gcc 10.2.1
        4. Ubuntu 20.04 LTS + clang 12.0.1

      The issue can be reproduced with https://github.com/hcoona/arrow/commit/8fa6cdb0c756c17ea3edc43b7b73c717823bda85

      Attachments

        1. callstack.txt
          20 kB
          Shuai Zhang

            People

              emkornfield Micah Kornfield
              HCOONa Shuai Zhang