  1. Apache Arrow
  2. ARROW-11518

[C++] [Parquet] Parquet reader crashes when reading boolean columns

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 6.0.0
    • Component/s: C++

    Description

      Parquet file reader crashes while reading boolean columns in TypedColumnReaderImpl<DType>::Skip.

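      The buffer size calculation in the code below is incorrect. value_byte_size is 1 for booleans, but the same buffer is also passed to ReadBatch as the destination for definition and repetition levels, which need 2 bytes (int16_t) per value. For boolean columns the buffer is therefore only half the size the levels require, so ReadBatch can write past its end.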

            // This will be enough scratch space to accommodate 16-bit levels or any
            // value type
            std::shared_ptr<ResizableBuffer> scratch = AllocateBuffer(
                this->pool_, batch_size * type_traits<DType::type_num>::value_byte_size);
      
            do {
              batch_size = std::min(batch_size, rows_to_skip);
              values_read =
                  ReadBatch(static_cast<int>(batch_size),
                            reinterpret_cast<int16_t*>(scratch->mutable_data()),
                            reinterpret_cast<int16_t*>(scratch->mutable_data()),
                            reinterpret_cast<T*>(scratch->mutable_data()), &values_read);
              rows_to_skip -= values_read;
            } while (values_read > 0 && rows_to_skip > 0);
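
One possible way to size the scratch buffer so that it covers both uses is to take the larger of the level width and the value width. The following is a minimal standalone sketch of that arithmetic only, not the actual Arrow patch; the helper name ScratchBytes is hypothetical and does not exist in Arrow.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper (not part of Arrow): number of bytes needed for a
// scratch buffer that is reused both for int16_t definition/repetition
// levels and for values occupying value_byte_size bytes each.
constexpr std::int64_t ScratchBytes(std::int64_t batch_size,
                                    std::int64_t value_byte_size) {
  // Each slot must hold whichever is wider: a 16-bit level or a value.
  const std::int64_t level_byte_size =
      static_cast<std::int64_t>(sizeof(std::int16_t));
  return batch_size * std::max(level_byte_size, value_byte_size);
}

// Booleans (value_byte_size == 1) still get 2 bytes per slot for the levels.
static_assert(ScratchBytes(1024, 1) == 2048, "levels dominate for booleans");
// Wider types (e.g. 8-byte values) are sized by the value width as before.
static_assert(ScratchBytes(1024, 8) == 8192, "values dominate for wide types");
```

Under this assumption, the AllocateBuffer call in Skip would request ScratchBytes(batch_size, value_byte_size) bytes instead of batch_size * value_byte_size, which changes the allocation only for types narrower than 2 bytes.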
      

            People

              Assignee: Andrey Klochkov (aklochkov)
              Reporter: Andrey Klochkov (aklochkov)
              Votes: 0
              Watchers: 2

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 50m