Apache Arrow / ARROW-1611

Crash in BitmapReader when length is zero

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.7.1
    • Fix Version/s: 0.7.1
    • Component/s: C++
    • Environment: Mac OS X 10.11.6

    Description

      This was found when applying the fix for ARROW-1601 to parquet-cpp.

      BitmapReader can be constructed with a length of zero, which results in EXC_BAD_ACCESS when the constructor tries to read the first byte of the bitmap.

      The call stack shows BitmapWriter rather than BitmapReader because I added a BitmapWriter class to fix the same pattern in the INIT_BITSET/READ_NEXT_BITSET code for writing bitmaps in DefinitionLevelsToBitmap (parquet-cpp/src/parquet/column_reader.h). The two constructors are identical, so the compiler merged them.
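
      The unguarded read in the constructor can be avoided by loading the first byte only when there is at least one bit to process. The following is a minimal sketch of that idea, not the code from either pull request: the class name GuardedBitmapReader and the helper CountSetBits are hypothetical, and the member names simply mirror those visible in the lldb source listing.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a bitmap reader; not the Arrow class itself.
class GuardedBitmapReader {
 public:
  GuardedBitmapReader(const uint8_t* bitmap, int64_t start_offset, int64_t length)
      : bitmap_(bitmap), position_(0), length_(length), current_byte_(0) {
    assert(start_offset >= 0 && length >= 0);
    byte_offset_ = start_offset / 8;
    bit_offset_ = start_offset % 8;
    // Only touch the buffer when there is at least one bit to read.
    if (length_ > 0) {
      current_byte_ = bitmap_[byte_offset_];
    }
  }

  // True if the bit at the current position is set.
  bool IsSet() const { return (current_byte_ & (1 << bit_offset_)) != 0; }

  // Advance to the next bit, loading the next byte when crossing a boundary.
  void Next() {
    ++bit_offset_;
    ++position_;
    if (bit_offset_ == 8) {
      bit_offset_ = 0;
      ++byte_offset_;
      // Avoid reading one byte past the end after the last bit.
      if (position_ < length_) {
        current_byte_ = bitmap_[byte_offset_];
      }
    }
  }

 private:
  const uint8_t* bitmap_;
  int64_t position_;
  int64_t length_;
  uint8_t current_byte_;
  int64_t byte_offset_;
  int64_t bit_offset_;
};

// Counts the set bits in [start_offset, start_offset + length).
int64_t CountSetBits(const uint8_t* bitmap, int64_t start_offset, int64_t length) {
  GuardedBitmapReader reader(bitmap, start_offset, length);
  int64_t count = 0;
  for (int64_t i = 0; i < length; ++i) {
    if (reader.IsSet()) ++count;
    reader.Next();
  }
  return count;
}
```

      With this guard, a call such as CountSetBits(bitmap, 1048576, 0) never dereferences the bitmap pointer, so an offset that points just past the end of the buffer is harmless when the length is zero.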

      Old pull request (closed):
      https://github.com/apache/arrow/pull/1131

      New pull request with suggested changes:
      https://github.com/apache/arrow/pull/1133

      Process 17313 launched: './bin/FileConvert' (x86_64)
      Input files are:
      ../../parquet-data/State_Drug_Utilization_Data_2016.csv
      Processing input file: ../../parquet-data/State_Drug_Utilization_Data_2016.csv
      Process 17313 stopped

      * thread #1: tid = 0x4be842, 0x0000000101840fe9 libparquet.1.dylib`arrow::internal::BitmapWriter::BitmapWriter(this=0x00007fff5fbf2908, bitmap="}", start_offset=1048576, length=0) + 89 at bit-util.h:99, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x106ba0000)
        frame #0: 0x0000000101840fe9 libparquet.1.dylib`arrow::internal::BitmapWriter::BitmapWriter(this=0x00007fff5fbf2908, bitmap="}", start_offset=1048576, length=0) + 89 at bit-util.h:99
           96        : bitmap_(bitmap), position_(0), length_(length) {
           97      byte_offset_ = start_offset / 8;
           98      bit_offset_ = start_offset % 8;
        -> 99      current_byte_ = bitmap[byte_offset_];
           100   }
           101
           102   void Set() { current_byte_ |= (1 << bit_offset_); }

        (lldb) thread backtrace

      * thread #1: tid = 0x4be842, 0x0000000101840fe9 libparquet.1.dylib`arrow::internal::BitmapWriter::BitmapWriter(this=0x00007fff5fbf2908, bitmap="}", start_offset=1048576, length=0) + 89 at bit-util.h:99, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x106ba0000)
      * frame #0: 0x0000000101840fe9 libparquet.1.dylib`arrow::internal::BitmapWriter::BitmapWriter(this=0x00007fff5fbf2908, bitmap="}", start_offset=1048576, length=0) + 89 at bit-util.h:99
        frame #1: 0x0000000101840ded libparquet.1.dylib`arrow::internal::BitmapWriter::BitmapWriter(this=0x00007fff5fbf2908, bitmap="}", start_offset=1048576, length=0) + 45 at bit-util.h:96
        frame #2: 0x0000000101964bf3 libparquet.1.dylib`parquet::Encoder<parquet::DataType<(parquet::Type::type)4> >::PutSpaced(this=0x0000000109b08bb0, src=0x000000012b86b000, num_values=0, valid_bits="}", valid_bits_offset=1048576) + 1747 at encoding.h:62
        frame #3: 0x0000000101931913 libparquet.1.dylib`parquet::TypedColumnWriter<parquet::DataType<(parquet::Type::type)4> >::WriteValuesSpaced(this=0x0000000109b08cb8, num_values=0, valid_bits="}", valid_bits_offset=1048576, values=0x000000012b86b000) + 115 at column_writer.cc:612

      To reproduce this problem:

      1) Download the CSV file.
      Source: https://catalog.data.gov/dataset?res_format=CSV
      State Drug Utilization Data 2016
      https://data.medicaid.gov/api/views/3v6v-qk5s/rows.csv?accessType=DOWNLOAD

      2) Run FileConvert (see https://github.com/renesugar/FileConvert)
      ./bin/FileConvert -i ./State_Drug_Utilization_Data_2016.csv -o ./State_Drug_Utilization_Data_2016.parquet

      (FileConvert is built using the same process as MapD.)

          People

            renesugar Rene Sugar