  1. Apache Arrow
  2. ARROW-4018

[C++] RLE decoder may not be big-endian compatible

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.11.1
    • Fix Version/s: 1.0.0
    • Component/s: C++

    Description

      This issue was found by Coverity. The RleDecoder::NextCounts method has the following code to fetch the repeated literal in repeated runs:

          bool result =
              bit_reader_.GetAligned<T>(static_cast<int>(BitUtil::CeilDiv(bit_width_, 8)),
                                        reinterpret_cast<T*>(&current_value_));
      

      Coverity says this:

      Pointer "&this->current_value_" points to an object whose effective type is "unsigned long long" (64 bits, unsigned) but is dereferenced as a narrower "unsigned int" (32 bits, unsigned). This may lead to unexpected results depending on machine endianness.
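The hazard Coverity describes can be illustrated with a small standalone sketch. It is not the Arrow code: `StoreNarrow` and `HostIsLittleEndian` are hypothetical helpers, and `std::memcpy` stands in for the write through the narrowed pointer so that the example itself stays well-defined. A 32-bit value is copied into the first four bytes of a zeroed 64-bit variable; which half of the variable those bytes represent depends on the host.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of Arrow): copies a 32-bit literal into the
// first four bytes of a zeroed 64-bit variable, which is effectively what a
// store through a pointer narrowed from uint64_t* to uint32_t* does.
inline uint64_t StoreNarrow(uint32_t literal) {
  uint64_t current_value = 0;
  std::memcpy(&current_value, &literal, sizeof(literal));
  return current_value;
}

// Hypothetical helper: returns true when the host stores the least
// significant byte at the lowest address.
inline bool HostIsLittleEndian() {
  const uint16_t probe = 1;
  uint8_t first_byte = 0;
  std::memcpy(&first_byte, &probe, 1);
  return first_byte == 1;
}
```

On a little-endian host `StoreNarrow(0x11223344)` yields `0x0000000011223344`; on a big-endian host the same bytes land in the upper half and the result is `0x1122334400000000`.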

      In addition, it's not obvious whether current_value_ also needs byte-swapping (presumably, at least in the Parquet file format, it's supposed to be stored in little-endian format in the RLE bitstream).
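One way to sidestep both concerns, sketched here under the assumption that the repeated value is stored as little-endian bytes in the bitstream, is to assemble the value from individual bytes with shifts instead of writing through a cast pointer. `LoadLittleEndian` and `BytesForBitWidth` are hypothetical names chosen for illustration; this is not necessarily how the issue was fixed in Arrow.

```cpp
#include <cstdint>

// Hypothetical helper: number of whole bytes needed to hold `bit_width` bits,
// mirroring the CeilDiv(bit_width_, 8) expression in the snippet above.
inline int BytesForBitWidth(int bit_width) { return (bit_width + 7) / 8; }

// Hypothetical helper: assembles `num_bytes` (at most 8) little-endian bytes
// into a uint64_t. Shifts operate on values, not on memory layout, so the
// result is the same on little-endian and big-endian hosts.
inline uint64_t LoadLittleEndian(const uint8_t* bytes, int num_bytes) {
  uint64_t value = 0;
  for (int i = 0; i < num_bytes; ++i) {
    value |= static_cast<uint64_t>(bytes[i]) << (8 * i);
  }
  return value;
}
```

Because the destination is always a full 64-bit variable, no narrowing cast is needed, and no separate byte-swap step is required on big-endian machines.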

            People

              Assignee: Kazuaki Ishizaki (kiszk)
              Reporter: Antoine Pitrou (apitrou)
              Votes: 0
              Watchers: 3

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h