Parquet / PARQUET-764

[CPP] Parquet Writer does not write Boolean values correctly

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: cpp-1.0.0
    • Component/s: None
    • Labels: None

    Description

      The root cause is at https://github.com/apache/parquet-cpp/blob/master/src/parquet/encodings/plain-encoding.h#L203
      Bit packing happens on every Write() call, but the packing is done at byte granularity. If the number of (1-bit) values in a call is not a multiple of 8, the remaining bits of the last byte are padded, which inserts incorrect values (false for booleans) into the output.
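
      The effect can be illustrated with a minimal standalone sketch. This is not the parquet-cpp code: PerCallPacker, Unpack and CountMismatches are hypothetical names, and the sketch assumes LSB-first packing and a reader that expects the values to be densely packed.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for an encoder that bit-packs every Write() call
// on its own and rounds each call up to a whole number of bytes.
struct PerCallPacker {
  std::vector<uint8_t> bytes;

  void Write(const bool* values, std::size_t n) {
    const std::size_t start = bytes.size();
    bytes.resize(start + (n + 7) / 8, 0);  // unused bits stay 0 (false)
    for (std::size_t i = 0; i < n; ++i) {
      if (values[i]) {
        bytes[start + i / 8] |= static_cast<uint8_t>(1u << (i % 8));
      }
    }
  }
};

// Decodes `count` booleans, assuming they were packed densely, LSB first.
std::vector<bool> Unpack(const std::vector<uint8_t>& bytes, std::size_t count) {
  std::vector<bool> out(count);
  for (std::size_t i = 0; i < count; ++i) {
    out[i] = ((bytes[i / 8] >> (i % 8)) & 1u) != 0;
  }
  return out;
}

// Writes `count` alternating values (true, false, ...) in calls of `batch`
// values each (batch must be > 0) and returns how many decode incorrectly.
std::size_t CountMismatches(std::size_t count, std::size_t batch) {
  std::unique_ptr<bool[]> input(new bool[count]);
  for (std::size_t i = 0; i < count; ++i) input[i] = (i % 2 == 0);

  PerCallPacker packer;
  for (std::size_t off = 0; off < count; off += batch) {
    packer.Write(input.get() + off, std::min(batch, count - off));
  }

  const std::vector<bool> decoded = Unpack(packer.bytes, count);
  std::size_t mismatches = 0;
  for (std::size_t i = 0; i < count; ++i) {
    if (decoded[i] != input[i]) ++mismatches;
  }
  return mismatches;
}
```

      In this sketch, CountMismatches(8, 1) returns 3 because each single-value call occupies a full byte, so the reader sees seven padding bits after the first value. CountMismatches(16, 8) returns 0 because every call fills its byte exactly.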

      To reproduce, use the following test in src/parquet/column/column-writer-test.cc:

      using TestBooleanValuesWriter = TestPrimitiveWriter<BooleanType>;
      TEST_F(TestBooleanValuesWriter, AlternateBooleanValues) {
        this->SetUpSchema(Repetition::REQUIRED);
        auto writer = this->BuildWriter();
        // Write one value per WriteBatch() call, so that each call packs a
        // value count that is not a multiple of 8.
        for (int i = 0; i < SMALL_SIZE; i++) {
          bool value = (i % 2 == 0);
          writer->WriteBatch(1, nullptr, nullptr, &value);
        }
        writer->Close();
        this->ReadColumn();
        // The values read back should alternate true, false, true, ...
        for (int i = 0; i < SMALL_SIZE; i++) {
          ASSERT_EQ((i % 2 == 0), this->values_out_[i]) << i;
        }
      }
      

            People

              Assignee: Uwe Korn (uwe)
              Reporter: Deepak Majeti (mdeepak)