Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
The root cause is in https://github.com/apache/parquet-cpp/blob/master/src/parquet/encodings/plain-encoding.h#L203
Bit packing happens on every Write() call, but the packing is done at byte granularity. If the number of (1-bit) values in a call is not a multiple of 8, the rest of the last byte is padded with incorrect values (false for boolean), so the values read back do not match the values written.
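The effect can be illustrated with a minimal, standalone sketch. This is not the parquet-cpp code: `PackByteAligned`, `BitBufferedPacker`, and `Unpack` are hypothetical names, and the LSB-first bit order is an assumption made for the illustration. The first function pads every call out to a byte boundary, which models the behavior described above; the class shows one possible alternative that carries the partial byte over between calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of the problem: each call packs its values LSB-first and
// pads the final byte with zero bits, so every call ends on a byte boundary.
void PackByteAligned(const bool* values, std::size_t n,
                     std::vector<std::uint8_t>* out) {
  for (std::size_t i = 0; i < n; i += 8) {
    std::uint8_t byte = 0;
    for (std::size_t bit = 0; bit < 8 && i + bit < n; ++bit) {
      if (values[i + bit]) byte |= static_cast<std::uint8_t>(1u << bit);
    }
    out->push_back(byte);
  }
}

// Hypothetical alternative: keep the partially filled byte between calls and
// pad only once, when the writer is finished.
class BitBufferedPacker {
 public:
  void Write(const bool* values, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
      if (values[i]) current_ |= static_cast<std::uint8_t>(1u << bits_used_);
      if (++bits_used_ == 8) Flush();
    }
  }

  // Flushes any trailing partial byte and returns the packed bytes.
  std::vector<std::uint8_t> Finish() {
    if (bits_used_ > 0) Flush();
    return bytes_;
  }

 private:
  void Flush() {
    bytes_.push_back(current_);
    current_ = 0;
    bits_used_ = 0;
  }

  std::vector<std::uint8_t> bytes_;
  std::uint8_t current_ = 0;
  unsigned bits_used_ = 0;
};

// Reads back the first n bits, LSB-first, the way a reader that expects
// densely packed values would.
std::vector<bool> Unpack(const std::vector<std::uint8_t>& bytes,
                         std::size_t n) {
  assert(bytes.size() * 8 >= n);
  std::vector<bool> values(n);
  for (std::size_t i = 0; i < n; ++i) {
    values[i] = ((bytes[i / 8] >> (i % 8)) & 1u) != 0;
  }
  return values;
}
```

In this model, writing 16 alternating values one at a time through `PackByteAligned` produces 16 bytes instead of 2, and a reader that assumes dense packing sees padding bits where values should be. Feeding the same values through `BitBufferedPacker` yields two bytes that decode to the original sequence.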
To reproduce, add the following test to src/parquet/column/column-writer-test.cc:
using TestBooleanValuesWriter = TestPrimitiveWriter<BooleanType>;

TEST_F(TestBooleanValuesWriter, AlternateBooleanValues) {
  this->SetUpSchema(Repetition::REQUIRED);
  auto writer = this->BuildWriter();
  for (int i = 0; i < SMALL_SIZE; i++) {
    bool value = (i % 2 == 0) ? true : false;
    writer->WriteBatch(1, nullptr, nullptr, &value);
  }
  writer->Close();
  this->ReadColumn();
  for (int i = 0; i < SMALL_SIZE; i++) {
    ASSERT_EQ((i % 2 == 0) ? true : false, this->values_out_[i]) << i;
  }
}
Issue Links
- blocks: PARQUET-713 parquet-cpp 1.0.0 release (Resolved)