Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
None
Environment: Windows 10 64-bit, MSVC
Description
When a column that contains only nulls is written unbuffered, a 0-byte dictionary page is written. When the resulting file is then read with buffered_stream enabled, the column reader takes the length of that page (0) and tries to read that many bytes from the underlying input stream.
In parquet/column_reader.cc, SerializedPageReader::NextPage:

    int compressed_len = current_page_header_.compressed_page_size;
    int uncompressed_len = current_page_header_.uncompressed_page_size;

    // Read the compressed data page.
    std::shared_ptr<Buffer> page_buffer;
    PARQUET_THROW_NOT_OK(stream_->Read(compressed_len, &page_buffer));
BufferedInputStream::Read, however, asserts that the number of bytes to read is strictly positive, so a 0-byte read fails the assertion and aborts the process.
In arrow/io/buffered.cc, BufferedInputStream::Impl:

    Status Read(int64_t nbytes, int64_t* bytes_read, void* out) {
      ARROW_CHECK_GT(nbytes, 0);