Parquet / PARQUET-1882

[C++] Writing an all-null column and then reading it with buffered_stream aborts the process


Details

    Description

      When a column that contains only nulls is written unbuffered, a 0-byte dictionary page is written. When the resulting file is then read with buffered_stream enabled, the column reader obtains the page length (which is 0) from the page header and then tries to read that many bytes from the underlying input stream.

      parquet/column_reader.cc, SerializedPageReader::NextPage

       

      int compressed_len = current_page_header_.compressed_page_size;
      int uncompressed_len = current_page_header_.uncompressed_page_size;
      
      // Read the compressed data page.
      std::shared_ptr<Buffer> page_buffer;
      PARQUET_THROW_NOT_OK(stream_->Read(compressed_len, &page_buffer));

       

      BufferedInputStream::Read, however, asserts that the number of bytes to read is strictly positive, so the zero-length read fails the assertion and aborts the process.

      arrow/io/buffered.cc, BufferedInputStream::Impl

       

      Status Read(int64_t nbytes, int64_t* bytes_read, void* out) {        
        ARROW_CHECK_GT(nbytes, 0);
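
One possible way to avoid the abort is to skip the stream read entirely when the page header reports a length of 0. The following is a minimal, self-contained sketch of that idea, not the actual Arrow or Parquet code: `StrictStream` and `ReadPageBytes` are hypothetical names introduced only for illustration, and the stand-in stream throws an exception where the real code aborts via an assertion, so that the behavior can be observed without killing the process.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a buffered input stream whose Read() requires
// nbytes > 0. It throws instead of aborting (assumption made for this sketch).
class StrictStream {
 public:
  explicit StrictStream(std::string data) : data_(std::move(data)) {}

  std::vector<uint8_t> Read(int64_t nbytes) {
    if (nbytes <= 0) {
      throw std::invalid_argument("Read requires nbytes > 0");
    }
    // Return at most the bytes that remain in the underlying data.
    const int64_t available = static_cast<int64_t>(data_.size()) - pos_;
    const int64_t n = nbytes < available ? nbytes : available;
    std::vector<uint8_t> out(data_.begin() + pos_, data_.begin() + pos_ + n);
    pos_ += n;
    return out;
  }

 private:
  std::string data_;
  int64_t pos_ = 0;
};

// Hypothetical helper: a zero-length page yields an empty buffer without
// touching the stream; any positive length is delegated to the stream.
std::vector<uint8_t> ReadPageBytes(StrictStream& stream, int32_t compressed_len) {
  if (compressed_len < 0) {
    throw std::invalid_argument("negative page size");
  }
  if (compressed_len == 0) {
    return {};
  }
  return stream.Read(compressed_len);
}
```

With a guard like this on the caller side, a 0-byte dictionary page never reaches the strict read path. Relaxing the stream's check to accept a length of 0 would be an alternative; which of the two the project prefers is not stated in this report.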
      

       

       


            People

              emkornfield@gmail.com Micah Kornfield
              egorelik Eric Gorelik

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 20m