IMPALA-4371

Incorrect DCHECK-s in hdfs-parquet-table-writer

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.2.4
    • Fix Version/s: Impala 3.0
    • Component/s: Backend
    • Labels:
      None

      Description

      The following two DCHECK-s in hdfs-parquet-table-writer.cc seem to be invalid:

          // Last page might be empty
          if (page.header.data_page_header.num_values == 0) {
            DCHECK_EQ(page.header.compressed_page_size, 0);
            DCHECK_EQ(i, num_data_pages_ - 1);
            continue;
          }
      

      The first DCHECK asserts that if a page contains no values then its compressed size is also 0. This, however, seems to be a false assumption, as the compressed output may include metadata, such as a length or a checksum.

      The GZIP compressor, for example, states that an input of 0 bytes requires 23 bytes when compressed. The Snappy compressor also mentions storing length information in the compressed output. The compressed size estimation in the LZ4 compressor also contains a constant part.
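To illustrate why an empty input need not produce an empty output, here is a minimal sketch that models only a length prefix of the kind the description attributes to Snappy. It does not call any real compression library; `EncodeVarint` and `FrameWithLength` are hypothetical helper names, and the varint layout is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Encodes `value` as a base-128 varint: 7 payload bits per byte, with the
// high bit set on every byte except the last one.
std::vector<std::uint8_t> EncodeVarint(std::uint64_t value) {
  std::vector<std::uint8_t> out;
  while (value >= 0x80) {
    out.push_back(static_cast<std::uint8_t>(value | 0x80));
    value >>= 7;
  }
  out.push_back(static_cast<std::uint8_t>(value));
  return out;
}

// Hypothetical framing: a varint length prefix followed by the raw payload.
// No actual compression is performed; only the fixed overhead is modelled.
std::vector<std::uint8_t> FrameWithLength(const std::string& input) {
  std::vector<std::uint8_t> out = EncodeVarint(input.size());
  out.insert(out.end(), input.begin(), input.end());
  assert(!out.empty());  // the length prefix alone occupies at least one byte
  return out;
}
```

Under this model, `FrameWithLength("")` yields a single byte holding the length 0, so a check equivalent to `DCHECK_EQ(compressed_page_size, 0)` would fail for a page with no values.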

      The "Last page might be empty" comment and the second DCHECK also seem to be based on a false assumption. If a value doesn't fit on the current page, AppendRow creates a new, possibly bigger page and tries writing the data in the new page instead. This means that if the data is bigger than the page size, then the current page is finalized and a new page is added, even if the original page was empty. In other words, empty pages can occur in the middle of the pages_ array as well, not only at the end of it.
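The page-allocation behaviour described above can be sketched with a toy model. This is not Impala's writer: `ToyPage`, `ToyColumnWriter`, and `CountNonEmptyPages` are hypothetical names, and the sizing rule (the new page takes the larger of the default size and the value size) is an assumption made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a data page.
struct ToyPage {
  std::size_t capacity = 0;  // bytes this page can hold
  std::size_t used = 0;      // bytes written so far
  int num_values = 0;        // values stored on this page
};

// Toy column writer: a value that does not fit on the current page closes
// that page (even if it is still empty) and opens a new, possibly bigger one.
class ToyColumnWriter {
 public:
  explicit ToyColumnWriter(std::size_t default_page_size)
      : default_page_size_(default_page_size) {
    assert(default_page_size > 0);
    NewPage(default_page_size_);
  }

  void AppendRow(const std::string& value) {
    ToyPage* page = &pages_.back();
    if (page->used + value.size() > page->capacity) {
      NewPage(std::max(default_page_size_, value.size()));
      page = &pages_.back();
    }
    page->used += value.size();
    ++page->num_values;
  }

  const std::vector<ToyPage>& pages() const { return pages_; }

 private:
  void NewPage(std::size_t capacity) {
    ToyPage page;
    page.capacity = capacity;
    pages_.push_back(page);
  }

  std::size_t default_page_size_;
  std::vector<ToyPage> pages_;
};

// Skips empty pages wherever they occur, without assuming they come last.
std::size_t CountNonEmptyPages(const ToyColumnWriter& writer) {
  std::size_t count = 0;
  for (const ToyPage& page : writer.pages()) {
    if (page.num_values == 0) continue;
    ++count;
  }
  return count;
}
```

With a default page size of 8 bytes, appending a 12-byte value and then a 1-byte value leaves three pages holding 0, 1, and 1 values. The empty page sits at index 0 rather than at the end, so a check equivalent to `DCHECK_EQ(i, num_data_pages_ - 1)` would fire in this model.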

            People

            • Assignee: Zoltan Ivanfi
            • Reporter: Zoltan Ivanfi