IMPALA-1157

Crash in HdfsParquetWriter when value size larger than page size


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 1.4
    • Fix Version/s: Impala 2.0

    Description

      While reading the code, I found what looks like a bug:

      hdfs-parquet-table-writer.cc
      
      inline int HdfsParquetTableWriter::BaseColumnWriter::AppendRow(TupleRow* row) {
        int bytes_added = 0;
        ++num_values_;
        void* value = expr_->GetValue(row);
        if (current_page_ == NULL) NewPage();
      
        // We might need to try again if this current page is not big enough
        while (true) {
          if (!def_levels_->Put(value != NULL)) {
            bytes_added += FinalizeCurrentPage();
            NewPage();
            bool ret = def_levels_->Put(value != NULL);
            DCHECK(ret);
          }
          // Nulls don't get encoded.
          if (value == NULL) break;
          ++current_page_->num_non_null;
      
          if (EncodeValue(value, &bytes_added)) break;
      
          // Value didn't fit on page, try again on a new page.
          bytes_added += FinalizeCurrentPage();
          NewPage();
        }
        ++current_page_->header.data_page_header.num_values;
        return bytes_added;
      }
      

      When a value is larger than the page size, EncodeValue() can never succeed, so the loop runs forever, allocating a new 64K page on every iteration.
      One of our users imported a table from MapReduce-generated Parquet files containing strings longer than 64K (we should have read the documentation first: the maximum string length is 32767).
      Either way, the code should mark the query as failed rather than crash the server.
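
One possible guard is sketched below. This is not Impala's code or the actual fix: `Page`, `SketchColumnWriter`, and `AppendValue` are hypothetical stand-ins, and definition levels and byte accounting are left out. The idea is that if encoding fails on a page that is still empty, a fresh page cannot help, so the writer reports an error instead of retrying.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a fixed-capacity data page (not Impala's type).
struct Page {
  std::size_t capacity;
  std::size_t used;
  int num_values;
};

// Hypothetical, simplified column writer that only tracks page occupancy.
class SketchColumnWriter {
 public:
  explicit SketchColumnWriter(std::size_t page_size) : page_size_(page_size) {
    NewPage();
  }

  // Returns false (so the caller can fail the query) when the value can
  // never fit on a page, instead of retrying forever.
  bool AppendValue(const std::string& value) {
    while (true) {
      if (EncodeValue(value)) break;
      // Encoding failed on an empty page: a new page would fail the same way.
      if (pages_.back().used == 0) return false;
      // Otherwise the page was merely full; retry once on a fresh page.
      NewPage();
    }
    ++pages_.back().num_values;
    return true;
  }

  std::size_t num_pages() const { return pages_.size(); }

 private:
  // Copies the value into the current page if there is room for it.
  bool EncodeValue(const std::string& value) {
    Page& page = pages_.back();
    assert(page.used <= page.capacity);
    if (value.size() > page.capacity - page.used) return false;
    page.used += value.size();
    return true;
  }

  void NewPage() { pages_.push_back(Page{page_size_, 0, 0}); }

  std::size_t page_size_;
  std::vector<Page> pages_;
};
```

With an 8-byte page size, appending "abcd" and "efgh" fills the first page, "ij" moves to a second page, and a 10-byte value makes `AppendValue` return false rather than loop.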

      Thanks,
      Binglin

          People

            ippokratis Ippokratis Pandis