Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version: Impala 1.4
Description
While reading the code, I noticed what looks like a bug in hdfs-parquet-table-writer.cc:

inline int HdfsParquetTableWriter::BaseColumnWriter::AppendRow(TupleRow* row) {
  int bytes_added = 0;
  ++num_values_;
  void* value = expr_->GetValue(row);
  if (current_page_ == NULL) NewPage();

  // We might need to try again if this current page is not big enough
  while (true) {
    if (!def_levels_->Put(value != NULL)) {
      bytes_added += FinalizeCurrentPage();
      NewPage();
      bool ret = def_levels_->Put(value != NULL);
      DCHECK(ret);
    }

    // Nulls don't get encoded.
    if (value == NULL) break;
    ++current_page_->num_non_null;

    if (EncodeValue(value, &bytes_added)) break;

    // Value didn't fit on page, try again on a new page.
    bytes_added += FinalizeCurrentPage();
    NewPage();
  }

  ++current_page_->header.data_page_header.num_values;
  return bytes_added;
}
When a value is larger than the page size, the loop never terminates and keeps allocating 64 KB pages.
Our user imported a table from MapReduce-generated Parquet files that contain strings longer than 64 KB (we should have read the documentation first: max string length = 32767...).
In any case, the code should fail the query rather than crash the server.
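As a purely illustrative sketch of that suggestion (simplified standalone code, not Impala's actual writer or the actual fix; the ColumnWriter, Page, and AppendValue names here are made up), the retry loop can bail out with an error as soon as a value fails to fit on a freshly started page:

#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical stand-in for a Parquet data page with a fixed capacity.
struct Page {
  size_t capacity = 0;
  size_t used = 0;
  int num_values = 0;
};

class ColumnWriter {
 public:
  explicit ColumnWriter(size_t page_size) : page_size_(page_size) {}

  // Appends one value, starting new pages as needed. Returns false (so the
  // caller can fail the query) when a single value is larger than an empty
  // page can ever hold, instead of retrying on new pages forever.
  bool AppendValue(const std::string& value, std::string* err) {
    if (pages_.empty()) NewPage();
    while (true) {
      if (EncodeValue(value)) return true;
      // The value did not fit. If the current page is still empty, a fresh
      // page of the same size cannot hold it either, so give up.
      if (pages_.back().num_values == 0) {
        *err = "value of " + std::to_string(value.size()) +
               " bytes exceeds data page size of " + std::to_string(page_size_);
        return false;
      }
      NewPage();  // Finalize the full page and retry on a fresh one.
    }
  }

 private:
  // Appends the value to the current page if it fits; false otherwise.
  bool EncodeValue(const std::string& value) {
    Page& p = pages_.back();
    if (p.used + value.size() > p.capacity) return false;
    p.used += value.size();
    ++p.num_values;
    return true;
  }

  void NewPage() {
    Page p;
    p.capacity = page_size_;
    pages_.push_back(p);
  }

  size_t page_size_;
  std::vector<Page> pages_;
};

int main() {
  ColumnWriter writer(64 * 1024);  // 64 KB pages, as in the report.
  std::string err;
  // A 100 KB string can never fit in a 64 KB page; the writer reports an
  // error instead of allocating pages forever.
  if (!writer.AppendValue(std::string(100 * 1024, 'x'), &err)) {
    std::printf("query failed: %s\n", err.c_str());
  }
  return 0;
}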
Thanks,
Binglin