Details
Bug
Status: Resolved
Minor
Resolution: Fixed
None
Description
Parquet CPP has a DCHECK that fires when a dictionary-encoded page comes after a non-dictionary-encoded page. This is a problem because the DCHECK can be triggered by any Parquet file with a column that has a dictionary page, then a non-dictionary-encoded page, then a page of dictionary-encoded values (indices). Fuzzing found such a file. While the DCHECK could be turned into an exception, I don't see anything in the Parquet specification that prohibits such a sequence of pages.
This situation has been brought up on the mailing list before (https://lists.apache.org/thread/3bzymmbxvmzj12km7cjz1150ndvy9bos), and it seems that this layout is valid but nobody is producing it.
In the PR that added this check (https://github.com/apache/parquet-cpp/pull/73), it was noted that the check is probably not needed.
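To illustrate the page sequence described above, here is a minimal sketch of a decoder that tolerates a dictionary-encoded page after a plain page. The names `Encoding`, `DataPage`, and `DecodeColumnChunk` are hypothetical and heavily simplified; they are not the actual parquet-cpp classes, and the real reader works on encoded byte buffers rather than vectors of integers. The only point is that the dictionary can stay available for the whole column chunk, so the order of page encodings does not need to be asserted.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical, simplified stand-ins; not the real parquet-cpp types.
enum class Encoding { PLAIN, RLE_DICTIONARY };

struct DataPage {
  Encoding encoding;
  // Plain values for PLAIN pages, dictionary indices for RLE_DICTIONARY pages.
  std::vector<int32_t> payload;
};

// Decodes all data pages of one column chunk. The dictionary is kept for the
// whole chunk, so a dictionary-encoded page may follow a plain page.
std::vector<int32_t> DecodeColumnChunk(const std::vector<int32_t>& dictionary,
                                       const std::vector<DataPage>& pages) {
  std::vector<int32_t> out;
  for (const DataPage& page : pages) {
    if (page.encoding == Encoding::PLAIN) {
      // Plain pages carry the values directly.
      out.insert(out.end(), page.payload.begin(), page.payload.end());
      continue;
    }
    // Dictionary pages carry indices; reject bad input with an exception
    // instead of a debug-only assertion.
    for (int32_t index : page.payload) {
      if (index < 0 || static_cast<std::size_t>(index) >= dictionary.size()) {
        throw std::out_of_range("dictionary index out of range");
      }
      out.push_back(dictionary[static_cast<std::size_t>(index)]);
    }
  }
  return out;
}
```

In this sketch, malformed input (an index outside the dictionary) raises an exception, while the mixed ordering of plain and dictionary-encoded pages is simply accepted.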