Details
- Bug
- Status: Resolved
- Minor
- Resolution: Fixed
- 1.12.2
- None
Description
parquet-cli's dictionary command fails with an NPE if a page does not have dictionary encoding:
$ parquet dictionary --column col a-b-c.snappy.parquet
Unknown error
java.lang.NullPointerException: Cannot invoke "org.apache.parquet.column.page.DictionaryPage.getEncoding()" because "page" is null
    at org.apache.parquet.cli.commands.ShowDictionaryCommand.run(ShowDictionaryCommand.java:78)
    at org.apache.parquet.cli.Main.run(Main.java:155)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.parquet.cli.Main.main(Main.java:185)

$ parquet meta a-b-c.snappy.parquet
...
Row group 0:  count: 1  46.00 B records  start: 4  total: 46 B
--------------------------------------------------------------------------------
     type      encodings count     avg size   nulls   min / max
col  BINARY    S   _     1         46.00 B    0       "a" / "a"

Row group 1:  count: 200  0.34 B records  start: 50  total: 69 B
--------------------------------------------------------------------------------
     type      encodings count     avg size   nulls   min / max
col  BINARY    S _ R     200       0.34 B     0       "b" / "c"
(Note the missing R / dictionary encoding on that first page.)
Someone familiar with Parquet might guess from the NPE that there's no dictionary encoding. But for files that mix pages with and without dictionary encoding (like above), the command will fail before getting to pages that actually have dictionaries.
The problem is that this line assumes readDictionaryPage always returns a page and does not handle the case where it returns null.
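One way to picture the missing check is the self-contained sketch below. It is not the actual patch and does not use the real parquet-mr classes: FakeDictionaryPage, FakeRowGroup, and describeDictionaries are hypothetical stand-ins, and the only assumption carried over from this report is that readDictionaryPage can return null when a column chunk has no dictionary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DictionaryNullGuardSketch {

    // Hypothetical stand-in for org.apache.parquet.column.page.DictionaryPage.
    static final class FakeDictionaryPage {
        final String encoding;
        final List<String> values;

        FakeDictionaryPage(String encoding, List<String> values) {
            this.encoding = encoding;
            this.values = values;
        }
    }

    // Hypothetical stand-in for a row group: readDictionaryPage() returns
    // null when the column chunk has no dictionary, as described above.
    static final class FakeRowGroup {
        private final FakeDictionaryPage dictionary; // may be null

        FakeRowGroup(FakeDictionaryPage dictionary) {
            this.dictionary = dictionary;
        }

        FakeDictionaryPage readDictionaryPage() {
            return dictionary;
        }
    }

    // Visits every row group and skips those without a dictionary instead of
    // dereferencing a null page, so later row groups are still reported.
    static List<String> describeDictionaries(List<FakeRowGroup> rowGroups) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < rowGroups.size(); i++) {
            FakeDictionaryPage page = rowGroups.get(i).readDictionaryPage();
            if (page == null) {
                lines.add("Row group " + i + ": no dictionary");
                continue;
            }
            lines.add("Row group " + i + ": dictionary (" + page.encoding + ") " + page.values);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Mirrors the shape of a-b-c.snappy.parquet: the first row group has
        // no dictionary, the second one does.
        List<FakeRowGroup> file = Arrays.asList(
            new FakeRowGroup(null),
            new FakeRowGroup(new FakeDictionaryPage("PLAIN", Arrays.asList("b", "c"))));
        describeDictionaries(file).forEach(System.out::println);
    }
}
```

With a guard of this kind, a row group without a dictionary produces a message rather than aborting the whole command, so the dictionaries in later row groups are still shown.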
Issue Links
- is depended upon by PARQUET-2145 Release 1.12.3 (Resolved)