PARQUET-2120

parquet-cli dictionary command fails on pages without dictionary encoding

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.12.2
    • Fix Version/s: 1.12.3
    • Component/s: parquet-cli
    • Labels: None

    Description

      parquet-cli's dictionary command fails with an NPE if a page does not have dictionary encoding:

      $ parquet dictionary --column col a-b-c.snappy.parquet                
      Unknown error
      java.lang.NullPointerException: Cannot invoke "org.apache.parquet.column.page.DictionaryPage.getEncoding()" because "page" is null
      	at org.apache.parquet.cli.commands.ShowDictionaryCommand.run(ShowDictionaryCommand.java:78)
      	at org.apache.parquet.cli.Main.run(Main.java:155)
      	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
      	at org.apache.parquet.cli.Main.main(Main.java:185)
      
      $ parquet meta a-b-c.snappy.parquet      
      ...
      Row group 0:  count: 1  46.00 B records  start: 4  total: 46 B
      --------------------------------------------------------------------------------
           type      encodings count     avg size   nulls   min / max
      col  BINARY    S   _     1         46.00 B    0       "a" / "a"
      
      Row group 1:  count: 200  0.34 B records  start: 50  total: 69 B
      --------------------------------------------------------------------------------
           type      encodings count     avg size   nulls   min / max
      col  BINARY    S _ R     200       0.34 B     0       "b" / "c"
      

      (Note the missing R, i.e. dictionary encoding, for the column in row group 0.)

      Someone familiar with Parquet might guess from the NPE that there's no dictionary encoding. But for files that mix pages with and without dictionary encoding (like above), the command will fail before getting to pages that actually have dictionaries.

      The problem is that the code in ShowDictionaryCommand.run assumes readDictionaryPage always returns a page and does not handle the case where it returns null.
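
      One way to guard against this is a null check per row group, so that a missing dictionary is reported and the loop moves on to the next row group. The following is only a minimal sketch of that pattern, not the actual patch: DictPage, RowGroup and describeDictionaries are hypothetical stand-ins rather than parquet-cli or parquet-mr classes, and the encoding string is illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class DictionaryNullCheckSketch {

    // Hypothetical stand-in for org.apache.parquet.column.page.DictionaryPage.
    static final class DictPage {
        private final String encoding;

        DictPage(String encoding) {
            this.encoding = encoding;
        }

        String getEncoding() {
            return encoding;
        }
    }

    // Hypothetical stand-in for a per-row-group dictionary reader. As in the
    // report, readDictionaryPage returns null when there is no dictionary.
    interface RowGroup {
        DictPage readDictionaryPage(String column);
    }

    // Builds one line per row group and skips row groups without a
    // dictionary instead of dereferencing a null page.
    static List<String> describeDictionaries(List<RowGroup> rowGroups, String column) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < rowGroups.size(); i++) {
            DictPage page = rowGroups.get(i).readDictionaryPage(column);
            if (page == null) {
                lines.add("Row group " + i + ": no dictionary for column " + column);
                continue;
            }
            lines.add("Row group " + i + ": dictionary encoding " + page.getEncoding());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Mirrors the file above: row group 0 has no dictionary, row group 1 does.
        RowGroup withoutDictionary = column -> null;
        RowGroup withDictionary = column -> new DictPage("PLAIN_DICTIONARY");
        List<RowGroup> rowGroups = Arrays.asList(withoutDictionary, withDictionary);

        for (String line : describeDictionaries(rowGroups, "col")) {
            System.out.println(line);
        }
    }
}
```

      With this shape, a file that mixes row groups with and without dictionaries still gets as far as the row groups that do have one.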

            People

              Assignee: Unassigned
              Reporter: Willi Raschkowski (rshkv)