Parquet / PARQUET-1535

[parquet-cli] dictionary command throws NPE when the specified column isn't dictionary-encoded

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.11.0
    • Fix Version/s: None
    • Component/s: parquet-mr
    • Labels: None

    Description

      The 'dictionary' command of parquet-cli throws a NullPointerException (NPE) when the specified column isn't dictionary-encoded.

      $ java -cp 'target/classes:target/dependency/*' org.apache.parquet.cli.Main dictionary /work/parquet-mr/data/test.parquet -c binary_field
      Unknown error
      java.lang.NullPointerException
              at org.apache.parquet.cli.commands.ShowDictionaryCommand.run(ShowDictionaryCommand.java:78)
              at org.apache.parquet.cli.Main.run(Main.java:147)
              at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
              at org.apache.parquet.cli.Main.main(Main.java:177)
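
The trace above only shows that something at ShowDictionaryCommand.java:78 was null; the report does not show that line. One plausible reading is that a lookup for the column's dictionary page returned null and the result was dereferenced without a check. The sketch below illustrates that kind of guard under this assumption. FakeDictionaryPage, FakeDictionaryReader, and describe are hypothetical stand-ins invented for illustration; they are not parquet-mr classes, and this is not the project's actual fix.

```java
import java.util.HashMap;
import java.util.Map;

public class DictionaryGuardSketch {

    // Hypothetical stand-in for a dictionary page (not a parquet-mr class).
    public static final class FakeDictionaryPage {
        public final int entryCount;

        public FakeDictionaryPage(int entryCount) {
            this.entryCount = entryCount;
        }
    }

    // Hypothetical stand-in for a reader that returns null
    // when a column has no dictionary page.
    public static final class FakeDictionaryReader {
        private final Map<String, FakeDictionaryPage> pages = new HashMap<>();

        public void put(String column, FakeDictionaryPage page) {
            pages.put(column, page);
        }

        public FakeDictionaryPage readDictionaryPage(String column) {
            return pages.get(column); // null if the column was never registered
        }
    }

    // Check for null before touching the page, so a missing dictionary
    // yields a readable message instead of a NullPointerException.
    public static String describe(FakeDictionaryReader reader, String column) {
        FakeDictionaryPage page = reader.readDictionaryPage(column);
        if (page == null) {
            return "Column " + column + " has no dictionary page";
        }
        return "Column " + column + ": " + page.entryCount + " dictionary entries";
    }

    public static void main(String[] args) {
        FakeDictionaryReader reader = new FakeDictionaryReader();
        // The entry count 3 is an arbitrary illustrative value.
        reader.put("int32_field", new FakeDictionaryPage(3));
        System.out.println(describe(reader, "int32_field"));
        System.out.println(describe(reader, "float_field"));
    }
}
```

With a guard of this shape, asking for a column without a dictionary produces a one-line message rather than "Unknown error" and a stack trace.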
      

      The schema and row group metadata of 'test.parquet' are as follows:

      $ java -cp 'target/classes:target/dependency/*' org.apache.parquet.cli.Main meta /work/parquet-mr/data/test.parquet
      
      File path:  /work/parquet-mr/data/test.parquet
      Created by: parquet-mr version 1.12.0-SNAPSHOT (build 1e62e2e2ca903d4109480bc87ceec1dc954b6c92)
      Properties:
        writer.model.name: example
      Schema:
      message test {
        required int32 int32_field;
        required int64 int64_field;
        required float float_field;
        required double double_field;
        required binary binary_field;
        required int64 timestamp_field (TIMESTAMP(MILLIS,true));
      }
      
      
      Row group 0:  count: 395  15.87 B records  start: 4  total: 6.120 kB
      --------------------------------------------------------------------------------
                       type      encodings count     avg size   nulls   min / max
      int32_field      INT32     _   D     395       0.20 B     0       "32" / "426"
      int64_field      INT64     _   D     395       0.20 B     0       "64" / "458"
      float_field      FLOAT     _   _     395       4.13 B     0       "1.0" / "395.0"
      double_field     DOUBLE    _   _     395       8.13 B     0       "2.0" / "396.0"
      binary_field     BINARY    _   D     395       2.98 B     0       "0x6162636465666768696A6B6..." / "0x6162636465666768696A6B6..."
      timestamp_field  INT64     _   D     395       0.23 B     0       "2018-11-04T12:41:15.123+0000" / "2018-11-04T12:47:49.123+0000"
      
      Row group 1:  count: 395  15.92 B records  start: 6271  total: 6.142 kB
      --------------------------------------------------------------------------------
                       type      encodings count     avg size   nulls   min / max
      int32_field      INT32     _   D     395       0.20 B     0       "427" / "821"
      int64_field      INT64     _   D     395       0.20 B     0       "459" / "853"
      float_field      FLOAT     _   _     395       4.13 B     0       "396.0" / "790.0"
      double_field     DOUBLE    _   _     395       8.13 B     0       "397.0" / "791.0"
      binary_field     BINARY    _   D     395       3.03 B     0       "0x6162636465666768696A6B6..." / "0x6162636465666768696A6B6..."
      timestamp_field  INT64     _   D     395       0.23 B     0       "2018-11-04T12:47:50.123+0000" / "2018-11-04T12:54:24.123+0000"
      
      Row group 2:  count: 234  16.53 B records  start: 12560  total: 3.777 kB
      --------------------------------------------------------------------------------
                       type      encodings count     avg size   nulls   min / max
      int32_field      INT32     _   D     234       0.17 B     0       "822" / "1055"
      int64_field      INT64     _   D     234       0.31 B     0       "854" / "1087"
      float_field      FLOAT     _   _     234       4.11 B     0       "791.0" / "1024.0"
      double_field     DOUBLE    _   _     234       8.21 B     0       "792.0" / "1025.0"
      binary_field     BINARY    _   D     234       3.38 B     0       "0x6162636465666768696A6B6..." / "0x6162636465666768696A6B6..."
      timestamp_field  INT64     _   D     234       0.35 B     0       "2018-11-04T12:54:25.123+0000" / "2018-11-04T12:58:18.123+0000"
      

      Attachments

        1. test.parquet
          20 kB
          Masayuki Takahashi

          People

            Assignee: Unassigned
            Reporter: Masayuki Takahashi (masayuki038)