  1. Spark
  2. SPARK-25419 Parquet predicate pushdown improvement
  3. SPARK-24538

ByteArrayDecimalType support push down to parquet data sources


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.4.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

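      The latest Parquet release supports statistics for decimal types, so filters on decimal columns can now be pushed down to the Parquet data source. The file metadata below shows min/max statistics for every decimal column, including the ones stored as FIXED_LEN_BYTE_ARRAY: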

      LM-SHC-16502798:parquet-mr yumwang$ java -jar ./parquet-tools/target/parquet-tools-1.10.10-column-index-SNAPSHOT.jar meta /tmp/spark/parquet/decimal/part-00000-3880e69a-6dd1-4c2b-946c-e7dae047f65c-c000.snappy.parquet
      file:         file:/tmp/spark/parquet/decimal/part-00000-3880e69a-6dd1-4c2b-946c-e7dae047f65c-c000.snappy.parquet
      creator:      parquet-mr version 1.10.0 (build 031a6654009e3b82020012a18434c582bd74c73a)
      extra:        org.apache.spark.sql.parquet.row.metadata = {"type":"struct","fields":[{"name":"id","type":"long","nullable":false,"metadata":{}},{"name":"d1","type":"decimal(9,0)","nullable":true,"metadata":{}},{"name":"d2","type":"decimal(9,2)","nullable":true,"metadata":{}},{"name":"d3","type":"decimal(18,0)","nullable":true,"metadata":{}},{"name":"d4","type":"decimal(18,4)","nullable":true,"metadata":{}},{"name":"d5","type":"decimal(38,0)","nullable":true,"metadata":{}},{"name":"d6","type":"decimal(38,18)","nullable":true,"metadata":{}}]}

      file schema:  spark_schema
      --------------------------------------------------------------------------------
      id:           REQUIRED INT64 R:0 D:0
      d1:           OPTIONAL INT32 O:DECIMAL R:0 D:1
      d2:           OPTIONAL INT32 O:DECIMAL R:0 D:1
      d3:           OPTIONAL INT64 O:DECIMAL R:0 D:1
      d4:           OPTIONAL INT64 O:DECIMAL R:0 D:1
      d5:           OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1
      d6:           OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1

      row group 1:  RC:241867 TS:15480513 OFFSET:4
      --------------------------------------------------------------------------------
      id:            INT64 SNAPPY DO:0 FPO:4 SZ:968154/1935071/2.00 VC:241867 ENC:BIT_PACKED,PLAIN ST:[min: 0, max: 241866, num_nulls: 0]
      d1:            INT32 SNAPPY DO:0 FPO:968158 SZ:967555/967515/1.00 VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 0, max: 241866, num_nulls: 0]
      d2:            INT32 SNAPPY DO:0 FPO:1935713 SZ:967558/967515/1.00 VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 0.00, max: 241866.00, num_nulls: 0]
      d3:            INT64 SNAPPY DO:0 FPO:2903271 SZ:968866/1935047/2.00 VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 0, max: 241866, num_nulls: 0]
      d4:            INT64 SNAPPY DO:0 FPO:3872137 SZ:1247007/1935047/1.55 VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 0.0000, max: 241866.0000, num_nulls: 0]
      d5:            FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5119144 SZ:1266850/3870159/3.05 VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 0, max: 241866, num_nulls: 0]
      d6:            FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:6385994 SZ:2198910/3870159/1.76 VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 0E-18, max: 241866.000000000000000000, num_nulls: 0]

      row group 2:  RC:241867 TS:15480513 OFFSET:8584904
      --------------------------------------------------------------------------------
      id:            INT64 SNAPPY DO:0 FPO:8584904 SZ:968131/1935071/2.00 VC:241867 ENC:BIT_PACKED,PLAIN ST:[min: 241867, max: 483733, num_nulls: 0]
      d1:            INT32 SNAPPY DO:0 FPO:9553035 SZ:967563/967515/1.00 VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 241867, max: 483733, num_nulls: 0]
      d2:            INT32 SNAPPY DO:0 FPO:10520598 SZ:967563/967515/1.00 VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 241867.00, max: 483733.00, num_nulls: 0]
      d3:            INT64 SNAPPY DO:0 FPO:11488161 SZ:968110/1935047/2.00 VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 241867, max: 483733, num_nulls: 0]
      d4:            INT64 SNAPPY DO:0 FPO:12456271 SZ:1247071/1935047/1.55 VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 241867.0000, max: 483733.0000, num_nulls: 0]
      d5:            FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:13703342 SZ:1270587/3870159/3.05 VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 241867, max: 483733, num_nulls: 0]
      d6:            FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:14973929 SZ:2197306/3870159/1.76 VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 241867.000000000000000000, max: 483733.000000000000000000, num_nulls: 0]
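
      To make the idea concrete, here is a minimal Java sketch of the two pieces a pushdown for binary-backed decimals would need: encoding a filter literal the way the Parquet format stores a DECIMAL in a FIXED_LEN_BYTE_ARRAY (the unscaled value as big-endian two's complement, sign-extended to the column width), and using a row group's max statistic to decide whether the group can be skipped. The class and method names (DecimalPushdownSketch, minBytesForPrecision, toFixedLenBytes, mightContainGreaterThan) are hypothetical and are not Spark or parquet-mr APIs; the filter literal 300000 is an assumed example, and the max values are copied from the d5 statistics above.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Arrays;

// Illustrative sketch only; not the Spark or parquet-mr implementation.
public class DecimalPushdownSketch {

    // Smallest byte width whose signed range holds every unscaled value
    // of the given decimal precision.
    public static int minBytesForPrecision(int precision) {
        BigInteger maxUnscaled = BigInteger.TEN.pow(precision).subtract(BigInteger.ONE);
        int numBytes = 1;
        // bitLength() excludes the sign bit, so n bytes hold up to 8n - 1 value bits.
        while (maxUnscaled.bitLength() > 8 * numBytes - 1) {
            numBytes++;
        }
        return numBytes;
    }

    // Encode a decimal as a fixed-length, big-endian two's-complement byte array
    // of its unscaled value, sign-extended to numBytes.
    public static byte[] toFixedLenBytes(BigDecimal value, int scale, int numBytes) {
        // setScale without a rounding mode throws if the value does not fit the scale exactly.
        BigInteger unscaled = value.setScale(scale).unscaledValue();
        byte[] minimal = unscaled.toByteArray(); // minimal-length two's complement
        if (minimal.length > numBytes) {
            throw new ArithmeticException("value does not fit in " + numBytes + " bytes");
        }
        byte[] out = new byte[numBytes];
        byte pad = (byte) (unscaled.signum() < 0 ? 0xFF : 0x00);
        Arrays.fill(out, 0, numBytes - minimal.length, pad);
        System.arraycopy(minimal, 0, out, numBytes - minimal.length, minimal.length);
        return out;
    }

    // A row group may contain rows with column > literal only if its max exceeds the literal.
    public static boolean mightContainGreaterThan(BigDecimal max, BigDecimal literal) {
        return max.compareTo(literal) > 0;
    }

    public static void main(String[] args) {
        // decimal(38,0), like column d5 in the metadata above
        int width = minBytesForPrecision(38);
        BigDecimal literal = new BigDecimal("300000"); // assumed filter: d5 > 300000
        byte[] encoded = toFixedLenBytes(literal, 0, width);
        System.out.println("width=" + width + " encoded=" + Arrays.toString(encoded));

        // max statistics of d5 in row group 1 and row group 2
        System.out.println("scan row group 1: "
            + mightContainGreaterThan(new BigDecimal("241866"), literal));
        System.out.println("scan row group 2: "
            + mightContainGreaterThan(new BigDecimal("483733"), literal));
    }
}
```

      Under these assumptions, row group 1 (max 241866) cannot satisfy d5 > 300000 and could be skipped entirely, while row group 2 (max 483733) still has to be read. The computed width for precision 38 is 16 bytes, which is consistent with the uncompressed size reported for d5 and d6 (3870159 bytes for 241867 values).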

            People

              Assignee: Yuming Wang (yumwang)
              Reporter: Yuming Wang (yumwang)