Apache Drill / DRILL-4184

Drill does not support Parquet DECIMAL values in variable length BINARY fields


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.0
    • Fix Version/s: None
    • Component/s: Storage - Parquet
    • Labels:
      None
    • Environment:

      Windows 7 Professional, Java 1.8.0_66

      Description

      Encoding a DECIMAL logical type in Parquet using the variable length BINARY primitive type is not supported by Drill as of versions 1.3.0 and 1.4.0. The problem first surfaces with the ClassCastException shown below, but fixing the immediate cause of the exception is not sufficient to support this combination (DECIMAL, BINARY) in a Parquet file.

      In Drill, DECIMAL is currently assumed to be stored as INT32, INT64, INT96, or FIXED_LEN_BYTE_ARRAY. Are there any plans to support DECIMAL with variable-length BINARY? Avro definitely supports encoding DECIMAL in variable-length bytes (see https://avro.apache.org/docs/current/spec.html#Decimal), but support for this in Parquet is less clear.
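
      For reference, the Avro decimal encoding linked above stores the unscaled value as two's-complement, big-endian bytes, with the scale held in the schema. The following is a minimal sketch of that round trip using only java.math; the class and method names are hypothetical and are not part of Drill, Avro, or Parquet.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical helper (not a Drill, Avro, or Parquet class): shows how a
// decimal maps to variable-length bytes when the scale lives in the schema.
final class DecimalBinaryCodec {

    // Encode the unscaled value as minimal-length two's-complement,
    // big-endian bytes. setScale(scale) without a rounding mode throws
    // ArithmeticException if the value does not fit the scale exactly.
    static byte[] encode(BigDecimal value, int scale) {
        return value.setScale(scale).unscaledValue().toByteArray();
    }

    // Decode by reading the bytes as a signed big-endian integer and
    // re-applying the scale taken from the schema.
    static BigDecimal decode(byte[] bytes, int scale) {
        return new BigDecimal(new BigInteger(bytes), scale);
    }

    public static void main(String[] args) {
        // Shapes mirror ACCT_NO DECIMAL(20,0) and CURBAL DECIMAL(8,2);
        // the CURBAL value is made up for illustration.
        byte[] acctNo = encode(new BigDecimal("70000020"), 0);
        byte[] curbal = encode(new BigDecimal("-1234.50"), 2);
        System.out.println(acctNo.length + " bytes -> " + decode(acctNo, 0));
        System.out.println(curbal.length + " bytes -> " + decode(curbal, 2));
    }
}
```

      The byte length varies with the magnitude of the value, which is why such a column cannot be treated as a fixed-width field without an extra conversion step.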

      Selecting on a BINARY DECIMAL field in a Parquet file throws the exception shown below (java.lang.ClassCastException: org.apache.drill.exec.vector.Decimal28SparseVector cannot be cast to org.apache.drill.exec.vector.VariableWidthVector). The successful query at the bottom filtered on a string field in the same file.

      0: jdbc:drill:zk=local> select count(*) from dfs.`c:/dao/DBArchivePredictor/tenrows.parquet` where acct_no=70000020;
      org.apache.drill.common.exceptions.DrillRuntimeException: Error in parquet record reader.
      Message: Failure in setting up reader
      Parquet Metadata: ParquetMetaData{FileMetaData{schema: message sbi.acct_mstr {
      required binary ACCT_NO (DECIMAL(20,0));
      optional binary SF_NO (UTF8);
      optional binary LF_NO (UTF8);
      optional binary BRANCH_NO (DECIMAL(20,0));
      optional binary INTRO_CUST_NO (DECIMAL(20,0));
      optional binary INTRO_ACCT_NO (DECIMAL(20,0));
      optional binary INTRO_SIGN (UTF8);
      optional binary TYPE (UTF8);
      optional binary OPR_MODE (UTF8);
      optional binary CUR_ACCT_TYPE (UTF8);
      optional binary TITLE (UTF8);
      optional binary CORP_CUST_NO (DECIMAL(20,0));
      optional binary APLNDT (UTF8);
      optional binary OPNDT (UTF8);
      optional binary VERI_EMP_NO (DECIMAL(20,0));
      optional binary VERI_SIGN (UTF8);
      optional binary MANAGER_SIGN (UTF8);
      optional binary CURBAL (DECIMAL(8,2));
      optional binary STATUS (UTF8);
      }
      , metadata: {parquet.avro.schema={"type":"record","name":"acct_mstr","namespace":"sbi","fields":[
      {"name":"ACCT_NO","type":{"type":"bytes","logicalType":"decimal","precision":20,"scale":0,"cv_auto_incr":false,"cv_case_sensitive":false,"cv_column_class":"java.math.BigDecimal","cv_connection":"oracle.jdbc.driver.T4CConnection","cv_currency":true,"cv_def_writable":false,"cv_nullable":0,"cv_precision":20,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript":1,"cv_type":2,"cv_typename":"NUMBER","cv_writable":true}},
      {"name":"SF_NO","type":["null",{"type":"string","cv_auto_incr":false,"cv_case_sensitive":true,"cv_column_class":"java.lang.String","cv_currency":false,"cv_def_writable":false,"cv_nullable":1,"cv_precision":10,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript":2,"cv_type":12,"cv_typename":"VARCHAR2","cv_writable":true}]},
      {"name":"LF_NO","type":["null",{"type":"string","cv_auto_incr":false,"cv_case_sensitive":true,"cv_column_class":"java.lang.String","cv_currency":false,"cv_def_writable":false,"cv_nullable":1,"cv_precision":10,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript":3,"cv_type":12,"cv_typename":"VARCHAR2","cv_writable":true}]},
      {"name":"BRANCH_NO","type":["null",{"type":"bytes","logicalType":"decimal","precision":20,"scale":0,"cv_auto_incr":false,"cv_case_sensitive":false,"cv_column_class":"java.math.BigDecimal","cv_currency":true,"cv_def_writable":false,"cv_nullable":1,"cv_precision":20,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript":4,"cv_type":2,"cv_typename":"NUMBER","cv_writable":true}]},
      {"name":"INTRO_CUST_NO","type":["null",{"type":"bytes","logicalType":"decimal","precision":20,"scale":0,"cv_auto_incr":false,"cv_case_sensitive":false,"cv_column_class":"java.math.BigDecimal","cv_currency":true,"cv_def_writable":false,"cv_nullable":1,"cv_precision":20,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript":5,"cv_type":2,"cv_typename":"NUMBER","cv_writable":true}]},
      {"name":"INTRO_ACCT_NO","type":["null",{"type":"bytes","logicalType":"decimal","precision":20,"scale":0,"cv_auto_incr":false,"cv_case_sensitive":false,"cv_column_class":"java.math.BigDecimal","cv_currency":true,"cv_def_writable":false,"cv_nullable":1,"cv_precision":20,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript":6,"cv_type":2,"cv_typename":"NUMBER","cv_writable":true}]},
      {"name":"INTRO_SIGN","type":["null",{"type":"string","cv_auto_incr":false,"cv_case_sensitive":true,"cv_column_class":"java.lang.String","cv_currency":false,"cv_def_writable":false,"cv_nullable":1,"cv_precision":1,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript":7,"cv_type":12,"cv_typename":"VARCHAR2","cv_writable":true}]},
      {"name":"TYPE","type":["null",{"type":"string","cv_auto_incr":false,"cv_case_sensitive":true,"cv_column_class":"java.lang.String","cv_currency":false,"cv_def_writable":false,"cv_nullable":1,"cv_precision":2,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript":8,"cv_type":12,"cv_typename":"VARCHAR2","cv_writable":true}]},
      {"name":"OPR_MODE","type":["null",{"type":"string","cv_auto_incr":false,"cv_case_sensitive":true,"cv_column_class":"java.lang.String","cv_currency":false,"cv_def_writable":false,"cv_nullable":1,"cv_precision":2,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript":9,"cv_type":12,"cv_typename":"VARCHAR2","cv_writable":true}]},
      {"name":"CUR_ACCT_TYPE","type":["null",{"type":"string","cv_auto_incr":false,"cv_case_sensitive":true,"cv_column_class":"java.lang.String","cv_currency":false,"cv_def_writable":false,"cv_nullable":1,"cv_precision":4,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript":10,"cv_type":12,"cv_typename":"VARCHAR2","cv_writable":true}]},
      {"name":"TITLE","type":["null",{"type":"string","cv_auto_incr":false,"cv_case_sensitive":true,"cv_column_class":"java.lang.String","cv_currency":false,"cv_def_writable":false,"cv_nullable":1,"cv_precision":30,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript":11,"cv_type":12,"cv_typename":"VARCHAR2","cv_writable":true}]},
      {"name":"CORP_CUST_NO","type":["null",{"type":"bytes","logicalType":"decimal","precision":20,"scale":0,"cv_auto_incr":false,"cv_case_sensitive":false,"cv_column_class":"java.math.BigDecimal","cv_currency":true,"cv_def_writable":false,"cv_nullable":1,"cv_precision":20,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript":12,"cv_type":2,"cv_typename":"NUMBER","cv_writable":true}]},
      {"name":"APLNDT","type":["null",{"type":"string","cv_auto_incr":false,"cv_case_sensitive":false,"cv_column_class":"java.sql.Timestamp","cv_currency":false,"cv_def_writable":false,"cv_nullable":1,"cv_precision":0,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript":13,"cv_type":93,"cv_typename":"DATE","cv_writable":true}]},
      {"name":"OPNDT","type":["null",{"type":"string","cv_auto_incr":false,"cv_case_sensitive":false,"cv_column_class":"java.sql.Timestamp","cv_currency":false,"cv_def_writable":false,"cv_nullable":1,"cv_precision":0,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript":14,"cv_type":93,"cv_typename":"DATE","cv_writable":true}]},
      {"name":"VERI_EMP_NO","type":["null",{"type":"bytes","logicalType":"decimal","precision":20,"scale":0,"cv_auto_incr":false,"cv_case_sensitive":false,"cv_column_class":"java.math.BigDecimal","cv_currency":true,"cv_def_writable":false,"cv_nullable":1,"cv_precision":20,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript":15,"cv_type":2,"cv_typename":"NUMBER","cv_writable":true}]},
      {"name":"VERI_SIGN","type":["null",{"type":"string","cv_auto_incr":false,"cv_case_sensitive":true,"cv_column_class":"java.lang.String","cv_currency":false,"cv_def_writable":false,"cv_nullable":1,"cv_precision":1,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript":16,"cv_type":12,"cv_typename":"VARCHAR2","cv_writable":true}]},
      {"name":"MANAGER_SIGN","type":["null",{"type":"string","cv_auto_incr":false,"cv_case_sensitive":true,"cv_column_class":"java.lang.String","cv_currency":false,"cv_def_writable":false,"cv_nullable":1,"cv_precision":1,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript":17,"cv_type":12,"cv_typename":"VARCHAR2","cv_writable":true}]},
      {"name":"CURBAL","type":["null",{"type":"bytes","logicalType":"decimal","precision":8,"scale":2,"cv_auto_incr":false,"cv_case_sensitive":false,"cv_column_class":"java.math.BigDecimal","cv_currency":true,"cv_def_writable":false,"cv_nullable":1,"cv_precision":8,"cv_read_only":false,"cv_scale":2,"cv_searchable":true,"cv_signed":true,"cv_subscript":18,"cv_type":2,"cv_typename":"NUMBER","cv_writable":true}]},
      {"name":"STATUS","type":["null",{"type":"string","cv_auto_incr":false,"cv_case_sensitive":true,"cv_column_class":"java.lang.String","cv_currency":false,"cv_def_writable":false,"cv_nullable":1,"cv_precision":1,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript":19,"cv_type":12,"cv_typename":"VARCHAR2","cv_writable":true}
      ]}]}}}, blocks: [BlockMetaData{10, 1281 [
      ColumnMetaData{SNAPPY [ACCT_NO] BINARY [BIT_PACKED, PLAIN], 4},
      ColumnMetaData{SNAPPY [SF_NO] BINARY [RLE, BIT_PACKED, PLAIN_DICTIONARY], 88},
      ColumnMetaData{SNAPPY [LF_NO] BINARY [RLE, BIT_PACKED, PLAIN_DICTIONARY], 163},
      ColumnMetaData{SNAPPY [BRANCH_NO] BINARY [RLE, BIT_PACKED, PLAIN_DICTIONARY], 241},
      ColumnMetaData{SNAPPY [INTRO_CUST_NO] BINARY [RLE, BIT_PACKED, PLAIN_DICTIONARY], 298},
      ColumnMetaData{SNAPPY [INTRO_ACCT_NO] BINARY [RLE, BIT_PACKED, PLAIN_DICTIONARY], 364},
      ColumnMetaData{SNAPPY [INTRO_SIGN] BINARY [RLE, BIT_PACKED, PLAIN_DICTIONARY], 421},
      ColumnMetaData{SNAPPY [TYPE] BINARY [RLE, BIT_PACKED, PLAIN_DICTIONARY], 478},
      ColumnMetaData{SNAPPY [OPR_MODE] BINARY [RLE, BIT_PACKED, PLAIN_DICTIONARY], 538},
      ColumnMetaData{SNAPPY [CUR_ACCT_TYPE] BINARY [RLE, BIT_PACKED, PLAIN_DICTIONARY], 598},
      ColumnMetaData{SNAPPY [TITLE] BINARY [RLE, BIT_PACKED, PLAIN_DICTIONARY], 658},
      ColumnMetaData{SNAPPY [CORP_CUST_NO] BINARY [RLE, BIT_PACKED, PLAIN_DICTIONARY], 736},
      ColumnMetaData{SNAPPY [APLNDT] BINARY [RLE, BIT_PACKED, PLAIN_DICTIONARY], 802},
      ColumnMetaData{SNAPPY [OPNDT] BINARY [RLE, BIT_PACKED, PLAIN_DICTIONARY], 919},
      ColumnMetaData{SNAPPY [VERI_EMP_NO] BINARY [RLE, BIT_PACKED, PLAIN_DICTIONARY], 1036},
      ColumnMetaData{SNAPPY [VERI_SIGN] BINARY [RLE, BIT_PACKED, PLAIN_DICTIONARY], 1093},
      ColumnMetaData{SNAPPY [MANAGER_SIGN] BINARY [RLE, BIT_PACKED, PLAIN_DICTIONARY], 1150},
      ColumnMetaData{SNAPPY [CURBAL] BINARY [RLE, BIT_PACKED, PLAIN_DICTIONARY], 1207},
      ColumnMetaData{SNAPPY [STATUS] BINARY [RLE, BIT_PACKED, PLAIN_DICTIONARY], 1270}
      ]}]}
      at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.handleAndRaise(ParquetRecordReader.java:346)
      at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.setup(ParquetRecordReader.java:339)
      at org.apache.drill.exec.physical.impl.ScanBatch.<init>(ScanBatch.java:101)
      at org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch(ParquetScanBatchCreator.java:168)
      at org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch(ParquetScanBatchCreator.java:56)
      at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:151)
      at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:174)
      at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:131)
      at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:174)
      at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:131)
      at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:174)
      at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:131)
      at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:174)
      at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:131)
      at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:174)
      at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:131)
      at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:174)
      at org.apache.drill.exec.physical.impl.ImplCreator.getRootExec(ImplCreator.java:105)
      at org.apache.drill.exec.physical.impl.ImplCreator.getExec(ImplCreator.java:79)
      at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:230)
      at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.ClassCastException: org.apache.drill.exec.vector.Decimal28SparseVector cannot be cast to org.apache.drill.exec.vector.VariableWidthVector
      at org.apache.drill.exec.store.parquet.columnreaders.VarLengthValuesColumn.<init>(VarLengthValuesColumn.java:44)
      at org.apache.drill.exec.store.parquet.columnreaders.VarLengthColumnReaders$Decimal28Column.<init>(VarLengthColumnReaders.java:52)
      at org.apache.drill.exec.store.parquet.columnreaders.ColumnReaderFactory.getReader(ColumnReaderFactory.java:178)
      at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.setup(ParquetRecordReader.java:319)
      ... 22 more
      Error: SYSTEM ERROR: ClassCastException: org.apache.drill.exec.vector.Decimal28SparseVector cannot be cast to org.apache.drill.exec.vector.VariableWidthVector

      Fragment 0:0

      [Error Id: 22bfa8dd-1129-4300-9449-409e96d6c800 on DaveOshinsky-PC.gp.cv.commvault.com:31010] (state=,code=0)
      0: jdbc:drill:zk=local> select count(*) from dfs.`c:/dao/DBArchivePredictor/tenrows.parquet` where opr_mode='JO';
      ---------
      EXPR$0
      ---------
      10
      ---------
      1 row selected (0.406 seconds)
      0: jdbc:drill:zk=local>

      The immediate cause of this exception is that Drill, in org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader, assumes that all BINARY values are stored in VariableWidthVectors. For BINARY DECIMAL this is not true: Decimal28SparseVector, for example, is a FixedWidthVector, not a VariableWidthVector. The assumption that DECIMAL is never encoded in variable-length BINARY also appears in a number of other places in the Drill code, including:

      org.apache.drill.exec.store.parquet.columnreaders.ColumnReaderFactory only contains logic to handle DECIMAL with INT32, INT64, INT96, or FIXED_LEN_BYTE_ARRAY. BINARY is not supported with DECIMAL.

      org.apache.drill.exec.store.parquet.columnreaders.NullableFixedByteAlignedReaders does not provide a nullable reader for BINARY in the getNullableColumnReader method.
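
      To illustrate the mismatch between variable-length input and fixed-width storage, here is a minimal sketch of one conceivable bridging step: sign-extending a variable-length two's-complement value to a fixed number of bytes, the same shape a FIXED_LEN_BYTE_ARRAY decimal has. This is an assumption made for illustration only. It does not describe the internal layout of Decimal28SparseVector, nor how Drill's readers work or how this issue was resolved; the class name and the 12-byte width are hypothetical.

```java
import java.math.BigInteger;
import java.util.Arrays;

// Hypothetical helper, not part of Drill: widens a variable-length
// two's-complement big-endian value to a fixed byte width.
final class DecimalWidthAdapter {

    static byte[] signExtend(byte[] src, int width) {
        if (src.length == 0) {
            throw new IllegalArgumentException("empty decimal value");
        }
        if (src.length > width) {
            throw new IllegalArgumentException(
                "value needs " + src.length + " bytes, only " + width + " available");
        }
        byte[] out = new byte[width];
        // Pad on the left with 0xFF for negative values, 0x00 otherwise,
        // so the numeric value is unchanged.
        byte fill = (byte) (src[0] < 0 ? 0xFF : 0x00);
        int pad = width - src.length;
        Arrays.fill(out, 0, pad, fill);
        System.arraycopy(src, 0, out, pad, src.length);
        return out;
    }

    public static void main(String[] args) {
        // Made-up unscaled values; 12 bytes is an arbitrary example width.
        byte[] positive = new BigInteger("70000020").toByteArray();
        byte[] negative = new BigInteger("-123450").toByteArray();
        System.out.println(new BigInteger(signExtend(positive, 12)));
        System.out.println(new BigInteger(signExtend(negative, 12)));
    }
}
```

      A real reader would also have to handle nullable columns and values whose byte length exceeds the target width for the declared precision.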

              People

              • Assignee: Unassigned
              • Reporter: daveoshinsky Dave Oshinsky
              • Votes: 0
              • Watchers: 4
