Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-5971

Fix INT64, INT32 logical types in complex parquet reader

    Details

    • Type: Bug
    • Status: Reviewable
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.11.0
    • Fix Version/s: 1.13.0
    • Component/s: None
    • Labels:
      None

      Description

      The 'complex' Parquet reader does not recognize the Parquet logical types INT64, and INT32.
      Should be a simple change to add these logical types.

        Issue Links

          Activity

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user parthchandra commented on a diff in the pull request:

          https://github.com/apache/drill/pull/1049#discussion_r153909807

          — Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ColumnReaderFactory.java —
          @@ -138,6 +138,8 @@
          return new ParquetFixedWidthDictionaryReaders.DictionaryBigIntReader(recordReader, allocateSize, descriptor, columnChunkMetaData, fixedLength, (BigIntVector) v, schemaElement);
          — End diff –

          OK I'll try to add these. BTW, I realized that the test files that I added for the unit tests are not annotated, so I'll need to fix those as well!

          Show
          githubbot ASF GitHub Bot added a comment - Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1049#discussion_r153909807 — Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ColumnReaderFactory.java — @@ -138,6 +138,8 @@ return new ParquetFixedWidthDictionaryReaders.DictionaryBigIntReader(recordReader, allocateSize, descriptor, columnChunkMetaData, fixedLength, (BigIntVector) v, schemaElement); — End diff – OK I'll try to add these. BTW, I realized that the test files that I added for the unit tests are not annotated, so I'll need to fix those as well!
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user vvysotskyi commented on a diff in the pull request:

          https://github.com/apache/drill/pull/1049#discussion_r153739725

          — Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ColumnReaderFactory.java —
          @@ -138,6 +138,8 @@
          return new ParquetFixedWidthDictionaryReaders.DictionaryBigIntReader(recordReader, allocateSize, descriptor, columnChunkMetaData, fixedLength, (BigIntVector) v, schemaElement);
          — End diff –

          `DATE` logical type also encoded as the `INT32` physical type [1], so could you please also add its support?

          [1] https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#date

          Show
          githubbot ASF GitHub Bot added a comment - Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1049#discussion_r153739725 — Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ColumnReaderFactory.java — @@ -138,6 +138,8 @@ return new ParquetFixedWidthDictionaryReaders.DictionaryBigIntReader(recordReader, allocateSize, descriptor, columnChunkMetaData, fixedLength, (BigIntVector) v, schemaElement); — End diff – `DATE` logical type also encoded as the `INT32` physical type [1] , so could you please also add its support? [1] https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#date
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user vvysotskyi commented on a diff in the pull request:

          https://github.com/apache/drill/pull/1049#discussion_r153735994

          — Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ColumnReaderFactory.java —
          @@ -138,6 +138,8 @@
          return new ParquetFixedWidthDictionaryReaders.DictionaryBigIntReader(recordReader, allocateSize, descriptor, columnChunkMetaData, fixedLength, (BigIntVector) v, schemaElement);
          — End diff –

          This comment had to be placed in line [117](https://github.com/apache/drill/pull/1049/files?diff=unified#diff-4a7ec07122bfb16e4ff696af256f56dcR117), but I could not add it there.
          Should we also handle the case when `columnChunkMetaData.getType()` type is `INT32` and `convertedType` is `INT_32`?

          Show
          githubbot ASF GitHub Bot added a comment - Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1049#discussion_r153735994 — Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ColumnReaderFactory.java — @@ -138,6 +138,8 @@ return new ParquetFixedWidthDictionaryReaders.DictionaryBigIntReader(recordReader, allocateSize, descriptor, columnChunkMetaData, fixedLength, (BigIntVector) v, schemaElement); — End diff – This comment had to be placed in line [117] ( https://github.com/apache/drill/pull/1049/files?diff=unified#diff-4a7ec07122bfb16e4ff696af256f56dcR117 ), but I could not add it there. Should we also handle the case when `columnChunkMetaData.getType()` type is `INT32` and `convertedType` is `INT_32`?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user parthchandra commented on the issue:

          https://github.com/apache/drill/pull/1049

          Updated the PR to include all the remaining logical int types. While generating the test file for the unit test, found that some of the types were missing from the fast parquet reader so added those as well.

          Show
          githubbot ASF GitHub Bot added a comment - Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1049 Updated the PR to include all the remaining logical int types. While generating the test file for the unit test, found that some of the types were missing from the fast parquet reader so added those as well.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user vvysotskyi commented on the issue:

          https://github.com/apache/drill/pull/1049

          LGTM, +1. Except `INT_32` and `INT_64` parquet can contain other logical numeric types: `UINT_8`, `UINT_16`, `UINT_32`, `UINT_64`, `INT_8`, `INT_16`[1]. It would be great to add support for these logical types in this pull request or create another Jira for this.

          [1] https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#numeric-types

          Show
          githubbot ASF GitHub Bot added a comment - Github user vvysotskyi commented on the issue: https://github.com/apache/drill/pull/1049 LGTM, +1. Except `INT_32` and `INT_64` parquet can contain other logical numeric types: `UINT_8`, `UINT_16`, `UINT_32`, `UINT_64`, `INT_8`, `INT_16` [1] . It would be great to add support for these logical types in this pull request or create another Jira for this. [1] https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#numeric-types
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user arina-ielchiieva commented on the issue:

          https://github.com/apache/drill/pull/1049

          @vvysotskyi please review.

          Show
          githubbot ASF GitHub Bot added a comment - Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/1049 @vvysotskyi please review.
          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user parthchandra opened a pull request:

          https://github.com/apache/drill/pull/1049

          DRILL-5971: Fix INT64, INT32 logical types in complex parquet reader

          Added support for the INT64 and INT32 logical types in the complex parquet reader

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/parthchandra/drill DRILL-5971

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/drill/pull/1049.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #1049


          commit a27f5e4cb9a71d6ce3901c700fd69850dbdf22b9
          Author: Parth Chandra <parthc@apache.org>
          Date: 2017-08-29T18:20:47Z

          DRILL-5971: Fix INT64, INT32 logical types in complex parquet reader


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user parthchandra opened a pull request: https://github.com/apache/drill/pull/1049 DRILL-5971 : Fix INT64, INT32 logical types in complex parquet reader Added support for the INT64 and INT32 logical types in the complex parquet reader You can merge this pull request into a Git repository by running: $ git pull https://github.com/parthchandra/drill DRILL-5971 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/1049.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1049 commit a27f5e4cb9a71d6ce3901c700fd69850dbdf22b9 Author: Parth Chandra <parthc@apache.org> Date: 2017-08-29T18:20:47Z DRILL-5971 : Fix INT64, INT32 logical types in complex parquet reader

            People

            • Assignee:
              parthc Parth Chandra
              Reporter:
              parthc Parth Chandra
              Reviewer:
              Volodymyr Vysotskyi
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:

                Development