IMPALA-4686

parquet-reader doesn't know about INT96 columns

    Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.7.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Infrastructure

      Description

      The parquet-reader tool doesn't recognize INT96 columns and prints their type as UNKNOWN (see timestamp_col):

      Schema:
      id  INT32
      bool_col  BOOLEAN
      tinyint_col  INT32
      smallint_col  INT32
      int_col  INT32
      bigint_col  INT64
      float_col  FLOAT
      double_col  DOUBLE
      date_string_col  BYTE_ARRAY
      string_col  BYTE_ARRAY
      timestamp_col  UNKNOWN
      year  INT32
      month  INT32
      

        Activity

        lv Lars Volker added a comment -

        IMPALA-4686: Fix schema output for INT96 columns in parquet-reader tool

        Instead of manually mapping the types we can just look them up in the
        thrift map.
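
The idea can be sketched as follows. This is a minimal, self-contained illustration and not the actual patch: `kTypeValuesToNames` is a hypothetical stand-in for the value-to-name map that Thrift generates for an enum, and `TypeNameManual` is an assumed example of a hand-written mapping that has fallen out of date. The numeric values follow the physical type enum in the Parquet format definition.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a Thrift-generated value-to-name map of the
// Parquet physical types. The real tool would use the generated map instead
// of defining its own.
const std::map<int, const char*> kTypeValuesToNames = {
    {0, "BOOLEAN"}, {1, "INT32"},  {2, "INT64"},      {3, "INT96"},
    {4, "FLOAT"},   {5, "DOUBLE"}, {6, "BYTE_ARRAY"}, {7, "FIXED_LEN_BYTE_ARRAY"},
};

// Assumed example of a hand-maintained mapping. INT96 (value 3) was never
// added, so it falls through to the default and prints as UNKNOWN.
std::string TypeNameManual(int type) {
  switch (type) {
    case 0: return "BOOLEAN";
    case 1: return "INT32";
    case 2: return "INT64";
    case 4: return "FLOAT";
    case 5: return "DOUBLE";
    case 6: return "BYTE_ARRAY";
    default: return "UNKNOWN";
  }
}

// Lookup-based mapping: every type present in the map is printed by name,
// and only values missing from the map fall back to UNKNOWN.
std::string TypeNameFromMap(int type) {
  auto it = kTypeValuesToNames.find(type);
  return it == kTypeValuesToNames.end() ? "UNKNOWN" : it->second;
}
```

With the lookup, a type added to the Thrift definition is picked up without touching the tool's own code, which is what avoids the UNKNOWN entry for timestamp_col.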

        Testing: I tested this change manually by compiling the tool and running
        it on a parquet file that had an INT96 column. Here is the relevant
        output:

        Schema:
        id INT32
        bool_col BOOLEAN
        tinyint_col INT32
        smallint_col INT32
        int_col INT32
        bigint_col INT64
        float_col FLOAT
        double_col DOUBLE
        date_string_col BYTE_ARRAY
        string_col BYTE_ARRAY
        timestamp_col INT96
        year INT32
        month INT32

        We currently use this tool in only one test, which calls it to make sure
        that it can parse a parquet file. This means our tests verify that the
        tool compiles and runs, but they do not check its output.

        Change-Id: I5d92f5556554c71461a93fe0d598bb69f91cce51
        Reviewed-on: http://gerrit.cloudera.org:8080/5536
        Reviewed-by: Lars Volker <lv@cloudera.com>
        Reviewed-by: Matthew Jacobs <mj@cloudera.com>
        Tested-by: Internal Jenkins

        mikesbrown Michael Brown added a comment -

        Lars Volker, do you mind if we change this Jira issue type from New Feature to Task? Docs writers like John Russell rely heavily on "New Feature" to correspond to new Impala features, whereas this is a test/infra feature.

        lv Lars Volker added a comment -

        Not at all, changed it.


          People

          • Assignee: Lars Volker
          • Reporter: Lars Volker