  1. Apache Drill
  2. DRILL-4345

Hive Native Reader reporting wrong results for timestamp column in hive generated parquet file

Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • None
    • None
    • None

    Description

      git.commit.id.abbrev=1b96174

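      The Hive plugin and the native Parquet reader return different results for the same table. In the session below, the first query runs through the Hive plugin; the second runs after enabling `store.hive.optimize_scan_with_native_readers`.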

      0: jdbc:drill:zk=10.10.100.190:5181> use hive;
      +-------+-----------------------------------+
      |  ok   |              summary              |
      +-------+-----------------------------------+
      | true  | Default schema changed to [hive]  |
      +-------+-----------------------------------+
      1 row selected (0.415 seconds)
      0: jdbc:drill:zk=10.10.100.190:5181> select int_col, timestamp_col from hive1_fewtypes_null_parquet;
      +----------+------------------------+
      | int_col  |     timestamp_col      |
      +----------+------------------------+
      | 1        | null                   |
      | null     | 1997-01-02 00:00:00.0  |
      | 3        | null                   |
      | 4        | null                   |
      | 5        | 1997-02-10 17:32:00.0  |
      | 6        | 1997-02-11 17:32:01.0  |
      | 7        | 1997-02-12 17:32:01.0  |
      | 8        | 1997-02-13 17:32:01.0  |
      | 9        | null                   |
      | 10       | 1997-02-15 17:32:01.0  |
      | null     | 1997-02-16 17:32:01.0  |
      | 12       | 1897-02-18 17:32:01.0  |
      | 13       | 2002-02-14 17:32:01.0  |
      | 14       | 1991-02-10 17:32:01.0  |
      | 15       | 1900-02-16 17:32:01.0  |
      | 16       | null                   |
      | null     | 1897-02-16 17:32:01.0  |
      | 18       | 1997-02-16 17:32:01.0  |
      | null     | null                   |
      | 20       | 1996-02-28 17:32:01.0  |
      | null     | null                   |
      +----------+------------------------+
      21 rows selected (0.368 seconds)
      0: jdbc:drill:zk=10.10.100.190:5181> alter session set `store.hive.optimize_scan_with_native_readers` = true;
      +-------+--------------------------------------------------------+
      |  ok   |                        summary                         |
      +-------+--------------------------------------------------------+
      | true  | store.hive.optimize_scan_with_native_readers updated.  |
      +-------+--------------------------------------------------------+
      1 row selected (0.213 seconds)
      0: jdbc:drill:zk=10.10.100.190:5181> select int_col, timestamp_col from hive1_fewtypes_null_parquet;
      +----------+------------------------+
      | int_col  |     timestamp_col      |
      +----------+------------------------+
      | 1        | null                   |
      | null     | 1997-01-02 00:00:00.0  |
      | 3        | 1997-02-10 17:32:00.0  |
      | 4        | null                   |
      | 5        | 1997-02-11 17:32:01.0  |
      | 6        | 1997-02-12 17:32:01.0  |
      | 7        | 1997-02-13 17:32:01.0  |
      | 8        | 1997-02-15 17:32:01.0  |
      | 9        | 1997-02-16 17:32:01.0  |
      | 10       | 1900-02-16 17:32:01.0  |
      | null     | 1897-02-16 17:32:01.0  |
      | 12       | 1997-02-16 17:32:01.0  |
      | 13       | 1996-02-28 17:32:01.0  |
      | 14       | 1997-01-02 00:00:00.0  |
      | 15       | 1997-01-02 00:00:00.0  |
      | 16       | 1997-01-02 00:00:00.0  |
      | null     | 1997-01-02 00:00:00.0  |
      | 18       | 1997-01-02 00:00:00.0  |
      | null     | 1997-01-02 00:00:00.0  |
      | 20       | 1997-01-02 00:00:00.0  |
      | null     | 1997-01-02 00:00:00.0  |
      +----------+------------------------+
      21 rows selected (0.352 seconds)
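
The discrepancy between the two result sets above can be quantified with a small script. This is a minimal sketch and not part of the original report: the lists are transcribed by hand from the console output, and the names `hive_plugin`, `native_reader`, and `diff_rows` are illustrative only.

```python
# timestamp_col values copied from the two result sets in this report, in row order.
# None stands for SQL null. int_col is identical in both outputs, so it is omitted.
hive_plugin = [
    None,
    "1997-01-02 00:00:00.0",
    None,
    None,
    "1997-02-10 17:32:00.0",
    "1997-02-11 17:32:01.0",
    "1997-02-12 17:32:01.0",
    "1997-02-13 17:32:01.0",
    None,
    "1997-02-15 17:32:01.0",
    "1997-02-16 17:32:01.0",
    "1897-02-18 17:32:01.0",
    "2002-02-14 17:32:01.0",
    "1991-02-10 17:32:01.0",
    "1900-02-16 17:32:01.0",
    None,
    "1897-02-16 17:32:01.0",
    "1997-02-16 17:32:01.0",
    None,
    "1996-02-28 17:32:01.0",
    None,
]

native_reader = [
    None,
    "1997-01-02 00:00:00.0",
    "1997-02-10 17:32:00.0",
    None,
    "1997-02-11 17:32:01.0",
    "1997-02-12 17:32:01.0",
    "1997-02-13 17:32:01.0",
    "1997-02-15 17:32:01.0",
    "1997-02-16 17:32:01.0",
    "1900-02-16 17:32:01.0",
    "1897-02-16 17:32:01.0",
    "1997-02-16 17:32:01.0",
    "1996-02-28 17:32:01.0",
] + ["1997-01-02 00:00:00.0"] * 8  # the last eight rows repeat the same value


def diff_rows(expected, actual):
    """Return (row_number, expected, actual) for each row whose values differ."""
    return [
        (row, e, a)
        for row, (e, a) in enumerate(zip(expected, actual), start=1)
        if e != a
    ]


mismatches = diff_rows(hive_plugin, native_reader)
mismatched_rows = {row for row, _, _ in mismatches}
matching_rows = [
    row for row in range(1, len(hive_plugin) + 1) if row not in mismatched_rows
]

print(f"{len(mismatches)} of {len(hive_plugin)} rows differ")
print(f"matching rows: {matching_rows}")
print(f"nulls: plugin={hive_plugin.count(None)}, native={native_reader.count(None)}")
for row, e, a in mismatches:
    print(f"row {row:2d}: plugin={e!s:<22} native={a}")
```

For the values transcribed above, only rows 1, 2, and 4 agree; the other 18 of 21 rows differ. The Hive plugin output contains 7 nulls in `timestamp_col`, while the native reader output contains 2.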
      

      DDL for the Hive table:

      create external table hive1_fewtypes_null_parquet (
            int_col int,
            bigint_col bigint,
            date_col string,
            time_col string,
            timestamp_col timestamp,
            interval_col string,
            varchar_col string,
            float_col float,
            double_col double,
            bool_col boolean
          )
      stored as parquet
      location '/drill/testdata/hive_storage/hive1_fewtypes_null';
      

      The underlying Parquet file is attached.

      Attachments

        1. hive1_fewtypes_null.parquet
          3 kB
          Rahul Kumar Challapalli

            People

              Assignee: Unassigned
              Reporter: Rahul Kumar Challapalli (rkins)