Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-8421

Parquet TIMESTAMP_MICROS columns in WHERE clauses are not converted to milliseconds before filtering

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 1.21.0
    • 1.21.1
    • Storage - Parquet
    • None

    Description

      When using Drill with parquet files where the timestamp columns are in microseconds, Drill converts the microsecond values to milliseconds when displayed. However, when using a timestamp column in WHERE clauses it looks like the original microsecond value is used instead of the adjusted millisecond value when filtering records.

      To Reproduce
      Assume a parquet file in a directory "Test" with a column timestampCol having the type org.apache.parquet.schema.OriginalType.TIMESTAMP_MICROS.

      Assume there are two records with the values 1673981999806149 and 1674759597743552, respectively, in that column (i.e. the UTC dates 2023-01-17T18:59:59.806149 and 2023-01-26T18:59:57.743552)

      1. Execute the query
        SELECT timestampCol FROM dfs.Test;
        The result includes both records, as expected.
      1. Execute the query
        SELECT timestampCol FROM dfs.Test WHERE timestampCol < TO_TIMESTAMP('2023-02-01 00:00:00', 'yyyy-MM-dd HH:mm:ss')
        This produces an empty result although both records have a value less than the argument.
      1. Execute
        SELECT timestampCol FROM dfs.Test WHERE timestampCol > TO_TIMESTAMP('2023-02-01 00:00:00', 'yyyy-MM-dd HH:mm:ss')
        The result includes both records although neither have a value greater than the argument.

      Expected behavior
      The query in 2) above should produce a result with both records, and the query in 3) should produce an empty result.

      Additional context
      Even timestamps long into the future produce results with both records, e.g.:
      SELECT timestampCol FROM dfs.Test WHERE timestampCol > TO_TIMESTAMP('2502-04-04 00:00:00', 'yyyy-MM-dd HH:mm:ss')

      Manually converting the timestamp column to milliseconds produces the expected result:
      SELECT timestampCol FROM dfs.Test WHERE TO_TIMESTAMP(CONVERT_FROM(CONVERT_TO(timestampCol, 'TIMESTAMP_EPOCH'), 'BIGINT')/1000) < TO_TIMESTAMP('2023-02-01 00:00:00', 'yyyy-MM-dd HH:mm:ss')
      produces a result with both records.

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              handmadecode Peter Franzen
              Votes:
              1 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: