Apache Drill / DRILL-7132

Metadata cache does not have correct min/max values for varchar and interval data types

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 1.14.0
    • Fix Version/s: 1.17.0
    • Component/s: Metadata
    • Labels: None

    Description

      The parquet metadata cache does not have correct min/max values for varchar and interval data types.

      I have attached a parquet file. Here is what parquet tools shows for varchar:

      [varchar_col] BINARY 14.6% of all space [PLAIN, BIT_PACKED] min: 67 max: 67 average: 67 total: 67 (raw data: 65 saving -3%)
      values: min: 1 max: 1 average: 1 total: 1
      uncompressed: min: 65 max: 65 average: 65 total: 65
      column values statistics: min: ioegjNJKvnkd, max: ioegjNJKvnkd, num_nulls: 0

      Here is what the metadata cache file shows:

      "name" : [ "varchar_col" ],
      "minValue" : "aW9lZ2pOSkt2bmtk",
      "maxValue" : "aW9lZ2pOSkt2bmtk",
      "nulls" : 0

      Here is what parquet tools shows for interval:

      [interval_col] BINARY 11.3% of all space [PLAIN, BIT_PACKED] min: 52 max: 52 average: 52 total: 52 (raw data: 50 saving -4%)
      values: min: 1 max: 1 average: 1 total: 1
      uncompressed: min: 50 max: 50 average: 50 total: 50
      column values statistics: min: P18582D, max: P18582D, num_nulls: 0

      Here is what the metadata cache file shows:

      "name" : [ "interval_col" ],
      "minValue" : "UDE4NTgyRA==",
      "maxValue" : "UDE4NTgyRA==",
      "nulls" : 0
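
A quick check suggests the two outputs may hold the same data: the cache strings above decode, as Base64, to exactly the text that parquet tools prints. That would be consistent with the "Not A Problem" resolution, although the report itself does not state the reason. The sketch below uses only Python's standard `base64` module. The helper name `decode_cache_value` is illustrative, and reading the decoded bytes as UTF-8 is an assumption.

```python
import base64

# Values copied from the report: (metadata cache string, parquet tools statistic).
REPORTED = {
    "varchar_col": ("aW9lZ2pOSkt2bmtk", "ioegjNJKvnkd"),
    "interval_col": ("UDE4NTgyRA==", "P18582D"),
}


def decode_cache_value(encoded: str) -> str:
    """Base64-decode a cache string and read the bytes as UTF-8 text (assumption)."""
    return base64.b64decode(encoded).decode("utf-8")


for column, (cached, expected) in REPORTED.items():
    decoded = decode_cache_value(cached)
    # Compare the decoded cache value with what parquet tools printed.
    print(f"{column}: {cached} -> {decoded} (matches: {decoded == expected})")
```

Both columns decode to the values in the parquet tools output, so the min/max values in the cache appear to be encoded rather than wrong.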

      Attachments

        1. 0_0_10.parquet
          2 kB
          Robert Hou

          People

            Assignee: Unassigned
            Reporter: Robert Hou (rhou)
            Votes: 0
            Watchers: 2
