Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-9717

The max/min function used by AggrStats for decimal type is not what we expected

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Duplicate
    • 0.14.0, 1.0.0, 1.1.0
    • 1.2.0
    • Statistics
    • None

    Description

      In current version hive-schema-1.2.0, in TABLE PART_COL_STATS, we store the "BIG_DECIMAL_LOW_VALUE" and "BIG_DECIMAL_HIGH_VALUE" as varchar. For example,

      derby
      "BIG_DECIMAL_LOW_VALUE" VARCHAR(4000), "BIG_DECIMAL_HIGH_VALUE" VARCHAR(4000)

      mssql
      BIG_DECIMAL_HIGH_VALUE varchar(255) NULL,
      BIG_DECIMAL_LOW_VALUE varchar(255) NULL,

      mysql
      `BIG_DECIMAL_LOW_VALUE` varchar(4000) CHARACTER SET latin1 COLLATE latin1_bin,
      `BIG_DECIMAL_HIGH_VALUE` varchar(4000) CHARACTER SET latin1 COLLATE latin1_bin,

      oracle
      BIG_DECIMAL_LOW_VALUE VARCHAR2(4000),
      BIG_DECIMAL_HIGH_VALUE VARCHAR2(4000),

      postgres
      "BIG_DECIMAL_LOW_VALUE" character varying(4000) DEFAULT NULL::character varying,
      "BIG_DECIMAL_HIGH_VALUE" character varying(4000) DEFAULT NULL::character varying,

      And, when we do the aggrstats, we do a MAX/MIN of all the BIG_DECIMAL_HIGH_VALU/BIG_DECIMAL_LOW_VALUEE of partitions. We are expecting a max/min of a decimal (a number). However, it is actually a max/min of a varchar (a string). As a result, '900' is more than '1000'. This also affects the extrapolation of the status. The proposed solution is to use a CAST function to cast it to decimal.

      Attachments

        Issue Links

          Activity

            People

              pxiong Pengcheng Xiong
              pxiong Pengcheng Xiong
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: