Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-14593

Non-canonical integer partition columns do not work with IN operations

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 1.0.0
    • None
    • Metastore
    • None

    Description

      The below use-case no longer works (tested on a PostgresQL backed HMS using JDO as well as on a MySQL backed HMS with DirectSQL):

      CREATE TABLE foo (a STRING) PARTITIONED BY (b INT, c INT);
      ALTER TABLE foo ADD PARTITION (b='07', c='08');
      LOAD DATA LOCAL INPATH '/etc/hostname' INTO TABLE foo PARTITION(b='07', c='08');
      
      -- Does not work if you provide a string IN variable:
      
      SELECT a, c FROM foo WHERE b IN ('07');
      (No rows selected)
      
      -- Works if you provide it in integer forms or canonical integer strings:
      
      SELECT a, c FROM foo WHERE b IN (07);
      (1 row(s) selected)
      SELECT a, c FROM foo WHERE b IN (7);
      (1 row(s) selected)
      SELECT a, c FROM foo WHERE b IN ('7');
      (1 row(s) selected)
      

      This worked fine prior to HIVE-8099. The change of HIVE-8099 is inducing a double conversion on the partition column input, such that the IN GenericUDFIn now receives b's value as a column type converted canonical integer 7, as opposed to an as-is DB stored non-canonical value 07. Subsequently the GenericUDFIn again up-converts the b's value to match its argument's value types instead, making 7 (int) into a string "7". Then, "7" is compared against "07" which naturally never matches.

      As a regression, this breaks anyone upgrading pre-1.0 to 1.0 or higher.

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              qwertymaniac Harsh J
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated: