Hive / HIVE-23358

MSCK REPAIR should remove all insignificant zeroes from partition values (for numeric datatypes) before creating the partitions

      Description

      Steps to reproduce:
      1. Lay out the partitioned data as follows:
      hdfs://mycluster/datapath/t1/year=2020/month=03/day=10
      hdfs://mycluster/datapath/t1/year=2020/month=03/day=11
      2. create external table t1 (key int, value string) partitioned by (Year int, Month int, Day int) stored as orc location 'hdfs://mycluster/datapath/t1';
      3. msck repair table t1;
      4. show partitions t1;

      +----------------------------+
      |         partition          |
      +----------------------------+
      | year=2020/month=03/day=10  |
      | year=2020/month=03/day=11  |
      +----------------------------+
      

      5. show table extended like 't1' partition (Year=2020, Month=03, Day=11);
      This throws an error:

      Error: Error while compiling statement: FAILED: SemanticException [Error 10006]: Partition not found {year=2020, month=3, day=11} (state=42000,code=10006)
      

      When the partition directories are created without the extra zeroes, the same query works:

      hdfs://mycluster/datapath/t1/year=2020/month=3/day=10
      hdfs://mycluster/datapath/t1/year=2020/month=3/day=11
      

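      This happens because, when looking up a partition, Hive strips the insignificant leading "0" from the month value and then queries the metastore (partSpec="year=2020/month=3/day=10"). The metastore holds the partition as month=03, exactly as MSCK REPAIR registered it from the directory name, so the query returns no rows.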
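
      The mismatch, and the normalization the title asks for, can be sketched in a few lines of Python. This is only an illustration, not Hive's code: the function name normalize_partition_spec and the set-based "metastore" are hypothetical, and only integer-typed partition columns are handled (other numeric types would need their own rules).

```python
import re

# Hypothetical helper, not part of Hive: strips insignificant leading zeroes
# from the values of integer-typed partition columns in a partition path.
def normalize_partition_spec(spec, int_columns):
    parts = []
    for component in spec.split("/"):
        key, _, value = component.partition("=")
        key = key.lower()
        # Only rewrite values that are plain (optionally negative) integers;
        # anything else is left exactly as found in the directory name.
        if key in int_columns and re.fullmatch(r"-?[0-9]+", value):
            value = str(int(value))
        parts.append(f"{key}={value}")
    return "/".join(parts)


# Partition names as registered verbatim from the directory names.
registered = {"year=2020/month=03/day=10", "year=2020/month=03/day=11"}

# The lookup spec after the literal 03 has been normalized to 3.
lookup = "year=2020/month=3/day=11"
found_before = lookup in registered  # the lookup misses the month=03 entry

# Registering normalized names instead makes the same lookup succeed.
int_cols = {"year", "month", "day"}
normalized = {normalize_partition_spec(p, int_cols) for p in registered}
found_after = lookup in normalized
```

      Normalizing at registration time keeps the stored partition name and the lookup spec in the same canonical form, which is why the directories created without extra zeroes already work.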

              People

              • Assignee:
                adeshrao Adesh Kumar Rao
                Reporter:
                adeshrao Adesh Kumar Rao
              • Votes:
                0
                Watchers:
                3

                  Time Tracking

                  Estimated:
                  Not Specified
                  Remaining:
                  0h
                  Logged:
                  40m