Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-9445

Revert HIVE-5700 - enforce single date format for partition column storage

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Blocker
    • Resolution: Fixed
    • 0.13.0, 0.13.1, 0.14.0, 0.14.1, 0.15.0
    • 1.1.0
    • Metastore
    • None

    Description

      HIVE-5700 has the following issues:

      • HIVE-8730 - fails mysql upgrades
      • Does not upgrade all metadata, e.g. PARTITIONS.PART_NAME See comments in HIVE-5700.
      • Completely corrupts postgres, see below.

      With a postgres metastore on 0.12, I executed the following:

      CREATE TABLE HIVE5700_DATE_PARTED (line string) PARTITIONED BY (ddate date);
      CREATE TABLE HIVE5700_STRING_PARTED (line string) PARTITIONED BY (ddate string);
      
      ALTER TABLE HIVE5700_DATE_PARTED ADD PARTITION (ddate='NOT_DATE');
      ALTER TABLE HIVE5700_DATE_PARTED ADD PARTITION (ddate='20150121');
      ALTER TABLE HIVE5700_DATE_PARTED ADD PARTITION (ddate='20150122');
      ALTER TABLE HIVE5700_DATE_PARTED ADD PARTITION (ddate='2015-01-23');
      
      ALTER TABLE HIVE5700_STRING_PARTED ADD PARTITION (ddate='NOT_DATE');
      ALTER TABLE HIVE5700_STRING_PARTED ADD PARTITION (ddate='20150121');
      ALTER TABLE HIVE5700_STRING_PARTED ADD PARTITION (ddate='20150122');
      ALTER TABLE HIVE5700_STRING_PARTED ADD PARTITION (ddate='2015-01-23');
      
      LOAD DATA LOCAL INPATH '/tmp/single-line-of-data' INTO TABLE HIVE5700_DATE_PARTED PARTITION (ddate='NOT_DATE');
      LOAD DATA LOCAL INPATH '/tmp/single-line-of-data' INTO TABLE HIVE5700_DATE_PARTED PARTITION (ddate='20150121');
      LOAD DATA LOCAL INPATH '/tmp/single-line-of-data' INTO TABLE HIVE5700_DATE_PARTED PARTITION (ddate='20150122');
      LOAD DATA LOCAL INPATH '/tmp/single-line-of-data' INTO TABLE HIVE5700_DATE_PARTED PARTITION (ddate='2015-01-23');
      
      LOAD DATA LOCAL INPATH '/tmp/single-line-of-data' INTO TABLE HIVE5700_STRING_PARTED PARTITION (ddate='NOT_DATE');
      LOAD DATA LOCAL INPATH '/tmp/single-line-of-data' INTO TABLE HIVE5700_STRING_PARTED PARTITION (ddate='20150121');
      LOAD DATA LOCAL INPATH '/tmp/single-line-of-data' INTO TABLE HIVE5700_STRING_PARTED PARTITION (ddate='20150122');
      LOAD DATA LOCAL INPATH '/tmp/single-line-of-data' INTO TABLE HIVE5700_STRING_PARTED PARTITION (ddate='2015-01-23');
      
      hive> show partitions HIVE5700_DATE_PARTED;  
      OK
      ddate=20150121
      ddate=20150122
      ddate=2015-01-23
      ddate=NOT_DATE
      Time taken: 0.052 seconds, Fetched: 4 row(s)
      hive> show partitions HIVE5700_STRING_PARTED;
      OK
      ddate=20150121
      ddate=20150122
      ddate=2015-01-23
      ddate=NOT_DATE
      Time taken: 0.051 seconds, Fetched: 4 row(s)
      

      I then took a dump of the database named postgres-pre-upgrade.sql and the data in the dump looks good:

      [root@hive5700-1-1 ~]# egrep -A9 '^COPY "PARTITIONS"|^COPY "PARTITION_KEY_VALS"' postgres-pre-upgrade.sql 
      COPY "PARTITIONS" ("PART_ID", "CREATE_TIME", "LAST_ACCESS_TIME", "PART_NAME", "SD_ID", "TBL_ID") FROM stdin;
      3	1421943647	0	ddate=NOT_DATE	6	2
      4	1421943647	0	ddate=20150121	7	2
      5	1421943648	0	ddate=20150122	8	2
      6	1421943664	0	ddate=NOT_DATE	9	3
      7	1421943664	0	ddate=20150121	10	3
      8	1421943665	0	ddate=20150122	11	3
      9	1421943694	0	ddate=2015-01-23	12	2
      10	1421943695	0	ddate=2015-01-23	13	3
      \.
      --
      COPY "PARTITION_KEY_VALS" ("PART_ID", "PART_KEY_VAL", "INTEGER_IDX") FROM stdin;
      3	NOT_DATE	0
      4	20150121	0
      5	20150122	0
      6	NOT_DATE	0
      7	20150121	0
      8	20150122	0
      9	2015-01-23	0
      10	2015-01-23	0
      \.
      

      I then upgraded to 0.13 and subsequently upgraded the MS with the following command: schematool -dbType postgres -upgradeSchema -verbose

      The file postgres-post-upgrade.sql is the post-upgrade db dump. As you can see the data is completely corrupt.

      [root@hive5700-1-1 ~]# egrep -A9 '^COPY "PARTITIONS"|^COPY "PARTITION_KEY_VALS"' postgres-post-upgrade.sql 
      COPY "PARTITIONS" ("PART_ID", "CREATE_TIME", "LAST_ACCESS_TIME", "PART_NAME", "SD_ID", "TBL_ID") FROM stdin;
      3	1421943647	0	ddate=NOT_DATE	6	2
      4	1421943647	0	ddate=20150121	7	2
      5	1421943648	0	ddate=20150122	8	2
      6	1421943664	0	ddate=NOT_DATE	9	3
      7	1421943664	0	ddate=20150121	10	3
      8	1421943665	0	ddate=20150122	11	3
      9	1421943694	0	ddate=2015-01-23	12	2
      10	1421943695	0	ddate=2015-01-23	13	3
      \.
      --
      COPY "PARTITION_KEY_VALS" ("PART_ID", "PART_KEY_VAL", "INTEGER_IDX") FROM stdin;
      3	2015-01-23	0
      4	2015-01-23	0
      5	2015-01-23	0
      6	2015-01-23	0
      7	2015-01-23	0
      8	2015-01-23	0
      9	2015-01-23	0
      10	2015-01-23	0
      \.
      

      Same corruption from the HIVE CLI:

      hive> show partitions HIVE5700_DATE_PARTED;  
      OK
      ddate=20150121
      ddate=20150122
      ddate=2015-01-23
      ddate=NOT_DATE
      Time taken: 0.844 seconds, Fetched: 4 row(s)
      hive> show partitions HIVE5700_STRING_PARTED;
      OK
      ddate=20150121
      ddate=20150122
      ddate=2015-01-23
      ddate=NOT_DATE
      Time taken: 0.114 seconds, Fetched: 4 row(s)
      hive> select * from HIVE5700_STRING_PARTED;
      OK
      some data!	2015-01-23
      some data!	2015-01-23
      some data!	2015-01-23
      some data!	2015-01-23
      Time taken: 0.536 seconds, Fetched: 4 row(s)
      hive> select * from HIVE5700_DATE_PARTED;  
      OK
      some data!	2015-01-23
      some data!	2015-01-23
      some data!	2015-01-23
      some data!	2015-01-23
      Time taken: 0.145 seconds, Fetched: 4 row(s)
      

      Attachments

        1. HIVE-9445.1.patch
          14 kB
          Brock Noland
        2. HIVE-9445.1.patch
          14 kB
          Brock Noland

        Issue Links

          Activity

            People

              brocknoland Brock Noland
              brocknoland Brock Noland
              Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: