HIVE-26860

Corresponding rows in HMS datastore tables (SDS, SERDES, SERDE_PARAMS, SKEWED_COL_NAMES, SKEWED_COL_VALUE_LOC_MAP, SKEWED_VALUES) are not deleted when a partitioned table with skewed columns is dropped


Details

    Description

      [Description]

      The corresponding rows in the HMS backing datastore tables (SDS, SERDES, SERDE_PARAMS, SKEWED_COL_NAMES, SKEWED_COL_VALUE_LOC_MAP, SKEWED_VALUES) are not deleted when a partitioned table with skewed columns is dropped, because a ClassCastException is thrown while the partitions are being dropped.

      Exception Stack:

      2022-12-30T10:29:12,490 DEBUG [HiveServer2-Background-Pool: Thread-292] metastore.ObjectStore: Full DirectSQL callstack for debugging (not an error)
      java.lang.ClassCastException: java.lang.Long cannot be cast to [Ljava.lang.Object;
          at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.dropStorageDescriptors(MetaStoreDirectSql.java:2724) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
          at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.dropPartitionsByPartitionIds(MetaStoreDirectSql.java:2690) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
          at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.access$2000(MetaStoreDirectSql.java:118) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
          at org.apache.hadoop.hive.metastore.MetaStoreDirectSql$9.run(MetaStoreDirectSql.java:2608) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
          at org.apache.hadoop.hive.metastore.Batchable.runBatched(Batchable.java:74) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
          at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.dropPartitionsViaSqlFilter(MetaStoreDirectSql.java:2598) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
          at org.apache.hadoop.hive.metastore.ObjectStore$6.getSqlResult(ObjectStore.java:3053) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
          at org.apache.hadoop.hive.metastore.ObjectStore$6.getSqlResult(ObjectStore.java:3050) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
          at org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:4352) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
          at org.apache.hadoop.hive.metastore.ObjectStore.dropPartitionsInternal(ObjectStore.java:3061) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
          at org.apache.hadoop.hive.metastore.ObjectStore.dropPartitions(ObjectStore.java:3040) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_292]
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_292]
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_292]
          at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_292]
          at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
          at com.sun.proxy.$Proxy29.dropPartitions(Unknown Source) ~[?:?]
          at org.apache.hadoop.hive.metastore.HMSHandler.dropPartitionsAndGetLocations(HMSHandler.java:3174) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
          at org.apache.hadoop.hive.metastore.HMSHandler.drop_table_core(HMSHandler.java:2951) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
          at org.apache.hadoop.hive.metastore.HMSHandler.drop_table_with_environment_context(HMSHandler.java:3199) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
          at org.apache.hadoop.hive.metastore.HMSHandler.drop_table_with_environment_context(HMSHandler.java:3187) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_292]
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_292]
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_292]
          at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_292]
          at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:146) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
          at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
          at com.sun.proxy.$Proxy31.drop_table_with_environment_context(Unknown Source) ~[?:?]
          at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.drop_table_with_environment_context(HiveMetaStoreClient.java:4513) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
          at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.drop_table_with_environment_context(SessionHiveMetaStoreClient.java:198) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
          at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.dropTable(HiveMetaStoreClient.java:1995) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
          at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.dropTable(HiveMetaStoreClient.java:1937) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_292]
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_292]
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_292]
          at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_292]
          at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:218) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
          at com.sun.proxy.$Proxy32.dropTable(Unknown Source) ~[?:?]
          at org.apache.hadoop.hive.ql.metadata.Hive.dropTable(Hive.java:1458) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
          at org.apache.hadoop.hive.ql.metadata.Hive.dropTable(Hive.java:1382) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
          at org.apache.hadoop.hive.ql.ddl.table.drop.DropTableOperation.execute(DropTableOperation.java:112) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
          at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
          at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
          at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
          at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:354) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
          at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:327) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
          at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:244) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
          at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:105) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
          at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:370) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
          at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:205) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
          at org.apache.hadoop.hive.ql.Driver.run(Driver.java:154) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
          at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
          at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:185) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
          at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:236) ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
          at org.apache.hive.service.cli.operation.SQLOperation.access$500(SQLOperation.java:90) ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
          at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:340) ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
          at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_292]
          at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_292]
          at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) ~[hadoop-common-3.1.0.jar:?]
          at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:360) ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_292]
          at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_292]
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_292]
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_292]
          at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_292]
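
      The cast failure matches a known DataNucleus behavior: a JDO "javax.jdo.query.SQL" query returns each row as an Object[] only when more than one column is selected, while a single-column result comes back as the bare column value (here a java.lang.Long). The sketch below reproduces that failure mode in isolation; the class and variable names are illustrative only, not the actual MetaStoreDirectSql code.

      import java.util.Arrays;
      import java.util.List;

      // Illustrative sketch of the ClassCastException pattern in the stack
      // trace: the loop expects List<Object[]>, but a single-column SQL
      // query result arrives as bare Longs.
      public class SingleColumnCastDemo {
        public static void main(String[] args) {
          // What a single-column query result looks like to the caller.
          List<Object> sqlResult = Arrays.asList(3L, 4L);

          for (Object row : sqlResult) {
            // Throws java.lang.ClassCastException:
            // java.lang.Long cannot be cast to [Ljava.lang.Object;
            Object[] fields = (Object[]) row;
            System.out.println(fields[0]);
          }
        }
      }

      A defensive consumer would branch on the runtime type, e.g. Long id = (row instanceof Object[]) ? (Long) ((Object[]) row)[0] : (Long) row; or always select at least two columns so the result shape is stable.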

      [Steps to reproduce]

      1. Create a stage table, load some data into it, then create a partitioned table with skewed columns and load data into it from the stage table. The attached partdata3 file supplies the data; it spans 2 partitions.

      create database mydb;
      use mydb;
      
      create table stage(num int, name string, category string)
        row format delimited fields terminated by ','
        stored as textfile;
      
      load data local inpath 'partdata3' into table stage;
      
      create table skewpart(num int, name string)
        partitioned by (category string)
        skewed by (num) on (3,4) stored as directories
        row format delimited fields terminated by ','
        stored as textfile;
      
      insert into skewpart select * from stage;

       

      2. Verify that the table data in the warehouse directory is correct.

      kvenureddy@192 category=fruit % ls   
      HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME    num=4
      kvenureddy@192 category=fruit % pwd
      /private/tmp/warehouse/external/mydb.db/skewpart/category=fruit
      kvenureddy@192 category=fruit % cd num=4 
      kvenureddy@192 num=4 % pwd
      /private/tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4
      kvenureddy@192 num=4 % cat 000000_0 
      4,cherry
      kvenureddy@192 num=4 % cd ../
      kvenureddy@192 category=fruit % cd HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME 
      kvenureddy@192 HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME % cat 000000_0 
      1,apple
      2,banana
      6,mango
      kvenureddy@192 HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME % cd ../../
      kvenureddy@192 skewpart % pwd
      /private/tmp/warehouse/external/mydb.db/skewpart
      kvenureddy@192 skewpart % ls
      category=fruit        category=vegetable
      kvenureddy@192 skewpart % cd category=vegetable 
      kvenureddy@192 category=vegetable % pwd
      /private/tmp/warehouse/external/mydb.db/skewpart/category=vegetable
      kvenureddy@192 category=vegetable % ls
      HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME    num=3
      kvenureddy@192 category=vegetable % cd num=3 
      kvenureddy@192 num=3 % cat 000000_0 
      3,carrot
      kvenureddy@192 num=3 % cd ../
      kvenureddy@192 category=vegetable % cd HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME 
      kvenureddy@192 HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME % cat 000000_0 
      5,potato
      7,tomato
      kvenureddy@192 HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME % 

       

      3. Verify the HMS backing datastore tables after creating and loading the partitioned table with skewed columns.

      Note: Only the affected tables (SDS, SERDES, SERDE_PARAMS, SKEWED_COL_NAMES, SKEWED_COL_VALUE_LOC_MAP, SKEWED_VALUES) are shown below; a hypothetical JDBC sketch for producing these dumps follows the tables.

      The row with SD_ID=2 was added when the table was created. The rows with SD_ID=3 and 4 were added during the data load, one for each of the 2 partitions.

      SDS
      SD_ID CD_ID INPUT_FORMAT IS_COMPRESSED IS_STOREDASSUBDIRECTORIES LOCATION NUM_BUCKETS OUTPUT_FORMAT SERDE_ID
      1 1 org.apache.hadoop.mapred.TextInputFormat 0 0 file:/tmp/warehouse/external/mydb.db/stage -1 org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat 1
      2 2 org.apache.hadoop.mapred.TextInputFormat 0 1 file:/tmp/warehouse/external/mydb.db/skewpart -1 org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat 2
      3 2 org.apache.hadoop.mapred.TextInputFormat 0 1 file:/tmp/warehouse/external/mydb.db/skewpart/category=vegetable -1 org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat 3
      4 2 org.apache.hadoop.mapred.TextInputFormat 0 1 file:/tmp/warehouse/external/mydb.db/skewpart/category=fruit -1 org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat 4
      SERDES
      SERDE_ID NAME SLIB DESCRIPTION SERIALIZER_CLASS DESERIALIZER_CLASS SERDE_TYPE
      1   org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe       0
      2   org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe       0
      3   org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe       0
      4   org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe       0
      SERDE_PARAMS
      SERDE_ID PARAM_KEY PARAM_VALUE
      1 field.delim ,
      1 serialization.format ,
      2 field.delim ,
      2 serialization.format ,
      3 field.delim ,
      3 serialization.format ,
      4 field.delim ,
      4 serialization.format ,
      SKEWED_COL_NAMES
      SD_ID SKEWED_COL_NAME INTEGER_IDX
      2 num 0
      3 num 0
      4 num 0
      SKEWED_COL_VALUE_LOC_MAP
      SD_ID STRING_LIST_ID_KID LOCATION
      3 3 file:/tmp/warehouse/external/mydb.db/skewpart/category=vegetable/num=3
      3 14 file:/tmp/warehouse/external/mydb.db/skewpart/category=vegetable/num=3
      4 6 file:/tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4
      4 11 file:/tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4
      SKEWED_VALUES
      SD_ID_OID STRING_LIST_ID_EID INTEGER_IDX
      2 1 0
      2 2 1
      4 9 0
      4 10 1
      3 12 0
      3 13 1
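
      For reference, the dumps above can be reproduced with one query per table against the backing database. A hypothetical JDBC sketch follows; the connection URL and credentials are placeholders assuming a MySQL-backed metastore (adjust for Derby/Postgres), and the table names follow the standard HMS schema.

      import java.sql.Connection;
      import java.sql.DriverManager;
      import java.sql.ResultSet;
      import java.sql.ResultSetMetaData;
      import java.sql.Statement;

      // Hypothetical helper, not part of the reproduction: dumps the
      // affected HMS tables row by row.
      public class DumpHmsTables {
        private static final String[] TABLES = {
            "SDS", "SERDES", "SERDE_PARAMS",
            "SKEWED_COL_NAMES", "SKEWED_COL_VALUE_LOC_MAP", "SKEWED_VALUES"};

        public static void main(String[] args) throws Exception {
          String url = "jdbc:mysql://localhost:3306/metastore";  // placeholder
          try (Connection conn = DriverManager.getConnection(url, "hive", "hive");
               Statement stmt = conn.createStatement()) {
            for (String table : TABLES) {
              System.out.println(table);
              try (ResultSet rs = stmt.executeQuery("select * from " + table)) {
                ResultSetMetaData md = rs.getMetaData();
                while (rs.next()) {
                  StringBuilder row = new StringBuilder();
                  for (int i = 1; i <= md.getColumnCount(); i++) {
                    if (i > 1) row.append(' ');
                    row.append(rs.getString(i));
                  }
                  System.out.println(row);
                }
              }
            }
          }
        }
      }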

       

      4. Drop the skewpart table and verify the HMS backing datastore tables again.

      The row with SD_ID=2 is deleted, but the rows with SD_ID 3 and 4 are not; they remain in the database forever (a hypothetical orphan check is sketched after the tables below).

      SDS
      SD_ID CD_ID INPUT_FORMAT IS_COMPRESSED IS_STOREDASSUBDIRECTORIES LOCATION NUM_BUCKETS OUTPUT_FORMAT SERDE_ID
      1 1 org.apache.hadoop.mapred.TextInputFormat 0 0 file:/tmp/warehouse/external/mydb.db/stage -1 org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat 1
      3 2 org.apache.hadoop.mapred.TextInputFormat 0 1 file:/tmp/warehouse/external/mydb.db/skewpart/category=vegetable -1 org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat 3
      4 2 org.apache.hadoop.mapred.TextInputFormat 0 1 file:/tmp/warehouse/external/mydb.db/skewpart/category=fruit -1 org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat 4

      The row with SERDE_ID=2 is deleted, but the rows with SERDE_ID 3 and 4 are not, since they are still referenced from the SDS table.

      SERDES
      SERDE_ID NAME SLIB DESCRIPTION SERIALIZER_CLASS DESERIALIZER_CLASS SERDE_TYPE
      1   org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe       0
      3   org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe       0
      4   org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe       0
      SERDE_PARAMS
      SERDE_ID PARAM_KEY PARAM_VALUE
      1 field.delim ,
      1 serialization.format ,
      3 field.delim ,
      3 serialization.format ,
      4 field.delim ,
      4 serialization.format ,

      Rows in the SKEWED_* tables that reference SD_ID 3 and 4 likewise remain forever.

      SKEWED_COL_NAMES
      SD_ID SKEWED_COL_NAME INTEGER_IDX
      3 num 0
      4 num 0
      SKEWED_COL_VALUE_LOC_MAP
      SD_ID STRING_LIST_ID_KID LOCATION
      3 3 file:/tmp/warehouse/external/mydb.db/skewpart/category=vegetable/num=3
      3 14 file:/tmp/warehouse/external/mydb.db/skewpart/category=vegetable/num=3
      4 6 file:/tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4
      4 11 file:/tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4
      SKEWED_VALUES
      SD_ID_OID STRING_LIST_ID_EID INTEGER_IDX
      4 9 0
      4 10 1
      3 12 0
      3 13 1
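
      The leak can also be confirmed mechanically: after the drop, SDS rows 3 and 4 are referenced by neither TBLS nor PARTITIONS. Below is a hypothetical orphan check against the backing database (same placeholder connection settings as the sketch above, standard HMS schema assumed); it is expected to report SD_ID 3 and 4.

      import java.sql.Connection;
      import java.sql.DriverManager;
      import java.sql.ResultSet;
      import java.sql.Statement;

      // Hypothetical check, not from the issue report: lists SDS rows that
      // no table or partition references any more.
      public class FindOrphanedSds {
        public static void main(String[] args) throws Exception {
          String url = "jdbc:mysql://localhost:3306/metastore";  // placeholder
          String orphanQuery =
              "select SD_ID from SDS"
            + " where SD_ID not in (select SD_ID from TBLS where SD_ID is not null)"
            + " and SD_ID not in (select SD_ID from PARTITIONS where SD_ID is not null)";
          try (Connection conn = DriverManager.getConnection(url, "hive", "hive");
               Statement stmt = conn.createStatement();
               ResultSet rs = stmt.executeQuery(orphanQuery)) {
            while (rs.next()) {
              System.out.println("Orphaned SDS row: SD_ID=" + rs.getLong(1));
            }
          }
        }
      }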

      Attachments

        1. partdata3 (0.1 kB), uploaded by Venugopal Reddy K


            People

              Assignee: Venugopal Reddy K
              Reporter: Venugopal Reddy K