Hive / HIVE-25904

ObjectStore's updateTableColumnStatistics is not thread-safe


Details

    Description

      [root@igansperger-hive-tgt-3 ~]# cat test.sh
      hive -e 'create database test; create external table test.foo(col1 string);' 2> /dev/null
      hive -e "select count(*) from sys.tab_col_stats where db_name = 'test' and table_name = 'foo'" 2> /dev/null
      
      export JAVA_HOME=/usr/java/jdk1.8.0_232-cloudera
      export JAVA_OPTS="-Xmx1g"
      export PATH="/root/scala-2.13.8/bin:$JAVA_HOME/bin:$PATH"
      
      export CONF_DIR=/run/cloudera-scm-agent/process/79-hive_on_tez-HIVESERVER2
      export CDH_HCAT_HOME=/opt/cloudera/parcels/CDH/lib/hive-hcatalog/
      export CDH_HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive
      
      CLASSPATH="$CLASSPATH:$CONF_DIR/hadoop-conf"
      CLASSPATH="$CLASSPATH:$CONF_DIR/hive-conf"
      CLASSPATH="$CLASSPATH:$(hadoop classpath)"
      CLASSPATH="$CLASSPATH:$CDH_HIVE_HOME/*"
      CLASSPATH="$CLASSPATH:$CDH_HIVE_HOME/lib/*"
      CLASSPATH="$CLASSPATH:${CDH_HCAT_HOME}/share/webhcat/java-client/hive-webhcat-java-client.jar"
      CLASSPATH="$CLASSPATH:${CDH_HCAT_HOME}/share/hcatalog/hive-hcatalog-core.jar"
      
      scala -classpath $CLASSPATH <<-EOF
      
      import org.apache.hadoop.hive.metastore.HiveMetaStoreClient
      import org.apache.hadoop.hive.conf.HiveConf
      import org.apache.hadoop.hive.metastore.api._
      
      def go() = {
          val conf = new HiveConf()
          val client = new HiveMetaStoreClient(conf)
      
          val colStatData = new ColumnStatisticsData()
          colStatData.setStringStats(new StringColumnStatsData(3, 3.0, 0, 1))
          val colStatsObj = new ColumnStatisticsObj("col1", "string", colStatData)
          val colStatsObjs = java.util.Arrays.asList(colStatsObj)
          val colStatsDesc = new ColumnStatisticsDesc(true, "test", "foo")
          val colStats = new ColumnStatistics(colStatsDesc, colStatsObjs)
          colStats.setEngine("hive")
      
          client.updateTableColumnStatistics(colStats)
          println("SUCCESS")
      }
      
      val t1 = new Thread(() => go())
      val t2 = new Thread(() => go())
      t1.start()
      t2.start()
      
      t1.join()
      t2.join()
      
      go()
      
      EOF
      
      hive -e "select count(*) from sys.tab_col_stats where db_name = 'test' and table_name = 'foo'" 2> /dev/null
      

      This produces (minus logging):

      [root@igansperger-hive-tgt-3 ~]# sh test.sh
      +------+
      | _c0  |
      +------+
      | 0    |
      +------+
      
      Welcome to Scala 2.13.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_232).
      Type in expressions for evaluation. Or try :help.
      
      SUCCESS
      SUCCESS
      org.apache.hadoop.hive.metastore.api.MetaException: Unexpected 2 statistics for 1 columns
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$update_table_column_statistics_req_result$update_table_column_statistics_req_resultStandardScheme.read(ThriftHiveMetastore.java)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$update_table_column_statistics_req_result$update_table_column_statistics_req_resultStandardScheme.read(ThriftHiveMetastore.java)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$update_table_column_statistics_req_result.read(ThriftHiveMetastore.java)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_update_table_column_statistics_req(ThriftHiveMetastore.java:4597)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.update_table_column_statistics_req(ThriftHiveMetastore.java:4584)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.updateTableColumnStatistics(HiveMetaStoreClient.java:2846)
        at go(<console>:13)
        ... 32 elided
      
      scala>
      scala> :quit
      
      +------+
      | _c0  |
      +------+
      | 2    |
      +------+
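
The output is consistent with a check-then-act race: both concurrent calls print SUCCESS, the table then holds 2 rows for the single column, and the next update fails. The Scala sketch below is a hypothetical in-memory model (`ColStat`, `StatsStore` and `RaceDemo` are illustrative names, not Hive classes). It assumes only that the update path first looks up the existing statistics row and then inserts or overwrites it in a separate step; it is not ObjectStore's actual code. A `CountDownLatch` forces both threads through the lookup before either writes, which makes the duplicate deterministic, while the `atomic = true` variant holds one lock across both steps.

```scala
import java.util.concurrent.CountDownLatch
import scala.collection.mutable.ArrayBuffer
import scala.util.Try

// Hypothetical row of a column-statistics table, keyed by (db, table, column).
final case class ColStat(db: String, table: String, column: String, numDVs: Long)

// Hypothetical in-memory store; NOT Hive's ObjectStore. Each individual read or
// write takes the monitor, but when atomic = false the check and the write are
// two separate critical sections.
final class StatsStore(atomic: Boolean) {
  private val rows = ArrayBuffer.empty[ColStat]

  private def sameKey(a: ColStat, b: ColStat): Boolean =
    a.db == b.db && a.table == b.table && a.column == b.column

  // Check step: find the existing row for this column. Duplicates are treated
  // as corruption, echoing the "Unexpected N statistics for M columns" error.
  private def lookup(s: ColStat): Option[Int] = this.synchronized {
    val matches = rows.indices.filter(i => sameKey(rows(i), s))
    if (matches.size > 1)
      throw new IllegalStateException(s"Unexpected ${matches.size} statistics for 1 columns")
    matches.headOption
  }

  // Write step: overwrite the row found by lookup, or append a new one.
  // Rows are never deleted in this model, so a found index stays valid.
  private def write(existing: Option[Int], s: ColStat): Unit = this.synchronized {
    existing match {
      case Some(i) => rows(i) = s
      case None    => rows += s
    }
  }

  // `window` runs between check and write so a demo can widen the race on purpose.
  def update(s: ColStat, window: () => Unit): Unit = {
    def checkThenWrite(): Unit = {
      val existing = lookup(s)
      window()
      write(existing, s)
    }
    if (atomic) this.synchronized(checkThenWrite()) else checkThenWrite()
  }

  def rowCount: Int = this.synchronized(rows.size)
}

object RaceDemo {
  val stat: ColStat = ColStat("test", "foo", "col1", 1L)

  // Two concurrent updates of the same column, like t1/t2 in test.sh.
  def runPair(store: StatsStore, window: () => Unit): StatsStore = {
    val threads = Seq.fill(2)(new Thread(() => store.update(stat, window)))
    threads.foreach(_.start())
    threads.foreach(_.join())
    store
  }

  // Non-atomic: the latch lets both threads finish the check before either writes.
  def racyDemo(): StatsStore = {
    val bothChecked = new CountDownLatch(2)
    runPair(new StatsStore(atomic = false), () => { bothChecked.countDown(); bothChecked.await() })
  }

  // Atomic: the window runs under the lock, so a latch would deadlock;
  // a short sleep widens the window instead.
  def safeDemo(): StatsStore =
    runPair(new StatsStore(atomic = true), () => Thread.sleep(20))

  def main(args: Array[String]): Unit = {
    val racy = racyDemo()
    println(s"non-atomic: ${racy.rowCount} rows for 1 column")
    val next = Try(racy.update(stat, () => ()))
    println(s"next update: ${next.failed.map(_.getMessage).getOrElse("ok")}")
    println(s"atomic: ${safeDemo().rowCount} row for 1 column")
  }
}
```

Running `RaceDemo.main` should print:

non-atomic: 2 rows for 1 column
next update: Unexpected 2 statistics for 1 columns
atomic: 1 row for 1 column

Holding a JVM lock across lookup and write is only the in-process analogue of a fix. If several metastore instances share one backing database, a lock inside one JVM would not serialize them, so a database-level guard (for example a unique key on the statistics row, or a transaction that locks it) would be needed; this model does not show what Hive actually changed.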
      

People

  dkuzmenko Denys Kuzmenko
  dkuzmenko Denys Kuzmenko
  Votes: 0
  Watchers: 3

Time Tracking

  Estimated: Not Specified
  Remaining: 0h
  Logged: 1h 20m