Hive > HIVE-7292 Hive on Spark > HIVE-8509

UT: fix list_bucket_dml_2 test [Spark Branch]


    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1.0
    • Component/s: Spark
    • Labels: None

      Description

      The test list_bucket_dml_2 fails in FileSinkOperator.publishStats:

      org.apache.hadoop.hive.ql.metadata.HiveException: [Error 30002]: StatsPublisher cannot be connected to. There was a error while connecting to the StatsPublisher, and retrying might help. If you dont want the query to fail because accurate statistics could not be collected, set hive.stats.reliable=false
      at org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1079)
      at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:971)
      at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:582)
      at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:594)
      at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:594)
      at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:594)
      at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:175)
      at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:57)
      at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:121)

      I debugged and found that FileSinkOperator.publishStats throws the exception when calling statsPublisher.connect here:
      if (!statsPublisher.connect(hconf)) {
        // just return, stats gathering should not block the main query
        LOG.error("StatsPublishing error: cannot connect to database");
        if (isStatsReliable) {
          throw new HiveException(ErrorMsg.STATSPUBLISHER_CONNECTION_ERROR.getErrorCodedMsg());
        }
        return;
      }

      With hive.stats.dbclass set to counter in data/conf/spark/hive-site.xml, the statsPublisher is of type CounterStatsPublisher.
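
      For reference, here is a minimal sketch of how the publisher implementation follows from hive.stats.dbclass, assuming the StatsFactory API behaves as described above (the wrapper class and main method are illustrative only, not part of Hive):

      import org.apache.hadoop.hive.conf.HiveConf;
      import org.apache.hadoop.hive.ql.stats.StatsFactory;
      import org.apache.hadoop.hive.ql.stats.StatsPublisher;

      public class StatsPublisherSelectionSketch {
        public static void main(String[] args) {
          HiveConf conf = new HiveConf();
          // Same setting as data/conf/spark/hive-site.xml uses for the Spark qtests.
          conf.set("hive.stats.dbclass", "counter");
          // StatsFactory picks the publisher implementation from hive.stats.dbclass;
          // with "counter" the returned publisher should be CounterStatsPublisher.
          StatsFactory factory = StatsFactory.newFactory(conf);
          StatsPublisher publisher = (factory == null) ? null : factory.getStatsPublisher();
          System.out.println(publisher == null ? "no publisher" : publisher.getClass().getName());
        }
      }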
      In CounterStatsPublisher, connect() returns false (which then causes the exception above) because getReporter() returns null for the MapredContext:

      MapredContext context = MapredContext.get();
      if (context == null || context.getReporter() == null) {
        return false;
      }
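
      The reporter matters because a counter-based publisher writes its statistics as MapReduce counters through it. A rough sketch of the idea (simplified, not the actual CounterStatsPublisher code; the class and method below are illustrative):

      import org.apache.hadoop.mapred.Reporter;

      public class CounterPublishSketch {
        // Counter-based stats publishing boils down to incrementing a named
        // MapReduce counter via the task's Reporter; with no Reporter available
        // there is nowhere to publish, hence connect() refusing above.
        public static void publishStat(Reporter reporter, String group, String statType, long value) {
          if (reporter == null) {
            return;
          }
          reporter.incrCounter(group, statType, value);
        }
      }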

      When changing hive.stats.dbclass to jdbc:derby in data/conf/spark/hive-site.xml, similar to TestCliDriver, the test works:
      <property>
        <name>hive.stats.dbclass</name>
        <!-- <value>counter</value> -->
        <value>jdbc:derby</value>
        <description>The default storatge that stores temporary hive statistics. Currently, jdbc, hbase and counter type is supported</description>
      </property>

      In addition, I had to generate the .q.out file for the test case for Spark.

      When running this test with TestCliDriver and hive.stats.dbclass set to counter, the test case still works. The reporter is set to org.apache.hadoop.mapred.Task$TaskReporter.
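
      For comparison, a rough sketch of the wiring the MapReduce execution path does when a task starts (ExecMapper-style, simplified and not the exact Hive code), which is what makes the Reporter visible to CounterStatsPublisher under TestCliDriver:

      import org.apache.hadoop.hive.ql.exec.MapredContext;
      import org.apache.hadoop.mapred.JobConf;
      import org.apache.hadoop.mapred.Reporter;

      public class MapredContextWiringSketch {
        // Create the thread-local MapredContext and hand it the framework's Reporter,
        // so that MapredContext.get().getReporter() is non-null inside operators.
        public static void wireReporter(JobConf jobConf, Reporter reporter) {
          MapredContext context = MapredContext.init(true /* isMap */, jobConf);
          context.setReporter(reporter);
        }
      }

      The Spark record handlers presumably need equivalent wiring for the counter-based publisher to work.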

      Some additional investigation may be needed into why CounterStatsPublisher has no reporter in the case of Spark.

        Attachments

        1. HIVE-8509-spark.patch
          30 kB
          Xuefu Zhang
        2. HIVE-8509-spark.patch
          30 kB
          Chinna Rao Lalam


            People

             • Assignee: Chinna Rao Lalam (chinnalalam)
             • Reporter: Thomas Friedrich (tfriedr)
             • Votes: 0
             • Watchers: 2
