Spark / SPARK-38009

In start-thriftserver.sh arguments, "--hiveconf xxx" should take precedence over "--conf spark.hadoop.xxx" and over any other Hadoop configuration


Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.4.8, 3.2.0
    • Fix Version/s: None
    • Component/s: SQL
    • Environment: The experiment described below was conducted separately on Apache Spark 2.4.7 and 3.2.0.

      OS: Ubuntu 20.04

      Java: OpenJDK 1.8.0

    Description

      By convention, an Apache Hive server reads configuration options from several sources with different precedence. Options passed with "--hiveconf" on the command line should be overridden only by values set with the SET command (see https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration for details). They should take precedence over Hadoop configuration and over every configuration file on the server (including, but not limited to, hive-site.xml and core-site.xml).

      The Apache Spark Thrift server does not follow this convention. For example, suppose I start the server with conflicting values for "hive.server2.thrift.port":

       

      ```
      ./sbin/start-thriftserver.sh \
      --conf spark.hadoop.hive.server2.thrift.port=10001 \
      --hiveconf hive.server2.thrift.port=10002
      ```

       

      The "--conf" value (port 10001) is chosen over the "--hiveconf" value (port 10002):

       

      ```

      Spark Command: /usr/lib/jvm/java-8-openjdk-amd64/bin/java -cp /home/xxx/spark-2.4.7-bin-hadoop2.7-scala2.12/conf/:/home/xxx/spark-2.4.7-bin-hadoop2.7-scala2.12/jars/* -Xmx1g org.apache.spark.deploy.SparkSubmit --conf spark.hadoop.hive.server2.thrift.port=10001 --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 --name Thrift JDBC/ODBC Server spark-internal --hiveconf hive.server2.thrift.port=10002
      ========================================
      ...
      22/01/24 17:32:18 INFO ThriftCLIService: Starting ThriftBinaryCLIService on port 10001 with 5...500 worker threads

      ```

       

      Replacing the "--conf" line with an equivalent entry in core-site.xml makes no difference.

      I doubt that this divergence from conventional Hive server behaviour is deliberate. I therefore propose that the precedence of Hive configuration options be made identical, or as close as possible, to that of an Apache Hive server of the same version. To my knowledge, the order should be:

       

      SET command > --hiveconf > hive-site.xml > hive-default.xml > --conf > core-site.xml > core-default.xml
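The proposed order can be illustrated with a small POSIX shell sketch. It is a hypothetical model, not how Spark or Hive actually load configuration: each source is assumed to be a shell variable holding `key=value` lines, and the names `resolve`, `layers`, `spark_conf` and the other layer variables are made up for illustration. The values mirror the reproduction above.

```shell
#!/bin/sh
# Hypothetical sketch, not Spark or Hive code: resolve a configuration key
# against layered sources, checking the highest-precedence layer first:
#   SET > --hiveconf > hive-site.xml > hive-default.xml > --conf > core-site.xml > core-default.xml
# Each layer is a shell variable holding zero or more "key=value" lines.

set_cmd=""
hiveconf="hive.server2.thrift.port=10002"
hive_site=""
hive_default=""
spark_conf="hive.server2.thrift.port=10001"   # from --conf spark.hadoop.*
core_site=""
core_default=""

# Names of the layer variables, highest precedence first.
layers="set_cmd hiveconf hive_site hive_default spark_conf core_site core_default"

# resolve KEY: print the value from the first layer that defines KEY;
# return 1 if no layer defines it.
resolve() {
  for layer in $layers; do
    # Copy the contents of the variable named by $layer into $entries.
    eval "entries=\${$layer}"
    # awk prints the value of the first line starting with "KEY=" and
    # exits 0, or exits 1 if no line matches.
    if value=$(printf '%s\n' "$entries" | awk -v prefix="$1=" '
        index($0, prefix) == 1 { print substr($0, length(prefix) + 1); found = 1; exit }
        END { exit !found }'); then
      printf '%s\n' "$value"
      return 0
    fi
  done
  return 1
}

resolve hive.server2.thrift.port   # prints 10002
```

Keeping every source as a separate layer and scanning them in order puts the whole precedence policy in one list (`layers`). Clearing `hiveconf` makes `resolve` fall through to the `--conf` value 10001, which is the value the reproduction shows Spark picking even when "--hiveconf" is set.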


          People

            Assignee: Unassigned
            Reporter: Peng Cheng