Spark / SPARK-13983

HiveThriftServer2 can not get "--hiveconf" or "--hivevar" variables since 1.6 version (both multi-session and single session)


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.0, 1.6.1
    • Fix Version/s: 2.3.0
    • Component/s: SQL
    • Labels: None
    • Environment: ubuntu, spark 1.6.0 standalone, spark 1.6.1 standalone
      (tried spark branch-1.6 snapshot as well)
      compiled with scala 2.10.5 and hadoop 2.6
      (-Phadoop-2.6 -Psparkr -Phive -Phive-thriftserver)

    Description

      HiveThriftServer2 should be able to pick up "--hiveconf" and "--hivevar" variables from a JDBC client, either from beeline command-line parameters, such as
      beeline --hiveconf spark.sql.shuffle.partitions=3 --hivevar db_name=default
      or from the JDBC connection string, like
      jdbc:hive2://localhost:10000?spark.sql.shuffle.partitions=3#db_name=default

      This worked in Spark 1.5.x, but after upgrading to 1.6 it no longer works.

      To reproduce this issue, connect to HiveThriftServer2 with beeline:

      bin/beeline -u jdbc:hive2://localhost:10000 \
                  --hiveconf spark.sql.shuffle.partitions=3 \
                  --hivevar db_name=default
      

      or

      bin/beeline -u "jdbc:hive2://localhost:10000?spark.sql.shuffle.partitions=3#db_name=default"
      

      You will get the following results:

      0: jdbc:hive2://localhost:10000> set spark.sql.shuffle.partitions;
      +-------------------------------+--------+--+
      |              key              | value  |
      +-------------------------------+--------+--+
      | spark.sql.shuffle.partitions  | 200    |
      +-------------------------------+--------+--+
      1 row selected (0.192 seconds)
      0: jdbc:hive2://localhost:10000> use ${db_name};
      Error: org.apache.spark.sql.AnalysisException: cannot recognize input near '$' '{' 'db_name' in switch database statement; line 1 pos 4 (state=,code=0)
      
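
      The same failing behaviour can also be checked without beeline, using a plain JDBC client. The sketch below is only illustrative (it assumes the Hive JDBC driver org.apache.hive.jdbc.HiveDriver is on the classpath, an unauthenticated server, and the two-column key/value layout of the "set" result shown above; the object name ThriftServerVarCheck is made up):

      import java.sql.DriverManager

      object ThriftServerVarCheck {
        def main(args: Array[String]): Unit = {
          // Register the Hive JDBC driver (hive-jdbc must be on the classpath).
          Class.forName("org.apache.hive.jdbc.HiveDriver")

          // hiveconf entries go after '?', hivevar entries after '#',
          // mirroring the connection string from the description above.
          val url = "jdbc:hive2://localhost:10000" +
            "?spark.sql.shuffle.partitions=3" +
            "#db_name=default"

          val conn = DriverManager.getConnection(url, "", "")
          try {
            val stmt = conn.createStatement()

            // Expect 3 here if the --hiveconf value reached the server session;
            // the affected versions return the default of 200 instead.
            val rs = stmt.executeQuery("set spark.sql.shuffle.partitions")
            while (rs.next()) {
              println(rs.getString(1) + " = " + rs.getString(2))
            }

            // Expect this to switch databases if ${db_name} is substituted from
            // --hivevar; the affected versions throw an AnalysisException.
            stmt.execute("use ${db_name}")
          } finally {
            conn.close()
          }
        }
      }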

      -

      However, this bug does not affect the spark-sql CLI; the following commands work:

      bin/spark-sql --master local[2] \
                    --hiveconf spark.sql.shuffle.partitions=3 \
                    --hivevar db_name=default
      
      spark-sql> set spark.sql.shuffle.partitions;
      spark.sql.shuffle.partitions   3
      Time taken: 1.037 seconds, Fetched 1 row(s)
      
      spark-sql> use ${db_name};
      OK
      Time taken: 1.697 seconds
      

      So I think it may be caused by this change: https://github.com/apache/spark/pull/8909 (SPARK-10810 SPARK-10902 [SQL] Improve session management in SQL).

      Perhaps when hiveContext.newSession is called, the variables from sessionConf are not loaded into the new session? (https://github.com/apache/spark/pull/8909/files#diff-8f8b7f4172e8a07ff20a4dbbbcc57b1dR69)
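
      If that guess is right, a fix would be something along these lines. This is a rough, hypothetical sketch, not the actual Spark code: the helper name initSession and the "set:hiveconf:" / "set:hivevar:" key prefixes are assumptions about how the client-side session overlay is keyed when it reaches the server.

      import scala.collection.JavaConverters._
      import org.apache.spark.sql.hive.HiveContext

      object SessionConfPropagation {
        // Hypothetical sketch: after HiveThriftServer2 creates a per-connection
        // session via newSession(), copy the client's session overlay (the
        // --hiveconf / --hivevar entries sent on openSession) into the new context.
        def initSession(rootContext: HiveContext,
                        sessionConf: java.util.Map[String, String]): HiveContext = {
          val ctx = rootContext.newSession()
          if (sessionConf != null) {
            sessionConf.asScala.foreach { case (key, value) =>
              if (key.startsWith("set:hiveconf:")) {
                ctx.setConf(key.stripPrefix("set:hiveconf:"), value)
              } else if (key.startsWith("set:hivevar:")) {
                // hivevar entries would also have to reach Hive's variable
                // substitution map for ${db_name}-style references to resolve;
                // shown here only as a placeholder.
                ctx.setConf(key.stripPrefix("set:hivevar:"), value)
              } else {
                ctx.setConf(key, value)
              }
            }
          }
          ctx
        }
      }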

    People

      Assignee: Yuming Wang (yumwang)
      Reporter: Teng Qiu (chutium)
      Votes: 0
      Watchers: 11
