Hive / HIVE-28424

The Docker Image of HiveServer2 should provide an env variable defining the `hostname:port` passed into the znode

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      • The Docker Image of HiveServer2 should provide an environment variable defining the hostname:port passed into the znode.
      • This request may seem a bit strange at first glance, so explaining it requires a small service orchestration scenario, related to https://github.com/dbeaver/dbeaver/issues/22777 .
      • With the HiveServer2 Docker image apache/hive:4.0.0, enabling ZooKeeper service discovery apparently requires overriding the hive-site.xml shipped in the image. I tested what needs to be done to achieve this at https://github.com/linghengqian/hivesever2-v400-sd-test . First, I define a docker-compose file that also pulls in a ZooKeeper server.
      services:
        zookeeper-server:
          image: zookeeper:3.9.2-jre-17
          restart: always
          ports:
            - "2181:2181"
        hive-server2:
          image: apache/hive:4.0.0
          restart: always
          hostname: '127.0.0.1'
          depends_on:
            zookeeper-server:
              condition: service_started
          environment:
            SERVICE_NAME: hiveserver2
            HIVE_CUSTOM_CONF_DIR: /hive_custom_conf
          ports:
            - "10000:10000"
            - "10002:10002"
          volumes:
            - ./hive-custom-conf:/hive_custom_conf 
      
      • Setting the hostname of the hive-server2 container to 127.0.0.1 already compromises the local Docker network. I do this because HiveServer2 always registers its own hostname plus :10000 in the znode on the ZooKeeper server, and this cannot be changed externally. In this setup, the znode at /hiveserver2/serverUri=localhost:10000;version=4.0.0;sequence=0000000000 has the content hive.server2.instance.uri=localhost:10000;hive.server2.authentication=NONE;hive.server2.transport.mode=binary;hive.server2.thrift.sasl.qop=auth;hive.server2.thrift.bind.host=localhost;hive.server2.thrift.port=10000;hive.server2.use.SSL=false . This can be observed in a ZooKeeper web UI by deploying a Docker container from the image elkozmon/zoonavigator:1.1.3 .
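To make the registered payload concrete, here is a minimal Python sketch that splits the znode content quoted above into key/value pairs and extracts the advertised endpoint. The helper names `parse_znode_data` and `advertised_endpoint` are hypothetical and not part of Hive. The sketch also assumes that a client takes its connection target from hive.server2.instance.uri.

```python
from __future__ import annotations


def parse_znode_data(data: str) -> dict[str, str]:
    """Split a 'key=value;key=value' znode payload into a dict."""
    result = {}
    for item in data.split(";"):
        # Skip empty fragments and anything that is not a key=value pair.
        if "=" not in item:
            continue
        key, value = item.split("=", 1)
        result[key.strip()] = value.strip()
    return result


def advertised_endpoint(data: str) -> tuple[str, int]:
    """Return the (host, port) published under hive.server2.instance.uri."""
    uri = parse_znode_data(data)["hive.server2.instance.uri"]
    host, _, port = uri.rpartition(":")
    return host, int(port)


# Payload copied from the znode content described in this issue.
ZNODE_DATA = (
    "hive.server2.instance.uri=localhost:10000;"
    "hive.server2.authentication=NONE;"
    "hive.server2.transport.mode=binary;"
    "hive.server2.thrift.sasl.qop=auth;"
    "hive.server2.thrift.bind.host=localhost;"
    "hive.server2.thrift.port=10000;"
    "hive.server2.use.SSL=false"
)

host, port = advertised_endpoint(ZNODE_DATA)
print(host, port)
```

Whatever host this returns is the one a client outside the Docker network must be able to reach.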
      • At this point, I also need to mount a hive-site.xml into the HiveServer2 container. Most of its content duplicates https://github.com/apache/hive/blob/rel/release-4.0.0/packaging/src/docker/conf/hive-site.xml, but since there seems to be no way to use multiple hive-site.xml files side by side, I can only repeat those definitions.
      <?xml version="1.0" encoding="UTF-8"?>
      <configuration>
          <property>
              <name>hive.server2.enable.doAs</name>
              <value>false</value>
          </property>
          <property>
              <name>hive.tez.exec.inplace.progress</name>
              <value>false</value>
          </property>
          <property>
              <name>hive.tez.exec.print.summary</name>
              <value>true</value>
          </property>
          <property>
              <name>hive.exec.scratchdir</name>
              <value>/opt/hive/scratch_dir</value>
          </property>
          <property>
              <name>hive.user.install.directory</name>
              <value>/opt/hive/install_dir</value>
          </property>
          <property>
              <name>tez.runtime.optimize.local.fetch</name>
              <value>true</value>
          </property>
          <property>
              <name>hive.exec.submit.local.task.via.child</name>
              <value>false</value>
          </property>
          <property>
              <name>mapreduce.framework.name</name>
              <value>local</value>
          </property>
          <property>
              <name>tez.local.mode</name>
              <value>true</value>
          </property>
          <property>
              <name>hive.metastore.warehouse.dir</name>
              <value>/opt/hive/data/warehouse</value>
          </property>
          <property>
              <name>metastore.metastore.event.db.notification.api.auth</name>
              <value>false</value>
          </property>
      
          <property>
              <name>hive.server2.support.dynamic.service.discovery</name>
              <value>true</value>
          </property>
          <property>
              <name>hive.zookeeper.quorum</name>
              <value>zookeeper-server:2181</value>
          </property>
      </configuration>
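Because the upstream definitions have to be repeated by hand, one possible way to avoid copy-and-paste drift is to generate the mounted file from the upstream one. The following is only a sketch under assumptions: `add_properties` is a hypothetical helper, `BASE` stands in for the upstream hive-site.xml (shortened to one property here), and `ET.indent` requires Python 3.9 or newer.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def add_properties(base_xml: str, extra: dict[str, str]) -> str:
    """Return base_xml with the given properties added or overwritten."""
    root = ET.fromstring(base_xml)
    existing = {p.findtext("name"): p for p in root.findall("property")}
    for name, value in extra.items():
        prop = existing.get(name)
        if prop is None:
            # Property not present yet: create <property><name>...</name>.
            prop = ET.SubElement(root, "property")
            ET.SubElement(prop, "name").text = name
        value_el = prop.find("value")
        if value_el is None:
            value_el = ET.SubElement(prop, "value")
        value_el.text = value
    ET.indent(root, space="    ")
    return ET.tostring(root, encoding="unicode")


# Stand-in for the upstream file; only one property is shown.
BASE = """<configuration>
    <property>
        <name>hive.server2.enable.doAs</name>
        <value>false</value>
    </property>
</configuration>"""

# The two service discovery properties from the file above.
DISCOVERY = {
    "hive.server2.support.dynamic.service.discovery": "true",
    "hive.zookeeper.quorum": "zookeeper-server:2181",
}

print(add_properties(BASE, DISCOVERY))
```

The printed document could then be saved as ./hive-custom-conf/hive-site.xml, the directory mounted in the compose file.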
      
      • At this point, from outside the Docker Compose network, I can connect to the deployed HiveServer2 in DBeaver via the jdbcUrl jdbc:hive2://localhost:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;.
      • But suppose the docker-compose file is defined as follows, where I only changed the hostnames of the two containers in the same Docker network.
      services:
        zookeeper-server:
          image: zookeeper:3.9.2-jre-17
          hostname: 'zookeeper-server'
          restart: always
          ports:
            - "2181:2181"
        hive-server2:
          image: apache/hive:4.0.0
          restart: always
          hostname: 'server2.hive.com'
          depends_on:
            zookeeper-server:
              condition: service_started
          environment:
            SERVICE_NAME: hiveserver2
            HIVE_CUSTOM_CONF_DIR: /hive_custom_conf
          ports:
            - "10000:10000"
            - "10002:10002"
          volumes:
            - ./hive-custom-conf:/hive_custom_conf 
      
      • With this setup, the jdbcUrl jdbc:hive2://localhost:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2; still connects to ZooKeeper, but not to HiveServer2. This is because the ZooKeeper server now contains only the znode /hiveserver2/serverUri=server2.hive.com:10000;version=4.0.0;sequence=0000000000, whose content is hive.server2.instance.uri=server2.hive.com:10000;hive.server2.authentication=NONE;hive.server2.transport.mode=binary;hive.server2.thrift.sasl.qop=auth;hive.server2.thrift.bind.host=server2.hive.com;hive.server2.thrift.port=10000;hive.server2.use.SSL=false. Since server2.hive.com:10000 is not accessible from outside the Docker network, the connection fails as shown below, which hurts the local debugging experience.
      com.zaxxer.hikari.pool.HikariPool$PoolInitializationException: Failed to initialize pool: Could not open client transport for any of the Server URI's in ZooKeeper: Socket is closed by peer.
      
      	at com.zaxxer.hikari.pool.HikariPool.throwPoolInitializationException(HikariPool.java:596)
      	at com.zaxxer.hikari.pool.HikariPool.checkFailFast(HikariPool.java:582)
      	at com.zaxxer.hikari.pool.HikariPool.<init>(HikariPool.java:115)
      	at com.zaxxer.hikari.HikariDataSource.<init>(HikariDataSource.java:81)
      	at com.lingh.HiveTest.test(HiveTest.java:20)
      	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
      	at java.base/java.util.ArrayList.forEach(ArrayList.java:1597)
      	at java.base/java.util.ArrayList.forEach(ArrayList.java:1597)
      Caused by: java.sql.SQLException: Could not open client transport for any of the Server URI's in ZooKeeper: Socket is closed by peer.
      	at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:420)
      	at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:285)
      	at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:94)
      	at com.zaxxer.hikari.util.DriverDataSource.getConnection(DriverDataSource.java:121)
      	at com.zaxxer.hikari.pool.PoolBase.newConnection(PoolBase.java:364)
      	at com.zaxxer.hikari.pool.PoolBase.newPoolEntry(PoolBase.java:206)
      	at com.zaxxer.hikari.pool.HikariPool.createPoolEntry(HikariPool.java:476)
      	at com.zaxxer.hikari.pool.HikariPool.checkFailFast(HikariPool.java:561)
      	... 6 more
      Caused by: org.apache.hive.org.apache.thrift.transport.TTransportException: Socket is closed by peer.
      	at org.apache.hive.org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:184)
      	at org.apache.hive.org.apache.thrift.transport.TTransport.readAll(TTransport.java:109)
      	at org.apache.hive.org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:151)
      	at org.apache.hive.org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:272)
      	at org.apache.hive.org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:39)
      	at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:512)
      	at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:382)
      	... 13 more
      
      • I cannot find any way in the documentation to change the HiveServer2 hostname and port that the Docker image passes into the ZooKeeper znode. It would be nice to have an easier way to change them, such as an environment variable on the Docker image.
      • I have set up a small unit test at https://github.com/linghengqian/hivesever2-v400-sd-test , and the instructions for running it are in the README.

          People

            Assignee: Unassigned
            Reporter: Qiheng He (linghengqian)