Affects Version/s: 1.2.1, 2.0.0, 3.1.2
Fix Version/s: None
Environment: Any Hive JDBC client whose JVM also uses other SQL clients besides Hive, or any other kind of JDBC driver (e.g. connection pooling). This can only happen if the other driver sets a value via DriverManager.setLoginTimeout(). HikariCP is one suspect; there are probably others as well.
There are a few somewhat sketchy things happening in the Hive/Thrift code in the JDBC client. They result in intermittent "read timed out" (and subsequently "out of sequence") errors when other JDBC drivers active in the same client JVM set DriverManager.loginTimeout.
- The login timeout used to initialize a HiveConnection is populated from DriverManager.loginTimeout in the core Java JDBC library. This sounds like a nice, orthodox place to get a login timeout from, but it's fundamentally problematic and really shouldn't be used. The reason is that it's a global singleton value, and any JDBC Driver (or any other piece of code for that matter) can write to it at will (and is implicitly invited to). The Hive JDBC stack itself writes values to this global setting in a couple of places seemingly unrelated to the client connection setup.
- The read timeout for Thrift socket-level reads is actually populated from this login timeout (a.k.a. "connect timeout") setting. (See Thrift's TSocket(String host, int port, int timeout) and its callers in HiveAuthFactory. Also note the numerous code comments that speak of setting SO_TIMEOUT (the socket read timeout) while the actual code references a variable called loginTimeout.) Socket reads can occur thousands of times in an application that does lots of Hive queries, and the duration of each read is far less predictable than that of simply getting a connection, which typically happens at most a few times. So a login timeout setting, which usually seems to receive a reasonable value of 30 seconds if it is constrained at all, is very likely to be inadequate for some socket read sooner or later, and in practice this happens way too often.
- There seems to be no option to set this login timeout (or the actual read timeout) explicitly as an externalized override setting (but see HIVE-12371).
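The first two points can be reproduced without Hive on the classpath. The sketch below is only an illustration: the class and method names (LoginTimeoutDemo, unrelatedLibraryInit, derivedReadTimeoutMillis) are hypothetical, and the 30-second value stands in for whatever another driver or pool might set. Only DriverManager.setLoginTimeout()/getLoginTimeout() and the conversion quoted from HiveConnection.setupLoginTimeout() come from the report.

```java
import java.sql.DriverManager;
import java.util.concurrent.TimeUnit;

public class LoginTimeoutDemo {

    // Hypothetical stand-in for an unrelated JDBC driver or connection pool
    // that sets the JVM-wide login timeout during its own initialization.
    static void unrelatedLibraryInit() {
        DriverManager.setLoginTimeout(30); // seconds, global to the whole JVM
    }

    // Same conversion as the line quoted from HiveConnection.setupLoginTimeout():
    // the global login timeout, in seconds, becomes a millisecond value.
    static long derivedReadTimeoutMillis() {
        return TimeUnit.SECONDS.toMillis(DriverManager.getLoginTimeout());
    }

    public static void main(String[] args) {
        int original = DriverManager.getLoginTimeout();
        try {
            DriverManager.setLoginTimeout(0); // 0 means "no limit"
            System.out.println("before: " + derivedReadTimeoutMillis() + " ms");

            unrelatedLibraryInit(); // some other code touches the global value

            // The value a Hive-style client would now derive has changed,
            // although nothing Hive-related was reconfigured.
            System.out.println("after:  " + derivedReadTimeoutMillis() + " ms");
        } finally {
            DriverManager.setLoginTimeout(original); // restore global state
        }
    }
}
```

Whichever code calls setLoginTimeout() last wins, so the timeout a Hive connection ends up with can depend on initialization order elsewhere in the application.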
Summary: DriverManager.loginTimeout can be innocently set by any JDBC driver present in the JVM, you can't override it, and it's misused by Hive as a socket read timeout. There's no way to prevent intermittent read timeouts in this scenario unless you're lucky enough to find the JDBC driver and reconfigure its timeout setting to something workable for Hive socket reads.
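For readers unfamiliar with SO_TIMEOUT, the following sketch shows what a socket read timeout looks like at the plain java.net level, independent of Hive and Thrift. It is an assumption-laden illustration: the class ReadTimeoutDemo and method readTimesOut are hypothetical, and the local server that accepts a connection but never writes merely simulates a slow query response.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {

    /**
     * Connects to a loopback server that never sends data and attempts one read
     * with the given SO_TIMEOUT. Returns true if the read timed out.
     * Do not pass 0: that disables the timeout and the read would block forever.
     */
    static boolean readTimesOut(int soTimeoutMillis) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            client.setSoTimeout(soTimeoutMillis);
            try {
                client.getInputStream().read(); // blocks: the peer stays silent
                return false;
            } catch (SocketTimeoutException e) {
                return true; // the JDK reports this as a read timeout
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("timed out: " + readTimesOut(200));
    }
}
```

A read that simply takes longer than the configured value fails in the same way, which is why a value chosen for connection setup is a poor fit for query reads.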
An easy, crude patch:
modify the first line of HiveConnection.setupLoginTimeout() from:
long timeOut = TimeUnit.SECONDS.toMillis(DriverManager.getLoginTimeout());
to:
long timeOut = TimeUnit.SECONDS.toMillis(0);
This is of course not a robust fix: a timeout of 0 means no timeout at all, so server issues during socket reads can result in a hung client thread. Some other hardcoded value might be more advisable, as long as it's long enough to prevent spurious read timeouts.
The right approach is to prioritize HIVE-12371 (proposed socket timeout override setting that doesn't depend on DriverManager.loginTimeout) and implement it in all possible versions.
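As a rough sketch of what such an override could look like, the helper below resolves a socket timeout from explicit connection properties and never consults DriverManager. Everything in it is an assumption for illustration: the class name SocketTimeoutResolver, the property name "socketTimeoutSeconds" and the 600-second fallback are hypothetical and are not taken from HIVE-12371 or from the Hive code base.

```java
import java.util.Properties;
import java.util.concurrent.TimeUnit;

public class SocketTimeoutResolver {

    // Hypothetical property name; the actual setting proposed in HIVE-12371 may differ.
    static final String SOCKET_TIMEOUT_PROP = "socketTimeoutSeconds";

    // Hypothetical hardcoded fallback, chosen only to be generous for long reads.
    static final long DEFAULT_TIMEOUT_SECONDS = 600;

    /**
     * Returns the socket timeout in milliseconds. Uses the explicit property when it
     * holds a non-negative integer, otherwise the hardcoded default. The global
     * DriverManager login timeout is deliberately not read here.
     */
    static int resolveMillis(Properties connProps) {
        long seconds = DEFAULT_TIMEOUT_SECONDS;
        String raw = connProps.getProperty(SOCKET_TIMEOUT_PROP);
        if (raw != null) {
            try {
                long parsed = Long.parseLong(raw.trim());
                if (parsed >= 0) {
                    seconds = parsed; // 0 keeps its usual "no timeout" meaning
                }
            } catch (NumberFormatException ignored) {
                // Malformed value: keep the default rather than fail the connection.
            }
        }
        // Clamp so that very large values still fit the int that socket APIs expect.
        return (int) Math.min(TimeUnit.SECONDS.toMillis(seconds), Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(SOCKET_TIMEOUT_PROP, "45");
        System.out.println(resolveMillis(props) + " ms");
    }
}
```

Falling back to a fixed default instead of the global login timeout keeps the read timeout stable regardless of what other drivers in the JVM do.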