Details
Bug
Status: Open
Major
Resolution: Unresolved
None
None
containerized environment on EC2 (amzn2-ami-hvm-2.0.20191116.0-x86_64-gp2)
Important
Description
Scenario:
We use ExecuteSQL to read Delta tables (stored in S3) via a JDBC connection to Databricks.
Temporary Fix:
If we disable and re-enable the controller service, ExecuteSQL works without problems. Note, however, that the first execution afterwards takes quite a long time, whereas the next execution completes within 3 seconds.
Background information:
- How to use Databricks JDBC: https://docs.databricks.com/integrations/bi/jdbc-odbc-bi.html
- Controller Service DBCPConnectionPool 1.11.3
URL: jdbc:spark://#{databricks.host}...{databricks.cluster.id};...;PWD=#{databricks.token}
Driver Class: com.simba.spark.jdbc.Driver
- Table
- one column with <20 entries
- Created By Spark 2.4.4
- Type MANAGED
- Provider delta
- Location s3
- Serde Library org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
- InputFormat org.apache.hadoop.mapred.SequenceFileInputFormat
- OutputFormat org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
- SQL
- SELECT * FROM "${db.table.schema}"."${db.table.name}"
- output <20 entries
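To check whether the slow first execution also occurs outside NiFi, the same statement could be timed over a plain JDBC connection. The following is only a minimal sketch under assumptions: the class name DeltaQueryTimer and its helper methods are hypothetical, the Databricks JDBC driver jar is assumed to be on the classpath and to register itself with DriverManager, and the full JDBC URL (elided above) has to be passed in by the caller.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper, not part of NiFi or the Databricks driver.
public class DeltaQueryTimer {

    // Builds the statement as ExecuteSQL would see it after
    // ${db.table.schema} and ${db.table.name} have been substituted.
    static String buildQuery(String schema, String table) {
        return "SELECT * FROM \"" + schema + "\".\"" + table + "\"";
    }

    // Runs the query once, reads all rows, and returns {rowCount, elapsedMillis}.
    static long[] timeQuery(Connection conn, String sql) throws SQLException {
        long start = System.nanoTime();
        long rows = 0;
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            while (rs.next()) {
                rows++;
            }
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        return new long[] {rows, elapsedMs};
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 3) {
            System.out.println("usage: DeltaQueryTimer <jdbc-url> <schema> <table>");
            return;
        }
        String sql = buildQuery(args[1], args[2]);
        // Assumption: the driver on the classpath accepts this URL as-is.
        try (Connection conn = DriverManager.getConnection(args[0])) {
            // Two runs on the same connection, to compare first and second execution.
            for (int run = 1; run <= 2; run++) {
                long[] result = timeQuery(conn, sql);
                System.out.printf("run %d: %d rows in %d ms%n", run, result[0], result[1]);
            }
        }
    }
}
```

If the first run is slow here as well, the delay is more likely on the driver or cluster side than in the DBCPConnectionPool; if both runs are fast, the pool configuration would be the next place to look.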