Details
- Type: Improvement
- Priority: Major
- Status: Resolved
- Resolution: Won't Fix
Description
Spark SQL can take Parquet files or JSON files as a table directly (without requiring a case class to define the schema).
As a SQL component, it should also be able to take a ResultSet from an RDBMS just as easily.
I found that there is a JdbcRDD in core: core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala
So I want to make a small change in this file to allow SQLContext to read the metadata from the PreparedStatement (reading the metadata does not require actually executing the query).
Then, in Spark SQL, SQLContext could create a SchemaRDD from a JdbcRDD and its metadata.
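To illustrate the idea, here is a hedged sketch (not the actual patch) of reading a query's schema through PreparedStatement.getMetaData without executing it. The names schemaOf and jdbcTypeName are hypothetical, the type mapping is illustrative rather than exhaustive, and whether a driver returns metadata before execution is driver-dependent:

```scala
import java.sql.{Connection, Types}

// Illustrative mapping from JDBC type codes to Spark SQL type names;
// a real implementation would need to cover many more types.
def jdbcTypeName(t: Int): String = t match {
  case Types.INTEGER              => "IntegerType"
  case Types.BIGINT               => "LongType"
  case Types.DOUBLE | Types.FLOAT => "DoubleType"
  case Types.VARCHAR | Types.CHAR => "StringType"
  case Types.TIMESTAMP            => "TimestampType"
  case _                          => "StringType" // fallback assumption
}

// Hypothetical helper: build (columnName, typeName) pairs for a query
// without running it, via the PreparedStatement's ResultSetMetaData.
def schemaOf(conn: Connection, sql: String): Seq[(String, String)] = {
  val stmt = conn.prepareStatement(sql)
  try {
    val md = stmt.getMetaData // does not execute the query
    (1 to md.getColumnCount).map { i =>
      (md.getColumnName(i), jdbcTypeName(md.getColumnType(i)))
    }
  } finally {
    stmt.close()
  }
}
```

A schema obtained this way could then be zipped with the JdbcRDD's rows to build a SchemaRDD.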
Further on, maybe we could add a feature to sql-shell so that users could join tables from different sources through spark-thrift-server, such as:

CREATE TABLE jdbc_tbl1 AS JDBC "connectionString" "username" "password" "initQuery" "bound" ...
CREATE TABLE parquet_files AS PARQUET "hdfs://tmp/parquet_table/"
SELECT parquet_files.colX, jdbc_tbl1.colY FROM parquet_files JOIN jdbc_tbl1 ON (parquet_files.id = jdbc_tbl1.id)
I think such a feature would be useful, similar to what Facebook's Presto engine does.
Oh, and there is a small bug in JdbcRDD: in compute(), the close() method has

if (null != conn && ! stmt.isClosed()) conn.close()

which should be

if (null != conn && ! conn.isClosed()) conn.close()

It is just a small typo, but as written, close() will never close conn: by the time this line runs, stmt has already been closed, so the condition is always false.
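For context, a sketch of the corrected cleanup order: result set, then statement, then connection. The standalone signature here is an assumption for illustration (the real close() reads the iterator's fields), and exceptions are swallowed where the real code would log them:

```scala
import java.sql.{Connection, ResultSet, Statement}

// Sketch: close resources in reverse order of acquisition, checking each
// resource's own isClosed() flag before closing it.
def close(rs: ResultSet, stmt: Statement, conn: Connection): Unit = {
  try {
    if (null != rs && !rs.isClosed()) rs.close()
  } catch { case _: Exception => }
  try {
    if (null != stmt && !stmt.isClosed()) stmt.close()
  } catch { case _: Exception => }
  try {
    // the buggy version checked stmt.isClosed() here; stmt is closed by
    // this point, so conn.close() was unreachable
    if (null != conn && !conn.isClosed()) conn.close()
  } catch { case _: Exception => }
}
```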