Description
Most of our Postgres tables load into DataFrames without problems. However, a number of our tables use the JSONB data type, and Spark raises an error and refuses to load them. Asking Spark to support JSONB may be a tall order in the short term. Still, it would help if Spark could at least load such a table while skipping the columns it cannot map, or offer that behavior as an option.
    pdf = sql_context.load(source="jdbc", url=url, dbtable="table_of_json")

    Py4JJavaError: An error occurred while calling o41.load.
    : java.sql.SQLException: Unsupported type 1111
        at org.apache.spark.sql.jdbc.JDBCRDD$.getCatalystType(JDBCRDD.scala:78)
        at org.apache.spark.sql.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:112)
        at org.apache.spark.sql.jdbc.JDBCRelation.<init>(JDBCRelation.scala:133)
        at org.apache.spark.sql.jdbc.DefaultSource.createRelation(JDBCRelation.scala:121)
        at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:219)
        at org.apache.spark.sql.SQLContext.load(SQLContext.scala:697)
        at org.apache.spark.sql.SQLContext.load(SQLContext.scala:685)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
        at py4j.Gateway.invoke(Gateway.java:259)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:207)
        at java.lang.Thread.run(Thread.java:745)
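Type 1111 in the trace is `java.sql.Types.OTHER`, which is what the driver reports for the JSONB columns here. Until Spark maps that type, one possible workaround is to pass a parenthesised subquery as `dbtable` that either casts the JSONB columns to `text` or leaves them out, emulating the "skip what can't be loaded" behavior requested above. This is a sketch rather than anything Spark provides. The helper `build_jdbc_subquery` is hypothetical, and it assumes that the column names and types have already been fetched (for example from `information_schema.columns`) and that the table name is unqualified.

```python
# Hypothetical workaround helper; this is NOT part of Spark's API.
from typing import AbstractSet, Iterable, Tuple


def quote_ident(name: str) -> str:
    """Quote a Postgres identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_jdbc_subquery(table: str,
                        columns: Iterable[Tuple[str, str]],
                        mode: str = "cast",
                        unsupported: AbstractSet[str] = frozenset({"jsonb"}),
                        alias: str = "t") -> str:
    """Build a subquery string to pass as the JDBC `dbtable` value.

    table:       bare, unqualified table name (schema prefixes not handled).
    columns:     (column_name, data_type) pairs in table order, e.g. from
                 information_schema.columns (assumed fetched beforehand).
    mode:        "cast" converts unsupported columns to text;
                 "drop" omits them from the select list.
    unsupported: Postgres type names to treat as unloadable (assumption:
                 only jsonb, as in this report).
    """
    if mode not in ("cast", "drop"):
        raise ValueError("mode must be 'cast' or 'drop'")
    select_list = []
    for name, data_type in columns:
        col = quote_ident(name)
        if data_type.lower() in unsupported:
            if mode == "drop":
                continue  # skip the column entirely
            # Cast to text so the column arrives as a plain string.
            select_list.append(f"{col}::text AS {col}")
        else:
            select_list.append(col)
    if not select_list:
        raise ValueError("no loadable columns left")
    # Postgres requires an alias on a subquery used in a FROM clause.
    return f"(SELECT {', '.join(select_list)} FROM {quote_ident(table)}) AS {alias}"
```

The returned string could then be used where the report passes the table name, e.g. `sql_context.load(source="jdbc", url=url, dbtable=subquery)`. Spark places the `dbtable` value inside its own `SELECT ... FROM` clause, so an aliased subquery is accepted there. Casting keeps the JSON payload available as a string that can be parsed later, while dropping matches the "ignore unsupported columns" behavior more literally.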
Issue Links
- relates to: SPARK-10186 Add support for more postgres column types (Resolved)