Details
Description
MySQL JDBC driver gives possibility to not store ResultSet in memory.
We can do this by setting fetchSize to Int.MIN_VALUE.
Unfortunately this configuration isn't correct in Spark.
java.lang.IllegalArgumentException: requirement failed: Invalid value `-2147483648` for parameter `fetchsize`. The minimum value is 0. When the value is 0, the JDBC driver ignores the value and does the estimates. at scala.Predef$.require(Predef.scala:224) at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:105) at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:34) at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:32) at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:330) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125) at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:166) at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:206) at sun.reflect.GeneratedMethodAccessor46.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at py4j.Gateway.invoke(Gateway.java:280) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.GatewayConnection.run(GatewayConnection.java:214) at java.lang.Thread.run(Thread.java:748)
https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html