Details
Description
The following code, running in an IPython shell, throws an error:
In [1]: from pyspark import SparkContext, HiveContext

In [2]: sc = SparkContext('local[*]', 'test')
Spark assembly has been built with Hive, including Datanucleus jars on classpath

In [3]: sql = HiveContext(sc)

In [4]: import pandas as pd

In [5]: df = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [1, 2, 3], 'c': list('abc')})

In [6]: df2 = pd.DataFrame({'a': [2.0, 3.0, 4.0], 'b': [4, 5, 6], 'c': list('def')})

In [7]: sdf = sql.createDataFrame(df)

In [8]: sdf2 = sql.createDataFrame(df2)

In [9]: sql.registerDataFrameAsTable(sdf, 'sdf')

In [10]: sql.registerDataFrameAsTable(sdf2, 'sdf2')

In [11]: sql.cacheTable('sdf')

In [12]: sql.cacheTable('sdf2')

In [13]: sdf2.insertInto('sdf')  # throws an error
Here's the Java traceback:
Py4JJavaError: An error occurred while calling o270.insertInto.
: java.lang.AssertionError: assertion failed: No plan for InsertIntoTable (LogicalRDD [a#0,b#1L,c#2], MapPartitionsRDD[13] at mapPartitions at SQLContext.scala:1167), Map(), false
 InMemoryRelation [a#6,b#7L,c#8], true, 10000, StorageLevel(true, true, false, true, 1), (PhysicalRDD [a#6,b#7L,c#8], MapPartitionsRDD[41] at mapPartitions at SQLContext.scala:1167), Some(sdf2)
	at scala.Predef$.assert(Predef.scala:179)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
	at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:1085)
	at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:1083)
	at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:1089)
	at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:1089)
	at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:1092)
	at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:1092)
	at org.apache.spark.sql.DataFrame.insertInto(DataFrame.scala:1134)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:483)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
	at py4j.Gateway.invoke(Gateway.java:259)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:207)
	at java.lang.Thread.run(Thread.java:745)
I'd be ecstatic if this was my own fault, and I'm somehow using it incorrectly.
Issue Links
is related to SPARK-6941 "Provide a better error message to explain that tables created from RDDs are immutable" (Resolved)