Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Won't Fix
- Affects Version/s: 2.3.0
- Fix Version/s: None
- Component/s: None
Description
spark.sql("create table x1(name string, age int) stored as parquet")
spark.sql("insert into x1 select 'a',29")
spark.sql("create table x2(name string, age int) stored as parquet")
spark.sql("insert into x2 select 'a',29")
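A possible workaround (not part of the original report, and assuming the root cause is that no table-level statistics are persisted to the metastore for Hive-format tables) is to compute statistics explicitly so they survive a restart:

```scala
// Hypothetical workaround sketch, assuming an active SparkSession `spark`:
// persist table-level statistics to the Hive metastore so the optimizer
// can still estimate these tables as small enough to broadcast after a
// spark-shell / JDBC-server restart.
spark.sql("ANALYZE TABLE x1 COMPUTE STATISTICS")
spark.sql("ANALYZE TABLE x2 COMPUTE STATISTICS")
```

`ANALYZE TABLE ... COMPUTE STATISTICS` is standard Spark SQL; whether it restores the broadcast plan in this exact scenario is an assumption, not something verified in the report.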
scala> spark.sql("select * from x1 t1 ,x2 t2 where t1.name=t2.name").explain
== Physical Plan ==
*(2) BroadcastHashJoin [name#101], [name#103], Inner, BuildRight
:- *(2) Project [name#101, age#102]
:  +- *(2) Filter isnotnull(name#101)
:     +- *(2) FileScan parquet default.x1_ex[name#101,age#102] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/D:/spark_release/spark/bin/spark-warehouse/x1], PartitionFilters: [], PushedFilters: [IsNotNull(name)], ReadSchema: struct<name:string,age:int>
+- BroadcastExchange HashedRelationBroadcastMode(List(input[0, string, true]))
   +- *(1) Project [name#103, age#104]
      +- *(1) Filter isnotnull(name#103)
         +- *(1) FileScan parquet default.x2_ex[name#103,age#104] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/D:/spark_release/spark/bin/spark-warehouse/x2], PartitionFilters: [], PushedFilters: [IsNotNull(name)], ReadSchema: struct<name:string,age:int>
Now restart spark-shell (or spark-submit, or restart the JDBC server) and run the same select query again:
scala> spark.sql("select * from x1 t1 ,x2 t2 where t1.name=t2.name").explain
== Physical Plan ==
*(5) SortMergeJoin [name#43], [name#45], Inner
:- *(2) Sort [name#43 ASC NULLS FIRST], false, 0
:  +- Exchange hashpartitioning(name#43, 200)
:     +- *(1) Project [name#43, age#44]
:        +- *(1) Filter isnotnull(name#43)
:           +- *(1) FileScan parquet default.x1[name#43,age#44] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/D:/spark_release/spark/bin/spark-warehouse/x1], PartitionFilters: [], PushedFilters: [IsNotNull(name)], ReadSchema: struct<name:string,age:int>
+- *(4) Sort [name#45 ASC NULLS FIRST], false, 0
   +- Exchange hashpartitioning(name#45, 200)
      +- *(3) Project [name#45, age#46]
         +- *(3) Filter isnotnull(name#45)
            +- *(3) FileScan parquet default.x2[name#45,age#46] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/D:/spark_release/spark/bin/spark-warehouse/x2], PartitionFilters: [], PushedFilters: [IsNotNull(name)], ReadSchema: struct<name:string,age:int>
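The plan flip is presumably size-driven (my reading, not confirmed in the report): Spark picks a broadcast join only when a relation's estimated size is below spark.sql.autoBroadcastJoinThreshold, and the `desc formatted x1` output below shows no Statistics row, i.e. nothing persisted for the optimizer to use after the session is gone. The relevant inputs can be inspected like this:

```scala
// Sketch for inspecting the inputs to the broadcast-join decision,
// assuming an active SparkSession `spark`.
// Default threshold is 10485760 bytes (10 MB).
spark.conf.get("spark.sql.autoBroadcastJoinThreshold")
// explain(true) prints the parsed, analyzed, optimized, and physical plans,
// so the chosen join strategy can be compared across restarts.
spark.sql("select * from x1 t1, x2 t2 where t1.name = t2.name").explain(true)
```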
scala> spark.sql("desc formatted x1").show(200, false)
+----------------------------+---------------------------------------------------------------+-------+
|col_name                    |data_type                                                      |comment|
+----------------------------+---------------------------------------------------------------+-------+
|name                        |string                                                         |null   |
|age                         |int                                                            |null   |
|                            |                                                               |       |
|# Detailed Table Information|                                                               |       |
|Database                    |default                                                        |       |
|Table                       |x1                                                             |       |
|Owner                       |Administrator                                                  |       |
|Created Time                |Sun Aug 19 12:36:58 IST 2018                                   |       |
|Last Access                 |Thu Jan 01 05:30:00 IST 1970                                   |       |
|Created By                  |Spark 2.3.0                                                    |       |
|Type                        |MANAGED                                                        |       |
|Provider                    |hive                                                           |       |
|Table Properties            |[transient_lastDdlTime=1534662418]                             |       |
|Location                    |file:/D:/spark_release/spark/bin/spark-warehouse/x1            |       |
|Serde Library               |org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe    |       |
|InputFormat                 |org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat  |       |
|OutputFormat                |org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat |       |
|Storage Properties          |[serialization.format=1]                                       |       |
|Partition Provider          |Catalog                                                        |       |
+----------------------------+---------------------------------------------------------------+-------+
With a datasource table this works fine (i.e. `create table ... using parquet` instead of `stored as parquet`): the broadcast join is still chosen after a restart.
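The working datasource-table variant might look like the following (the table name x3 is hypothetical, for illustration only):

```scala
// Hypothetical datasource-table repro: `using parquet` (Spark-native DDL)
// instead of the Hive DDL `stored as parquet`. Assumes an active
// SparkSession `spark`; table name x3 is illustrative only.
spark.sql("create table x3(name string, age int) using parquet")
spark.sql("insert into x3 select 'a', 29")
// Per the report, this join keeps its broadcast plan across restarts.
spark.sql("select * from x1 t1, x3 t3 where t1.name = t3.name").explain
```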