Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 3.1.2
- Fix Version/s: None
- Component/s: None
Description
When testing Spark performance with TPC-DS and running some SQL queries, such as q1, I found this error:
java.lang.UnsupportedOperationException: DataType: char(2)
at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.getPredicateLeafType(OrcFilters.scala:150)
at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.getType$1(OrcFilters.scala:222)
at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.buildLeafSearchArgument(OrcFilters.scala:266)
at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.convertibleFiltersHelper$1(OrcFilters.scala:132)
at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.$anonfun$convertibleFilters$4(OrcFilters.scala:135)
at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike.flatMap(TraversableLike.scala:245)
at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:242)
at scala.collection.immutable.List.flatMap(List.scala:355)
at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.convertibleFilters(OrcFilters.scala:134)
at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.createFilter(OrcFilters.scala:73)
at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$4(OrcFileFormat.scala:189)
at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$4$adapted(OrcFileFormat.scala
I used the following SQL to create the tables:
create table customer
stored as orc
as select * from tpdc_text.customer
CLUSTER BY c_customer_sk;
create table store
stored as orc
as select * from tpdc_text.store
CLUSTER BY s_store_sk;
create table date_dim
stored as orc
as select * from tpdc_text.date_dim;
create table store_returns
(
sr_return_time_sk bigint
, sr_item_sk bigint
, sr_customer_sk bigint
, sr_cdemo_sk bigint
, sr_hdemo_sk bigint
, sr_addr_sk bigint
, sr_store_sk bigint
, sr_reason_sk bigint
, sr_ticket_number bigint
, sr_return_quantity int
, sr_return_amt decimal(7,2)
, sr_return_tax decimal(7,2)
, sr_return_amt_inc_tax decimal(7,2)
, sr_fee decimal(7,2)
, sr_return_ship_cost decimal(7,2)
, sr_refunded_cash decimal(7,2)
, sr_reversed_charge decimal(7,2)
, sr_store_credit decimal(7,2)
, sr_net_loss decimal(7,2)
)
partitioned by (sr_returned_date_sk bigint)
stored as orc;
When I modify this code in the class OrcFilters, the queries run successfully:
/**
 * Get the PredicateLeaf.Type corresponding to the given DataType.
 */
def getPredicateLeafType(dataType: DataType): PredicateLeaf.Type = dataType match {
case BooleanType => PredicateLeaf.Type.BOOLEAN
case ByteType | ShortType | IntegerType | LongType => PredicateLeaf.Type.LONG
case FloatType | DoubleType => PredicateLeaf.Type.FLOAT
case StringType => PredicateLeaf.Type.STRING
  case CharType(_) | VarcharType(_) => PredicateLeaf.Type.STRING
case DateType => PredicateLeaf.Type.DATE
case TimestampType => PredicateLeaf.Type.TIMESTAMP
case _: DecimalType => PredicateLeaf.Type.DECIMAL
case _ => throw new UnsupportedOperationException(s"DataType: ${dataType.catalogString}")
}
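The change maps CharType and VarcharType to the same STRING predicate leaf type already used for StringType, instead of falling through to the catch-all that throws. The dispatch can be sketched without a Spark dependency as follows; the DataType hierarchy and toLeafType below are simplified stand-ins for illustration, not Spark's actual classes:

```scala
// Simplified stand-ins for Spark's DataType hierarchy (hypothetical, for illustration).
sealed trait DataType { def catalogString: String }
case object StringType extends DataType { val catalogString = "string" }
case class CharType(length: Int) extends DataType { def catalogString = s"char($length)" }
case class VarcharType(length: Int) extends DataType { def catalogString = s"varchar($length)" }
case object BinaryType extends DataType { val catalogString = "binary" }

// Stand-in for the ORC PredicateLeaf.Type values relevant here.
object LeafType extends Enumeration { val STRING = Value }

// Before the fix, char/varchar fell through to the catch-all and threw
// UnsupportedOperationException; with the added alternatives they map to
// STRING just like StringType.
def toLeafType(dt: DataType): LeafType.Value = dt match {
  case StringType | CharType(_) | VarcharType(_) => LeafType.STRING
  case other => throw new UnsupportedOperationException(s"DataType: ${other.catalogString}")
}
```

With this mapping, `toLeafType(CharType(2))` returns STRING rather than throwing, which is exactly the behavior the patched getPredicateLeafType gives the ORC filter-pushdown path.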
Issue Links
- Blocked: SPARK-35700 spark.sql.orc.filterPushdown not working with Spark 3.1.1 for tables with varchar data type (Resolved)