Description
Hello,
I found a problem, along with two ways to work around it.
When Parquet filter pushdown is enabled and a query uses a like '***%' predicate, the query may fail with an error if the system default encoding is not UTF-8.
As far as I know, there are two ways to work around this problem:
1. Set spark.executor.extraJavaOptions="-Dfile.encoding=UTF-8"
2. Set spark.sql.parquet.filterPushdown.string.startsWith=false
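For illustration, the two workarounds above could be applied from application code roughly as follows. This is only a sketch: the object name and application name are hypothetical, and it assumes the session is created by the application itself, so that the executor JVM option is in place before the executors start.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical application object, used only to illustrate the two settings.
object LikePrefixWorkaround {
  def main(args: Array[String]): Unit = {
    // Workaround 1: make UTF-8 the default encoding of the executor JVMs.
    // This has to be configured before the executors are launched.
    val spark = SparkSession.builder()
      .appName("like-prefix-workaround") // hypothetical name
      .config("spark.executor.extraJavaOptions", "-Dfile.encoding=UTF-8")
      .getOrCreate()

    // Workaround 2: do not push string startsWith filters down to Parquet.
    spark.conf.set("spark.sql.parquet.filterPushdown.string.startsWith", "false")

    spark.stop()
  }
}
```

Either setting on its own should be enough; the second one gives up the pushdown optimization for like 'prefix%' predicates.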
The following information reproduces the problem.
The sample Parquet file is in the attachment.
spark.read.parquet("file:///home/kylin/hjldir/part-00000-30432312-7cdb-43ef-befe-93bcfd174878-c000.snappy.parquet").createTempView("tmp")
spark.sql("select * from tmp where `1` like '啦啦乐乐%'").show(false)
I think the correct code should encode the string as UTF-8 explicitly instead of relying on the system default encoding:
private val strToBinary = Binary.fromReusedByteArray(v.getBytes(StandardCharsets.UTF_8))
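To show why the charset matters, here is a small standalone sketch that uses only the JDK, not Spark. It encodes the prefix from the query above once as UTF-8 and once as ISO-8859-1; the latter stands in for a non-UTF-8 default encoding and is an assumption chosen because every JVM provides it. The object and method names are hypothetical.

```scala
import java.nio.charset.{Charset, StandardCharsets}

// Hypothetical helper, for illustration only.
object PrefixEncodingDemo {
  // The prefix from the query, written with Unicode escapes so that the
  // source file encoding cannot affect the result.
  val prefix: String = "\u5566\u5566\u4E50\u4E50"

  // Encode the prefix with an explicitly chosen charset.
  def encode(s: String, charset: Charset): Array[Byte] = s.getBytes(charset)

  def main(args: Array[String]): Unit = {
    val utf8 = encode(prefix, StandardCharsets.UTF_8)
    // Stand-in for a platform whose default charset is not UTF-8.
    val other = encode(prefix, StandardCharsets.ISO_8859_1)

    println(utf8.length)  // 12: three bytes per character
    println(other.length) // 4: each unmappable character becomes '?'
    println(java.util.Arrays.equals(utf8, other)) // false
  }
}
```

Since the two byte sequences differ, a prefix encoded with a non-UTF-8 default charset would not match the string bytes it is compared against, assuming those are stored as UTF-8.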