[SPARK-21158] SparkSQL function SparkSession.Catalog.ListTables() does not handle spark setting for case-sensitivity


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Incomplete
    • Affects Version/s: 2.0.0
    • Fix Version/s: None
    • Component/s: SQL
    • Environment: Windows 10, IntelliJ, Scala

    Description

      When working with table names in Spark SQL, we have noticed some issues with case sensitivity.

      If you set spark.sql.caseSensitive to true, Spark SQL stores table names exactly as they were provided. This is correct.

      If you set spark.sql.caseSensitive to false, Spark SQL stores table names in lower case.
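
      For reference, a minimal sketch of how we toggle the flag (the local session setup below is only illustrative):

      import org.apache.spark.sql.SparkSession

      // Illustrative local session; any existing SparkSession behaves the same way.
      val spark = SparkSession.builder()
        .appName("case-sensitivity-demo")
        .master("local[*]")
        .getOrCreate()

      // spark.sql.caseSensitive is a runtime SQL conf; false is the default.
      spark.conf.set("spark.sql.caseSensitive", "false")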

      Then we use sqlContext.tableNames() to get all the tables in our database, and we check whether this list contains(<table name string>) to determine if we have already created a table. If case sensitivity is turned off (false), this check should find the table name in the list regardless of case.

      However, it only matches the lower-case version of the stored table name. Therefore, if you pass in a camel-case or upper-case table name, the check returns false even though the table does exist.

      The root cause of this issue is in the function SparkSession.Catalog.ListTables().

      For example:
      In your SQL context you have the following tables, and you have chosen spark.sql.caseSensitive=false, so your tables are stored in lower case:
      carnames
      carmodels
      carnamesandmodels
      users
      dealerlocations

      When running your pipeline, you want to check whether you have already created the temporary join table 'carnamesandmodels'. However, for readability you have stored its name in a constant that reads CarNamesAndModels.

      So you call:
      sqlContext.tableNames().contains("CarNamesAndModels")
      This should return true, because we know the table has already been created, but it currently returns false since "CarNamesAndModels" is not lower case.
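
      A short Scala reproduction sketch of this scenario (the SparkSession setup above is assumed, and the table contents are arbitrary):

      // Case-insensitive mode: the catalog lower-cases identifiers on registration.
      spark.conf.set("spark.sql.caseSensitive", "false")

      // Register the temp table under a camel-case name.
      spark.range(10).toDF("id").createOrReplaceTempView("CarNamesAndModels")

      // The catalog lists it in lower case ...
      spark.sqlContext.tableNames().foreach(println)   // includes: carnamesandmodels

      // ... so a literal membership check against the original constant fails.
      spark.sqlContext.tableNames().contains("CarNamesAndModels")   // false, yet the table exists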

      The responsibility for lower-casing the name passed into .contains should not be put on the Spark user. This should be done by Spark SQL when case sensitivity is turned off.

      Proposed solutions:

      • Setting spark.sql.caseSensitive=false should make the SQL context agnostic to case without changing how the table name is stored.
      • There should be a custom contains method for ListTables() which converts the table name to lower case before checking (a user-side version of this check is sketched below).
      • SparkSession.Catalog.ListTables() should return the list of tables in the format they were provided instead of all lower case.
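
      As a stop-gap on the user side (not the proposed fix; the helper name below is made up for illustration), the check can be made case-insensitive like this:

      import org.apache.spark.sql.SparkSession

      // Hypothetical helper: case-insensitive lookup against the catalog listing.
      // The proposals above argue that Spark SQL itself should honour
      // spark.sql.caseSensitive=false here instead of pushing this onto users.
      def tableExistsIgnoreCase(spark: SparkSession, name: String): Boolean =
        spark.sqlContext.tableNames().exists(_.equalsIgnoreCase(name))

      // Usage:
      // tableExistsIgnoreCase(spark, "CarNamesAndModels")   // true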


          People

            Assignee: Unassigned
            Reporter: Kathryn McClintic (kamcclin)
            Votes: 0
            Watchers: 4


              Time Tracking

                Original Estimate: 24h
                Remaining Estimate: 24h
                Time Spent: Not Specified