Description
Reproduction steps:
1. Download a standard "Hadoop Free" Spark build
2. Start the pyspark REPL with Hive support:
SPARK_DIST_CLASSPATH=$(~/dist/hadoop-3.4.0-SNAPSHOT/bin/hadoop classpath) ~/dist/spark-3.2.3-bin-without-hadoop/bin/pyspark --conf spark.sql.catalogImplementation=hive
3. Execute any simple DataFrame operation:
>>> spark.range(100).show()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/dist/spark-3.2.3-bin-without-hadoop/python/pyspark/sql/session.py", line 416, in range
    jdf = self._jsparkSession.range(0, int(start), int(step), int(numPartitions))
  File "/home/user/dist/spark-3.2.3-bin-without-hadoop/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
  File "/home/user/dist/spark-3.2.3-bin-without-hadoop/python/pyspark/sql/utils.py", line 117, in deco
    raise converted from None
pyspark.sql.utils.IllegalArgumentException: <exception str() failed>
4. In fact, merely accessing spark.conf is enough to trigger the issue:
>>> spark.conf
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
...
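As a workaround for at least seeing what went wrong, the captured fields can be read directly instead of relying on str(e), which is what fails here. This is only a sketch: it assumes the raised error is a pyspark.sql.utils.CapturedException subclass (as in 3.2.x) carrying desc and stackTrace attributes.

try:
    spark.range(100).show()
except Exception as e:
    # str(e) fails, so read the captured fields directly if they exist
    # (desc / stackTrace are assumed attributes of CapturedException in 3.2.x).
    print(getattr(e, "desc", repr(e)))
    print(getattr(e, "stackTrace", ""))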
There are probably two issues here:
1) Hive support should be gracefully disabled if the Hive dependency is not on the classpath, as claimed by https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html (a rough user-side guard is sketched after this list);
2) but at the very least, the user should be able to see the actual exception so they can understand the problem and take action.
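Until Spark itself falls back gracefully, a user-side guard along these lines can avoid the broken session. This is only a sketch, not a proposed fix: the spark-hive jar name pattern and the environment variables checked here are assumptions about a typical deployment.

import glob
import os

from pyspark.sql import SparkSession

def hive_classes_probably_available():
    # Best-effort check: look for a spark-hive jar under $SPARK_HOME/jars or on
    # SPARK_DIST_CLASSPATH before asking for the Hive catalog (assumed layout).
    spark_home = os.environ.get("SPARK_HOME", "")
    jars = glob.glob(os.path.join(spark_home, "jars", "spark-hive_*.jar"))
    dist_cp = os.environ.get("SPARK_DIST_CLASSPATH", "")
    return bool(jars) or "spark-hive" in dist_cp

builder = SparkSession.builder.appName("hive-fallback-sketch")
if hive_classes_probably_available():
    builder = builder.enableHiveSupport()
else:
    # Fall back to the default in-memory catalog instead of failing later with
    # an unreadable IllegalArgumentException.
    builder = builder.config("spark.sql.catalogImplementation", "in-memory")
spark = builder.getOrCreate()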