Details
- Type: Improvement
- Status: In Progress
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 3.1.0
- Fix Version/s: None
- Component/s: None
Description
The Hive ORC reader supports recursive directory reads from S3. Spark's native ORC reader does not perform recursive directory reads when it is used for a Hive table: with spark.sql.hive.convertMetastoreOrc enabled, the query below finds no rows, while the Hive reader finds all five.
    val testData = List(1, 2, 3, 4, 5)
    val dataFrame = testData.toDF()
    dataFrame
      .coalesce(1)
      .write
      .mode(SaveMode.Overwrite)
      .format("orc")
      .option("compression", "zlib")
      .save("s3://ddrinka.sparkbug/dirTest/dir1/dir2/")

    spark.sql("DROP TABLE IF EXISTS ddrinka_sparkbug.dirTest")
    spark.sql("CREATE EXTERNAL TABLE ddrinka_sparkbug.dirTest (val INT) STORED AS ORC LOCATION 's3://ddrinka.sparkbug/dirTest/'")

    spark.conf.set("hive.mapred.supports.subdirectories", "true")
    spark.conf.set("mapred.input.dir.recursive", "true")
    spark.conf.set("mapreduce.input.fileinputformat.input.dir.recursive", "true")

    spark.conf.set("spark.sql.hive.convertMetastoreOrc", "true")
    println(spark.sql("SELECT * FROM ddrinka_sparkbug.dirTest").count) // 0

    spark.conf.set("spark.sql.hive.convertMetastoreOrc", "false")
    println(spark.sql("SELECT * FROM ddrinka_sparkbug.dirTest").count) // 5
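A possible workaround, sketched here but not verified against this bucket: bypass the Hive table and point the native reader at the directory tree directly. recursiveFileLookup is a standard DataFrameReader option since Spark 3.0; the path below simply reuses the location from the reproduction above.

    // Sketch of a workaround, assuming a live SparkSession and S3 credentials.
    // recursiveFileLookup makes the native reader descend into nested
    // directories; note this ignores the Hive table definition and any
    // partition discovery, and infers the schema from the ORC files.
    val df = spark.read
      .format("orc")
      .option("recursiveFileLookup", "true")
      .load("s3://ddrinka.sparkbug/dirTest/")
    println(df.count)

This only helps callers who can switch from the table name to a path; queries that must go through the metastore table still hit the behavior described above.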
Attachments
Issue Links
- is related to
  - SPARK-40600 Support recursiveFileLookup for partitioned datasource (In Progress)
- relates to
  - SPARK-28099 Assertion when querying unpartitioned Hive table with partition-like naming (Open)
- links to