Details
-
Improvement
-
Status: In Progress
-
Major
-
Resolution: Unresolved
-
3.1.0
-
None
-
None
Description
Hive style symlink (SymlinkTextInputFormat) is commonly used in different analytic engines including prestodb and prestosql.
Currently SymlinkTextInputFormat works with JSON/CSV files but does not work with ORC/Parquet files in Apache Spark (and Apache Hive).
On the other hand, prestodb and prestosql support SymlinkTextInputFormat with ORC/Parquet files.
This issue is to add support for reading ORC/Parquet files with SymlinkTextInputFormat in Apache Spark.
Related links
- Hive's SymlinkTextInputFormat: https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/SymlinkTextInputFormat.java
- prestosql's implementation to add support for reading avro files with SymlinkTextInputFormat: https://github.com/vincentpoon/prestosql/blob/master/presto-hive/src/main/java/io/prestosql/plugin/hive/BackgroundHiveSplitLoader.java