[SPARK-32432] Add support for reading ORC/Parquet files with SymlinkTextInputFormat - ASF JIRA

Hive-style symlink tables (SymlinkTextInputFormat) are commonly used across analytic engines, including prestodb and prestosql.

Currently, SymlinkTextInputFormat works with JSON/CSV files but does not work with ORC/Parquet files in Apache Spark (or in Apache Hive).

In contrast, prestodb and prestosql support SymlinkTextInputFormat with ORC/Parquet files.

This issue proposes adding support for reading ORC/Parquet files with SymlinkTextInputFormat in Apache Spark.
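For context, a symlink table's location holds plain-text manifest files, and each line of a manifest is the path of an actual data file. The following is a minimal sketch of resolving such manifests by hand on a local filesystem. It is not the change proposed in this issue: the function name `resolve_symlink_manifests` is hypothetical, and skipping files whose names start with `_` or `.` is an assumption about which files count as manifests.

```python
from pathlib import Path
from typing import List


def resolve_symlink_manifests(table_dir: str) -> List[str]:
    """Collect the data-file paths listed in the manifest files under table_dir.

    Hypothetical helper for illustration only; it reads from the local
    filesystem and does not validate that the listed paths exist.
    """
    targets: List[str] = []
    for manifest in sorted(Path(table_dir).iterdir()):
        # Assumption: subdirectories and hidden/metadata files
        # (names starting with "_" or ".") are not manifests.
        if not manifest.is_file() or manifest.name.startswith(("_", ".")):
            continue
        for line in manifest.read_text(encoding="utf-8").splitlines():
            path = line.strip()
            if path:  # ignore blank lines
                targets.append(path)
    return targets
```

Under those assumptions, the resolved list could be passed to an existing reader as a workaround, for example `spark.read.parquet(*resolve_symlink_manifests(table_dir))` in PySpark, though this bypasses the table definition and any partition metadata.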

links to

[GitHub] Pull Request #29330 (moomindani)