Spark / SPARK-32432

Add support for reading ORC/Parquet files with SymlinkTextInputFormat

Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      Hive-style symlink tables (SymlinkTextInputFormat) are commonly used across analytic engines, including prestodb and prestosql.

      Currently, SymlinkTextInputFormat works with JSON/CSV files but does not work with ORC/Parquet files in Apache Spark (or in Apache Hive).

      On the other hand, prestodb and prestosql support SymlinkTextInputFormat with ORC/Parquet files.

      This issue is to add support for reading ORC/Parquet files with SymlinkTextInputFormat in Apache Spark.
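
      Until such support exists, one possible workaround is to resolve the symlink manifests by hand and pass the listed paths to the ORC/Parquet reader directly. The sketch below is illustrative only and is not part of this issue's proposed change. It assumes each manifest is a plain-text file with one target path per line, and that files whose names start with "_" or "." should be skipped. The helper names `resolve_symlink_targets` and `demo`, and the example bucket paths, are hypothetical.

      ```python
      import os
      import tempfile


      def resolve_symlink_targets(manifest_dir):
          """Return the data-file paths listed in the manifests under manifest_dir.

          Assumptions (not confirmed by the issue text): one target path per
          line, and names starting with "_" or "." are treated as hidden files.
          """
          targets = []
          for name in sorted(os.listdir(manifest_dir)):
              if name.startswith(("_", ".")):
                  continue  # skip hidden/marker files (assumed convention)
              path = os.path.join(manifest_dir, name)
              if not os.path.isfile(path):
                  continue
              with open(path, encoding="utf-8") as f:
                  for line in f:
                      line = line.strip()
                      if line:  # ignore blank lines
                          targets.append(line)
          return targets


      def demo():
          """Write a small hypothetical manifest and resolve it."""
          with tempfile.TemporaryDirectory() as d:
              with open(os.path.join(d, "symlink.txt"), "w", encoding="utf-8") as f:
                  f.write("s3://bucket/data/part-0000.parquet\n")
                  f.write("s3://bucket/data/part-0001.parquet\n")
              return resolve_symlink_targets(d)
      ```

      With an active SparkSession named `spark`, the resolved list could then be handed to the reader as `spark.read.parquet(*paths)`. This version only reads manifests from a local directory; manifests stored on HDFS or S3 would need to be listed through a filesystem client instead.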

       

          People

            Assignee: Unassigned
            Reporter: Noritaka Sekiyama (moomindani)
            Votes: 2
            Watchers: 7
