Spark / SPARK-21422

Depend on Apache ORC 1.4.0

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.3.0
    • Component/s: Build
    • Labels: None

    Description

      As with Parquet, this issue aims to make Apache Spark 2.3 depend on the latest Apache ORC release, 1.4. This brings two immediate benefits.

      • Stability: Apache ORC 1.4.0 includes many fixes, and we can rely more on the ORC community.
      • Maintainability: This reduces the Hive dependency and lets us remove old legacy code later.
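
      For illustration only, a downstream sbt project could declare the same ORC version as sketched below. This is an assumption-laden fragment, not the change made in this issue: Spark's own build declares its dependencies in the Maven pom.xml, and the choice of the orc-core and orc-mapreduce artifacts here is an assumption about what a typical consumer would need.

      ```scala
      // build.sbt (hypothetical downstream project, not Spark's own build)
      // Pin the Apache ORC libraries to the 1.4.0 release named in this issue.
      val orcVersion = "1.4.0"

      libraryDependencies ++= Seq(
        // Core ORC reader/writer implementation
        "org.apache.orc" % "orc-core" % orcVersion,
        // Hadoop MapReduce bindings for ORC
        "org.apache.orc" % "orc-mapreduce" % orcVersion
      )
      ```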

      Later, adding the new ORCFileFormat in SPARK-20728 will bring two more key benefits.

      • Usability: Users can use ORC data sources without the hive module, i.e., without -Phive.
      • Speed: Spark ColumnarBatch and ORC RowBatch can be used together, which is faster than the current implementation in Spark.
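
      As a rough sketch of the usability point, the snippet below writes and reads ORC through the standard DataFrame reader/writer API. The object name, the output path, and the sample data are hypothetical. It assumes a Spark build in which the ORC data source is available; whether that build needs -Phive depends on whether the ORCFileFormat work in SPARK-20728 is included.

      ```scala
      import org.apache.spark.sql.SparkSession

      // Hypothetical example object; not part of Spark itself.
      object OrcRoundTrip {
        def main(args: Array[String]): Unit = {
          val spark = SparkSession.builder()
            .appName("orc-round-trip")
            .master("local[*]") // local mode, for illustration only
            .getOrCreate()
          import spark.implicits._

          // Hypothetical output location
          val path = "/tmp/orc-round-trip-demo"

          // Write a small DataFrame as ORC, replacing any previous output
          Seq((1, "a"), (2, "b")).toDF("id", "value")
            .write.mode("overwrite").orc(path)

          // Read it back through the ORC data source and print the rows
          spark.read.orc(path).show()

          spark.stop()
        }
      }
      ```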

            People

              Assignee: Dongjoon Hyun (dongjoon)
              Reporter: Dongjoon Hyun (dongjoon)
              Votes: 0
              Watchers: 2