Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.3.0
    • Component/s: Build
    • Labels: None

      Description

      As is already done for Parquet, this issue makes Apache Spark 2.3 depend on the latest Apache ORC release, 1.4. This brings two immediate benefits:

      • Stability: Apache ORC 1.4.0 includes many fixes, and Spark can rely more directly on the ORC community.
      • Maintainability: This reduces the dependency on Hive and allows old legacy code to be removed later.
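
      For illustration, a direct dependency on ORC 1.4.0 might be declared in a Maven build roughly as follows. This is only a sketch: it assumes the standard org.apache.orc:orc-core coordinates, and the actual change to Spark's pom.xml may involve additional artifacts, classifiers, or exclusions that this issue does not describe.

      ```xml
      <!-- Sketch only: assumes the orc-core artifact; not copied from Spark's pom.xml -->
      <dependency>
        <groupId>org.apache.orc</groupId>
        <artifactId>orc-core</artifactId>
        <version>1.4.0</version>
      </dependency>
      ```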

      Later, adding a new ORCFileFormat in SPARK-20728 will bring two further key benefits:

      • Usability: Users can use ORC data sources without the hive module, i.e., without building with -Phive.
      • Speed: Spark ColumnarBatch and ORC RowBatch can be used together, which is faster than the current implementation in Spark.

              People

              • Assignee: Dongjoon Hyun (dongjoon)
              • Reporter: Dongjoon Hyun (dongjoon)
              • Votes: 0
              • Watchers: 2
