Spark / SPARK-21422

Depend on Apache ORC 1.4.0


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.3.0
    • Component/s: Build
    • Labels: None

    Description

      As with Parquet, this issue aims to make Apache Spark 2.3 depend on the latest Apache ORC release, 1.4.0. This brings two immediate benefits.

      • Stability: Apache ORC 1.4.0 includes many fixes, and we can rely more on the ORC community.
      • Maintainability: It reduces the dependency on Hive and lets us remove old legacy code later.
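
      For illustration, a dependency of this kind could be declared in a Maven pom.xml roughly as follows. This is a minimal sketch, not the actual change made for this issue: org.apache.orc:orc-core is the artifact that ORC publishes, but any classifier, scope, or exclusions that Spark needs are not stated in this issue and are left out here.

      ```xml
      <!-- Sketch only: declares a direct dependency on Apache ORC 1.4.0. -->
      <!-- Classifier, scope, and exclusions are assumptions left unspecified. -->
      <dependency>
        <groupId>org.apache.orc</groupId>
        <artifactId>orc-core</artifactId>
        <version>1.4.0</version>
      </dependency>
      ```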

      Later, adding a new ORCFileFormat in SPARK-20728 will bring two further key benefits.

      • Usability: Users can use ORC data sources without the hive module, i.e., -Phive.
      • Speed: Using Spark ColumnarBatch together with ORC RowBatch is faster than the current implementation in Spark.


          People

            Assignee: Dongjoon Hyun
            Reporter: Dongjoon Hyun
            Votes: 0
            Watchers: 2
