Since SPARK-2883, Apache Spark has supported Apache ORC inside the `sql/hive` module, with a Hive dependency. This issue aims to add a new and faster ORC data source inside `sql/core` and eventually replace the old ORC data source. It uses the latest Apache ORC release, 1.4.0 (released yesterday).
There are four key benefits.
- Speed: Use Spark `ColumnarBatch` and ORC `RowBatch` together, which is faster than the current implementation in Spark.
- Stability: Apache ORC 1.4.0 has many fixes, and we can rely more on the ORC community.
- Usability: Users can use `ORC` data sources without the Hive module, i.e., without building with `-Phive`.
- Maintainability: Reduce the Hive dependency, which allows old legacy code to be removed later.