Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Version: 3.1.0
- Fix Version/s: None
- Component/s: None
Description
This umbrella issue tracks feature parity between the ORC and Parquet data sources.
Issue Links
- is blocked by
  - SPARK-15705 Spark won't read ORC schema from metastore for partitioned tables (Resolved)
  - SPARK-22258 Writing empty dataset fails with ORC format (Resolved)
  - SPARK-25306 Avoid skewed filter trees to speed up `createFilter` in ORC (Resolved)
  - SPARK-14387 Enable Hive-1.x ORC compatibility with spark.sql.hive.convertMetastoreOrc (Resolved)
  - SPARK-15347 Problem select empty ORC table (Resolved)
  - SPARK-15474 ORC data source fails to write and read back empty dataframe (Resolved)
  - SPARK-15731 orc writer directory permissions (Resolved)
  - SPARK-15757 Error occurs when using Spark sql "select" statement on orc file after hive sql "insert overwrite tb1 select * from sourcTb" has been executed on this orc file (Resolved)
  - SPARK-16628 OrcConversions should not convert an ORC table represented by MetastoreRelation to HadoopFsRelation if metastore schema does not match schema stored in ORC files (Resolved)
  - SPARK-17047 Spark 2 cannot create table when CLUSTERED. (Resolved)
  - SPARK-18355 Spark SQL fails to read data from a ORC hive table that has a new column added to it (Resolved)
  - SPARK-19109 ORC metadata section can sometimes exceed protobuf message size limit (Resolved)
  - SPARK-19430 Cannot read external tables with VARCHAR columns if they're backed by ORC files written by Hive 1.2.1 (Resolved)
  - SPARK-19809 NullPointerException on zero-size ORC file (Resolved)
  - SPARK-20515 Issue with reading Hive ORC tables having char/varchar columns in Spark SQL (Resolved)
  - SPARK-21422 Depend on Apache ORC 1.4.0 (Resolved)
  - SPARK-21686 spark.sql.hive.convertMetastoreOrc is causing NullPointerException while reading ORC tables (Resolved)
  - SPARK-21762 FileFormatWriter/BasicWriteTaskStatsTracker metrics collection fails if a new file isn't yet visible (Resolved)
  - SPARK-21912 ORC/Parquet table should not create invalid column names (Resolved)
  - SPARK-21929 Support `ALTER TABLE table_name ADD COLUMNS(..)` for ORC data source (Resolved)
  - SPARK-22158 convertMetastore should not ignore storage properties (Resolved)
  - SPARK-22267 Spark SQL incorrectly reads ORC file when column order is different (Resolved)
  - SPARK-22279 Turn on spark.sql.hive.convertMetastoreOrc by default (Resolved)
  - SPARK-22300 Update ORC to 1.4.1 (Resolved)
  - SPARK-22712 Use `buildReaderWithPartitionValues` in native OrcFileFormat (Resolved)
  - SPARK-23007 Add schema evolution test suite for file-based data sources (Resolved)
  - SPARK-23049 `spark.sql.files.ignoreCorruptFiles` should work for ORC files (Resolved)
  - SPARK-23340 Upgrade Apache ORC to 1.4.3 (Resolved)
  - SPARK-23355 convertMetastore should not ignore table properties (Resolved)
  - SPARK-23399 Register a task completion listener first for OrcColumnarBatchReader (Resolved)
  - SPARK-24322 Upgrade Apache ORC to 1.4.4 (Resolved)
  - SPARK-24472 Orc RecordReaderFactory throws IndexOutOfBoundsException (Resolved)
  - SPARK-25175 Field resolution should fail if there's ambiguity for ORC native reader (Resolved)
  - SPARK-25427 Add BloomFilter creation test cases (Resolved)
  - SPARK-25438 Fix FilterPushdownBenchmark to use the same memory assumption (Resolved)
  - SPARK-26427 Upgrade Apache ORC to 1.5.4 (Resolved)
  - SPARK-26437 Decimal data becomes bigint to query, unable to query (Resolved)
  - SPARK-14286 Empty ORC table join throws exception (Resolved)
  - SPARK-22280 Improve StatisticsSuite to test `convertMetastore` properly (Resolved)
  - SPARK-22320 ORC should support VectorUDT/MatrixUDT (Resolved)
  - SPARK-25145 Buffer size too small on spark.sql query with filterPushdown predicate=True (Resolved)
  - SPARK-21791 ORC should support column names with dot (Closed)
  - SPARK-11412 Support merge schema for ORC (Resolved)
  - SPARK-16060 Vectorized ORC reader (Resolved)
  - SPARK-22781 Support creating streaming dataset with ORC files (Resolved)
  - SPARK-18540 Wholestage code-gen for ORC Hive tables (Resolved)
  - SPARK-20682 Add new ORCFileFormat based on Apache ORC (Resolved)
  - SPARK-20728 Make ORCFileFormat configurable between sql/hive and sql/core (Resolved)
  - SPARK-21787 Support for pushing down filters for DateType in native OrcFileFormat (Resolved)
  - SPARK-21839 Support SQL config for ORC compression (Resolved)
  - SPARK-23456 Turn on `native` ORC implementation by default (Resolved)
  - SPARK-24576 Upgrade Apache ORC to 1.5.2 (Resolved)
  - SPARK-34562 Leverage parquet bloom filters (Resolved)
  - SPARK-12417 Orc bloom filter options are not propagated during file write in spark (Resolved)
  - SPARK-21783 Turn on ORC filter push-down by default (Resolved)
  - SPARK-23276 Enable UDT tests in (Hive)OrcHadoopFsRelationSuite (Resolved)
  - SPARK-23305 Test `spark.sql.files.ignoreMissingFiles` for all file-based data sources (Resolved)
  - SPARK-23452 Extend test coverage to all ORC readers (Resolved)
  - SPARK-24112 Add `spark.sql.hive.convertMetastoreTableProperty` for backward compatiblility (Closed)
  - SPARK-23072 Add a Unicode schema test for file-based data sources (Resolved)
  - SPARK-23342 Add ORC configuration tests for ORC data source (Resolved)
  - SPARK-23426 Use `hive` ORC impl and disable PPD for Spark 2.3.0 (Resolved)
  - SPARK-22672 Refactor ORC Tests (Resolved)
  - SPARK-22416 Move OrcOptions from `sql/hive` to `sql/core` (Resolved)
  - SPARK-23313 Add a migration guide for ORC (Resolved)
- is related to
  - SPARK-19459 ORC tables cannot be read when they contain char/varchar columns (Resolved)
  - ORC-233 Allow `orc.include.columns` to be empty (Closed)
- relates to
  - HIVE-14007 Replace ORC module with ORC release (Resolved)