[SPARK-3720] support ORC in spark sql - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 1.1.0
Fix Version/s: None
Component/s: SQL
Labels: None
Target Version/s: 1.2.0

Description

The Optimized Row Columnar (ORC) file format provides a highly efficient way to store data on HDFS. The ORC file format has many advantages, such as:

1. a single file as the output of each task, which reduces the NameNode's load
2. Hive type support including datetime, decimal, and the complex types (struct, list, map, and union)
3. light-weight indexes stored within the file
   - skip row groups that don't pass predicate filtering
   - seek to a given row
4. block-mode compression based on data type
   - run-length encoding for integer columns
   - dictionary encoding for string columns
5. concurrent reads of the same file using separate RecordReaders
6. ability to split files without scanning for markers
7. bound the amount of memory needed for reading or writing
8. metadata stored using Protocol Buffers, which allows addition and removal of fields
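
The two encodings named under item 4 can be sketched in a few lines. This is a simplified illustration of the general techniques only, not ORC's actual on-disk layout, and all function names here are hypothetical:

```python
from typing import List, Tuple


def run_length_encode(values: List[int]) -> List[Tuple[int, int]]:
    """Collapse consecutive repeats into (value, run_length) pairs."""
    runs: List[Tuple[int, int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            # Same value as the current run: extend it by one.
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            # New value: start a fresh run of length 1.
            runs.append((v, 1))
    return runs


def run_length_decode(runs: List[Tuple[int, int]]) -> List[int]:
    """Expand (value, run_length) pairs back into the original column."""
    out: List[int] = []
    for value, count in runs:
        out.extend([value] * count)
    return out


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Store each distinct string once and replace rows with integer ids."""
    dictionary = sorted(set(values))
    index = {s: i for i, s in enumerate(dictionary)}
    return dictionary, [index[s] for s in values]


def dictionary_decode(dictionary: List[str], ids: List[int]) -> List[str]:
    """Look each id up in the dictionary to recover the original strings."""
    return [dictionary[i] for i in ids]


if __name__ == "__main__":
    ints = [7, 7, 7, 3, 3, 9]
    print(run_length_encode(ints))

    countries = ["us", "cn", "us", "de"]
    dictionary, ids = dictionary_encode(countries)
    print(dictionary, ids)
```

Both encodings pay off when a column has long runs or few distinct values, which is common in columnar layouts because values of the same column are stored together.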

Spark SQL currently supports Parquet; adding ORC support would give users another storage format option.

Attachments

orc.diff (41 kB), 16/Oct/14 02:47, Zhan Zhang

Issue Links

duplicates: SPARK-2883 Spark Support for ORCFile format (Resolved)

links to: [Github] Pull Request #2576 (scwf)

People

Assignee: Unassigned

Reporter: Fei Wang

Votes: 1

Watchers: 10

Dates

Created: 29/Sep/14 13:57

Updated: 02/Apr/15 18:44

Resolved: 17/Nov/14 23:19