SPARK-3720: Support ORC in Spark SQL

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.1.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      The Optimized Row Columnar (ORC) file format provides a highly efficient way to store data on HDFS. The ORC file format has many advantages, such as:

      1. a single file as the output of each task, which reduces the NameNode's load
      2. Hive type support, including datetime, decimal, and the complex types (struct, list, map, and union)
      3. lightweight indexes stored within the file, which make it possible to:
         • skip row groups that don't pass predicate filtering
         • seek to a given row
      4. block-mode compression based on data type:
         • run-length encoding for integer columns
         • dictionary encoding for string columns
      5. concurrent reads of the same file using separate RecordReaders
      6. the ability to split files without scanning for markers
      7. a bound on the amount of memory needed for reading or writing
      8. metadata stored using Protocol Buffers, which allows fields to be added and removed
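
      The ideas behind items 3 and 4 can be illustrated with a small, simplified sketch. This is not ORC's actual on-disk encoding or reader logic; the function names (rle_encode, dict_encode, build_index, scan_greater_than) and the tiny row-group size are hypothetical and chosen only for illustration.

```python
from itertools import groupby


def rle_encode(values):
    """Run-length encode integers into (value, run_length) pairs (simplified)."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def dict_encode(strings):
    """Dictionary-encode strings: return (dictionary, list of dictionary indexes)."""
    dictionary = []   # distinct strings in first-seen order
    positions = {}    # string -> index into dictionary
    codes = []        # one small integer per input string
    for s in strings:
        if s not in positions:
            positions[s] = len(dictionary)
            dictionary.append(s)
        codes.append(positions[s])
    return dictionary, codes


def build_index(rows, group_size):
    """Split rows into row groups and record each group's min and max value."""
    groups = [rows[i:i + group_size] for i in range(0, len(rows), group_size)]
    return [(min(g), max(g), g) for g in groups]


def scan_greater_than(index, threshold):
    """Return values > threshold and the number of row groups actually read."""
    matches, groups_read = [], 0
    for _low, high, group in index:
        if high <= threshold:
            continue  # no value in this group can pass the predicate, so skip it
        groups_read += 1
        matches.extend(v for v in group if v > threshold)
    return matches, groups_read
```

      For example, with 12 sorted values split into row groups of 4, a filter for values greater than 8 only needs to read the last group; the first two are skipped using their recorded maximums alone.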

      Spark SQL currently supports Parquet; supporting ORC as well would give users more options.

          People

            Assignee: Unassigned
            Reporter: Fei Wang (scwf)
            Votes: 1
            Watchers: 10

