[SPARK-14070] Use ORC data source for SQL queries on ORC tables


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.6.1
    • Fix Version/s: 2.0.0
    • Component/s: SQL
    • Labels: None

    Description

      Currently, if one queries ORC tables in Hive, the plan generated by Spark shows that it uses the `HiveTableScan` operator, which is generic to all file formats. We could instead use the ORC data source so that we get ORC-specific optimizations like predicate pushdown.

      Current behaviour:

      ```
      scala> hqlContext.sql("SELECT * FROM orc_table").explain(true)
      == Parsed Logical Plan ==
      'Project [unresolvedalias(*, None)]
      +- 'UnresolvedRelation `orc_table`, None

      == Analyzed Logical Plan ==
      key: string, value: string
      Project [key#171,value#172]
      +- MetastoreRelation default, orc_table, None

      == Optimized Logical Plan ==
      MetastoreRelation default, orc_table, None

      == Physical Plan ==
      HiveTableScan [key#171,value#172], MetastoreRelation default, orc_table, None
      ```
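
      For comparison, here is a minimal sketch of how one could opt into the ORC data source path today using the existing `spark.sql.hive.convertMetastoreOrc` and `spark.sql.orc.filterPushdown` settings (the exact operator names in the resulting plan vary by version):

      ```
      scala> // Ask the planner to convert metastore ORC tables to the ORC data source
      scala> hqlContext.setConf("spark.sql.hive.convertMetastoreOrc", "true")

      scala> // Allow filters to be pushed down into the ORC reader
      scala> hqlContext.setConf("spark.sql.orc.filterPushdown", "true")

      scala> // With both set, the physical plan should scan via the ORC relation
      scala> // (with pushed filters) instead of the generic HiveTableScan
      scala> hqlContext.sql("SELECT * FROM orc_table WHERE key = '0'").explain(true)
      ```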

    People

    • Assignee: tejasp Tejas Patil
    • Reporter: tejasp Tejas Patil
    • Shepherd: Michael Armbrust

