Hive / HIVE-8219

Multi-Insert optimization, don't sink the source into a file [Spark Branch]

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Spark

    Description

      The current implementation splits the operator plan at the lowest common ancestor by inserting one FileSinkOperator and a list of TableScanOperators. Writing to a file through the FileSinkOperator (FS) is expensive. We should be able to insert a ReduceSinkOperator instead. The resulting RDD from the first job can then be cached and referenced in subsequent Spark jobs.
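
      To make the intent concrete, here is a minimal sketch in plain Java of the "compute once, reuse in memory" idea. It is not Hive or Spark code: `MultiInsertSketch`, `CachedSource`, and `insertBranch` are hypothetical names made up for illustration, and the in-memory list only stands in for a cached RDD (in Spark itself the corresponding step would be a call such as `cache()` or `persist()` on the shared RDD). The `evaluations` counter shows that the shared sub-plan runs once even though two insert branches consume it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.Supplier;
import java.util.stream.Collectors;

public class MultiInsertSketch {
    /** Toy stand-in for a cached RDD: computes its rows once, then serves them from memory. */
    static final class CachedSource<T> {
        private final Supplier<List<T>> compute;
        private List<T> rows;   // null until the first branch asks for the data
        int evaluations = 0;    // how many times the shared sub-plan actually ran

        CachedSource(Supplier<List<T>> compute) {
            this.compute = compute;
        }

        List<T> rows() {
            if (rows == null) {
                rows = compute.get();
                evaluations++;
            }
            return rows;
        }
    }

    /** One insert branch: filters the shared rows into its own target "table". */
    static <T> List<T> insertBranch(CachedSource<T> source, Predicate<T> where) {
        return source.rows().stream().filter(where).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The shared part of the plan (everything up to the lowest common ancestor).
        CachedSource<Integer> shared = new CachedSource<>(() -> {
            List<Integer> out = new ArrayList<>();
            for (int i = 1; i <= 10; i++) {
                out.add(i * i);
            }
            return out;
        });

        // Two insert branches reuse the same in-memory result; no intermediate file is written.
        List<Integer> small = insertBranch(shared, v -> v < 50);
        List<Integer> large = insertBranch(shared, v -> v >= 50);

        System.out.println(small);
        System.out.println(large);
        System.out.println("evaluations=" + shared.evaluations);
    }
}
```

      The sketch deliberately leaves out what makes the real problem hard (partitioning, shuffles introduced by a ReduceSinkOperator, and cache eviction); it only illustrates why reusing a cached result avoids the write-then-rescan round trip through a file.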

      This is a follow-up to HIVE-7503.

            People

              Assignee: Unassigned
              Reporter: Xuefu Zhang (xuefuz)
              Votes: 0
              Watchers: 1

              Dates

                Created:
                Updated:
                Resolved: