  1. Hive
  2. HIVE-17486

Enable SharedWorkOptimizer in tez on HOS


Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved

    Description

      HIVE-16602 ("Implement shared scans with Tez") added shared scans for Tez.

      Given a query plan, the goal is to identify scans on input tables that can be merged so that the data is read only once. The optimization is carried out at the physical level. Hive on Spark (HoS) caches the result of a Spark work when that work is used by more than one child Spark work. Once SharedWorkOptimizer is enabled in the physical plan for HoS, identical table scans are merged into a single table scan, and the result of that scan is used by more than one child Spark work. Because of the caching mechanism, the same computation does not have to be repeated.
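
      The merging idea described above can be illustrated with a small toy model. This is only a sketch and not Hive's SharedWorkOptimizer: the `TableScan` class, the helper functions, the work names, and the table and column names are all hypothetical and chosen for illustration. It groups works by the scan they read, so each distinct scan appears once, and then lists the scans that feed more than one child work, which are the ones a cache would pay off for.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TableScan:
    """Toy stand-in for a table scan (hypothetical, not a Hive class)."""
    table: str
    columns: tuple
    predicate: str = ""


def merge_identical_scans(work_scans):
    """Group works by the scan they read.

    work_scans maps a (hypothetical) work name to the TableScan it starts from.
    Returns {scan: [work names]}, so each distinct scan is listed once
    together with every work that consumes its output.
    """
    consumers = defaultdict(list)
    for work, scan in work_scans.items():
        consumers[scan].append(work)
    return dict(consumers)


def scans_to_cache(shared):
    """Return the scans that are read by more than one child work."""
    return [scan for scan, works in shared.items() if len(works) > 1]


# Three works; two of them read exactly the same columns of the same table.
plan = {
    "Map 1": TableScan("sales", ("item_id", "qty")),
    "Map 2": TableScan("sales", ("item_id", "qty")),
    "Map 3": TableScan("items", ("item_id",)),
}

shared = merge_identical_scans(plan)
print(len(plan), "scans before,", len(shared), "after")
for scan in scans_to_cache(shared):
    print("cache", scan.table, "for", shared[scan])
```

      In this toy plan the two identical scans of `sales` collapse into one entry with two consumers, while the scan of `items` keeps a single consumer and gains nothing from caching.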

      Attachments

        1. explain.28.share.false
          17 kB
          liyunzhang
        2. explain.28.share.true
          44 kB
          liyunzhang
        3. HIVE-17486.1.patch
          54 kB
          liyunzhang
        4. HIVE-17486.2.patch
          79 kB
          liyunzhang
        5. HIVE-17486.3.patch
          77 kB
          liyunzhang
        6. HIVE-17486.4.patch
          77 kB
          liyunzhang
        7. HIVE-17486.5.patch
          79 kB
          liyunzhang
        8. scanshare.after.svg
          98 kB
          liyunzhang
        9. scanshare.before.svg
          107 kB
          liyunzhang


            People

              Assignee: kellyzly liyunzhang
              Reporter: kellyzly liyunzhang
              Votes: 0
              Watchers: 6
