Hive / HIVE-20382

Materialized views: Introduce heuristic to favour incremental rebuild


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0.0-alpha-1
    • Component/s: Materialized views
    • Labels: None

    Description

      Currently, we do not expose statistics over ROW__ID.writeId to the optimizer (this should be fixed by HIVE-20313). Even if we did, we always assume a uniform distribution of column values, which can easily lead to overestimating the number of rows read when we filter on ROW__ID.writeId for materialized views (think of one large transaction for MV creation followed by small ones for incremental maintenance). Because the cost of the incremental plan is then overestimated (we think we will read more rows than we actually do), incremental view maintenance may not be triggered. This could be fixed by introducing histograms that better reflect the distribution of column values.
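
      To make the overestimation concrete, here is a minimal sketch with made-up numbers. It assumes a simplified range-selectivity formula for a filter of the form `writeId > bound`; the function names, the row counts, and the formula itself are illustrative assumptions, not Hive's actual estimation code.

```python
def uniform_estimate(total_rows, min_id, max_id, lower_bound):
    """Rows estimated for `writeId > lower_bound`, assuming values are
    spread uniformly over [min_id, max_id] (simplified, hypothetical formula)."""
    if lower_bound >= max_id:
        return 0.0
    if lower_bound < min_id:
        return float(total_rows)
    selectivity = (max_id - lower_bound) / (max_id - min_id)
    return total_rows * selectivity


def actual_rows(rows_per_write_id, lower_bound):
    """Rows that really satisfy `writeId > lower_bound`."""
    return sum(n for wid, n in rows_per_write_id.items() if wid > lower_bound)


# One large transaction creates the MV source data (writeId 1),
# then nine small transactions follow (writeIds 2..10).
rows_per_write_id = {1: 1_000_000}
rows_per_write_id.update({wid: 100 for wid in range(2, 11)})

total = sum(rows_per_write_id.values())
estimated = uniform_estimate(total, min_id=1, max_id=10, lower_bound=8)
actual = actual_rows(rows_per_write_id, lower_bound=8)

print(f"estimated rows: {estimated:.0f}, actual rows: {actual}")
```

      Under these assumptions the uniform estimate is three orders of magnitude above the real row count, which is the kind of gap that would make the incremental plan look more expensive than it is.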

      Until both fixes are implemented, we will use a configuration variable that multiplies the estimated cost of the rebuild plan, which lets us favour incremental rebuild over full rebuild.
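
      The heuristic can be sketched as follows. This is an illustration only: it assumes the factor is applied to the estimated cost of the incremental plan and that the cheaper plan wins; `choose_rebuild`, `incremental_factor`, and all cost values are hypothetical names and numbers, not taken from the patch.

```python
def choose_rebuild(full_cost, incremental_cost, incremental_factor=1.0):
    """Pick a rebuild strategy by comparing estimated plan costs.

    `incremental_factor` scales the (possibly overestimated) cost of the
    incremental plan; values below 1.0 favour incremental rebuild.
    """
    adjusted_incremental_cost = incremental_cost * incremental_factor
    if adjusted_incremental_cost < full_cost:
        return "incremental"
    return "full"


# Hypothetical estimates: the incremental plan is overestimated
# and looks more expensive than a full rebuild.
full_cost = 1_000_000
incremental_cost = 1_500_000

without_heuristic = choose_rebuild(full_cost, incremental_cost)
with_heuristic = choose_rebuild(full_cost, incremental_cost,
                                incremental_factor=0.1)

print(without_heuristic, with_heuristic)
```

      A multiplier is a blunt instrument: it does not correct the row estimate itself, it only biases the comparison until proper statistics and histograms are available.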

      Attachments

        1. HIVE-20382.01.patch
          40 kB
          jcamachorodriguez
        2. HIVE-20382.02.patch
          47 kB
          jcamachorodriguez
        3. HIVE-20382.02.patch
          47 kB
          jcamachorodriguez
        4. HIVE-20382.patch
          40 kB
          jcamachorodriguez
        5. HIVE-20382.patch
          40 kB
          jcamachorodriguez

            People

              Assignee: Jesús Camacho Rodríguez (jcamacho)
              Reporter: Jesús Camacho Rodríguez (jcamacho)