Hive / HIVE-2081

Prevent automatic indexing from creating worse queries

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.8.0
    • Fix Version/s: None
    • Component/s: Indexing
    • Labels: None

      Description

      We want to make sure that automatically using indexes doesn't make a query perform worse. For example, after scanning the index table, the query might still need to scan the whole base table. In that case, we would much rather kill the index job and fall back to scanning the whole base table directly.

      This can be done by adding a conditional task and a backup task. Whether the index is helping can be detected by monitoring the index job's number of input records and number of output records and comparing the two. As an initial example, if the ratio is >50, do not use the index; fall back to scanning the whole base table.
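
      The check described above could be sketched roughly as follows. This is only an illustration, not Hive code: the class and method names are hypothetical, and it assumes the ratio means output records as a percentage of input records, with 50 as the cutoff. The description does not spell out either point, so both may need adjusting.

```java
// Hypothetical sketch of the proposed heuristic; none of these names exist in Hive.
final class IndexUsageHeuristic {

    // Assumption: ">50" is read as "output records exceed 50% of input records".
    static final double MAX_OUTPUT_PERCENT = 50.0;

    // Output records of the index job as a percentage of its input records.
    static double outputPercent(long inputRecords, long outputRecords) {
        if (inputRecords < 0 || outputRecords < 0) {
            throw new IllegalArgumentException("record counts must be non-negative");
        }
        if (inputRecords == 0) {
            // Assumption: an index job that read nothing gives no reason to fall back.
            return 0.0;
        }
        return 100.0 * outputRecords / inputRecords;
    }

    // True when the index job should be killed and the backup task
    // (a full scan of the base table) should run instead.
    static boolean shouldFallBackToBaseTableScan(long inputRecords, long outputRecords) {
        return outputPercent(inputRecords, outputRecords) > MAX_OUTPUT_PERCENT;
    }
}
```

      In this sketch, a conditional task would call shouldFallBackToBaseTableScan with the index job's record counters and pick either the index-based plan or the backup task accordingly.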

              People

              • Assignee: Unassigned
              • Reporter: Russell Melick (rmelick)