Hive / HIVE-10261

Data size can be underestimated when computed with partial column stats


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      With hive.stats.fetch.column.stats=true, we estimate data size from column stats when annotating operators with statistics. However, when column stats are partial, we are likely to underestimate the data size, which can hurt performance, e.g. by picking an inappropriate small table for a map join.
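
      The effect can be illustrated with a minimal sketch. This is not Hive's actual estimation code: the function names, column widths, row count, and the 25 MB threshold are all hypothetical, and the sketch assumes that a column without stats simply contributes nothing to the estimated row width.

```python
from typing import Dict, Optional

# Hypothetical sketch, NOT Hive's real estimation logic.
# Assumption: a column with no stats (None) contributes 0 bytes to the row width.


def estimate_data_size(num_rows: int, avg_col_len: Dict[str, Optional[float]]) -> float:
    """Estimate data size as num_rows * sum of known average column widths."""
    row_width = sum(width for width in avg_col_len.values() if width is not None)
    return num_rows * row_width


def qualifies_for_map_join(estimated_size: float, threshold: float) -> bool:
    """Treat a table as a map-join 'small table' if its estimate fits the threshold."""
    return estimated_size <= threshold


NUM_ROWS = 1_000_000
SMALL_TABLE_THRESHOLD = 25_000_000  # hypothetical threshold, in bytes

# Same table, once with stats for every column and once with stats for only one.
full_stats = {"id": 8.0, "name": 40.0, "payload": 200.0}
partial_stats = {"id": 8.0, "name": None, "payload": None}

full_estimate = estimate_data_size(NUM_ROWS, full_stats)
partial_estimate = estimate_data_size(NUM_ROWS, partial_stats)

print(f"full stats:    {full_estimate:,.0f} bytes, "
      f"map join: {qualifies_for_map_join(full_estimate, SMALL_TABLE_THRESHOLD)}")
print(f"partial stats: {partial_estimate:,.0f} bytes, "
      f"map join: {qualifies_for_map_join(partial_estimate, SMALL_TABLE_THRESHOLD)}")
```

      Under these assumptions the same table is estimated at 248,000,000 bytes with full stats but only 8,000,000 bytes with partial stats, so only the partial-stats estimate falls under the threshold and the table would be wrongly treated as small.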

          People

            Assignee: Unassigned
            Reporter: Rui Li (lirui)
            Votes: 0
            Watchers: 4
