Hive / HIVE-6157

Fetching column stats slower than the 101 during rush hour


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • 0.13.0
    • 0.13.0
    • None
    • None

    Description

      "hive.stats.fetch.column.stats" controls whether the column stats for a table are fetched during explain (in Tez: during query planning). On my setup (1 table, 4000 partitions, 24 columns) the time spent in semantic analysis goes from ~1 second to ~66 seconds when the flag is turned on; about 65 seconds are spent fetching column stats.

      The reason is probably that the APIs force you to make a separate metastore call for each column in each partition. That's probably the first thing that has to change. The question is whether, in addition to that, we need to cache the stats in the client or store them as a single blob in the database to cut the time down further. As it stands right now, however, column stats seem unusable.
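
To make the suspected cost concrete, here is a rough sketch of the round-trip arithmetic. It assumes, as the description guesses, exactly one metastore call per (partition, column) pair. `FakeMetastore` and both of its methods are hypothetical stand-ins written for this illustration, not the real Hive metastore API, and the per-call latency is only an average implied by the reported numbers.

```python
class FakeMetastore:
    """Hypothetical stand-in for a metastore client; it only counts round trips."""

    def __init__(self, partitions, columns):
        # Dummy stats for every (partition, column) pair.
        self.stats = {(p, c): {"numNulls": 0} for p in partitions for c in columns}
        self.round_trips = 0

    def get_column_stats(self, partition, column):
        # Assumed current behaviour: one round trip per (partition, column) pair.
        self.round_trips += 1
        return self.stats[(partition, column)]

    def get_column_stats_bulk(self, partitions, columns):
        # Hypothetical bulk call: one round trip for all requested pairs.
        self.round_trips += 1
        return {(p, c): self.stats[(p, c)] for p in partitions for c in columns}


# Setup from the description: 1 table, 4000 partitions, 24 columns.
partitions = [f"ds={i}" for i in range(4000)]
columns = [f"col{i}" for i in range(24)]

per_pair = FakeMetastore(partitions, columns)
for p in partitions:
    for c in columns:
        per_pair.get_column_stats(p, c)

bulk = FakeMetastore(partitions, columns)
bulk.get_column_stats_bulk(partitions, columns)

calls = per_pair.round_trips        # 4000 * 24 round trips
ms_per_call = 65_000 / calls        # 65 s spread evenly over those calls

print(f"per-pair round trips: {calls}")
print(f"bulk round trips:     {bulk.round_trips}")
print(f"implied average cost: {ms_per_call:.2f} ms per call")
```

Under that assumption, each call costs well under a millisecond on average, so the sheer number of round trips, not the cost of any single one, would account for the 65 seconds. That is why batching the calls looks like the first thing to try, before client-side caching or a single-blob storage format.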

      Attachments

        1. HIVE-6157.03.patch
          892 kB
          Sergey Shelukhin
        2. HIVE-6157.03.patch
          892 kB
          Sergey Shelukhin
        3. HIVE-6157.nogen.patch
          162 kB
          Sergey Shelukhin
        4. HIVE-6157.01.patch
          886 kB
          Sergey Shelukhin
        5. HIVE-6157.01.patch
          886 kB
          Sergey Shelukhin
        6. HIVE-6157.nogen.patch
          156 kB
          Sergey Shelukhin
        7. HIVE-6157.prelim.patch
          887 kB
          Sergey Shelukhin

            People

              Assignee: Sergey Shelukhin (sershe)
              Reporter: Gunther Hagleitner (hagleitn)
              Votes: 0
              Watchers: 6