Hive / HIVE-23776

Retire quickstats autocollection

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      This is about the following basic table statistics:

      • number of files
      • data size (sum of file sizes)
      • number of erasure-coded files

      Right now these are collected on every BasicStatsTask execution, which requires extra filesystem reads. For small inserts this overhead is noticeable when the filesystem is slow (S3 and similar stores).

      I don't think they are really in use. We rely more on column stats, which are more accurate. Also, the data size here is the "offline" (on-disk) size, whereas we should instead be calculating with "online" sizes.

      Proposal:

      • remove the collection and storage of this data
      • compute it on the fly for "desc formatted" statements, for informational purposes only
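
      A minimal sketch of what on-the-fly collection could look like, assuming the table location can be listed recursively. This is not Hive code: the class and method names (QuickStats, collect) are hypothetical, and it uses java.nio on a local path where Hive would go through the Hadoop FileSystem API. It counts every regular file under the directory and leaves out the erasure-coded file count, which needs filesystem-specific metadata.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper, for illustration only; not part of Hive.
final class QuickStats {

    /** Holder for the two stats that a plain directory listing can provide. */
    static final class Result {
        final long numFiles;
        final long totalSize;

        Result(long numFiles, long totalSize) {
            this.numFiles = numFiles;
            this.totalSize = totalSize;
        }
    }

    /**
     * Walks the table directory (including partition subdirectories) and
     * sums up the file count and the on-disk size at the time of the call.
     * Nothing is cached or persisted.
     */
    static Result collect(Path tableDir) throws IOException {
        long numFiles = 0;
        long totalSize = 0;
        try (Stream<Path> paths = Files.walk(tableDir)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                if (!Files.isRegularFile(p)) {
                    continue; // skip directories and other non-file entries
                }
                numFiles++;
                totalSize += Files.size(p);
            }
        }
        return new Result(numFiles, totalSize);
    }
}
```

      With this shape, the listing cost is paid only when someone asks for the numbers (for example in "desc formatted"), not on every insert.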


            People

              kgyrtkirk Zoltan Haindrich
              Votes: 0
              Watchers: 3
