Hive / HIVE-165

Add standard statistical functions


Details

    • Type: Wish
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Query Processor
    • Labels: None

    Description

      The last step in the unholy triumvirate of statistical built-ins is the variance. We already have n (count) and the mean (avg). I currently have a job or two that funnels all of the data into a single reducer, which just computes the mean, n, and variance and writes them to a table, so my guess is that a built-in would give a pretty big speedup. It's not a huge deal, though, since computing the variance myself is trivial.

      (The average, variance, and n can be computed together in a single pass, so if you implement var() you essentially get avg() and count() for free.)
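
The single-pass claim above can be illustrated with Welford's online algorithm, which keeps a running count, mean, and sum of squared deviations (M2). This is a minimal Python sketch, not Hive's implementation; the names Moments, moments, var_pop, and var_samp are hypothetical. The merge step (the pairwise formula of Chan, Golub, and LeVeque) is included on the assumption that partial aggregates from several mappers would be combined, rather than routing every row through one reducer.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Moments:
    """Running count, mean, and sum of squared deviations (M2) for one group."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def add(self, x: float) -> None:
        # Welford's update: one pass over the data, and numerically more
        # stable than accumulating sum(x) and sum(x * x) separately.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "Moments") -> "Moments":
        # Combine two partial aggregates (e.g. computed on different mappers)
        # without revisiting the raw rows.
        if other.n == 0:
            return Moments(self.n, self.mean, self.m2)
        if self.n == 0:
            return Moments(other.n, other.mean, other.m2)
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return Moments(n, mean, m2)

    def var_pop(self) -> float:
        # Population variance (divide by n); raises ZeroDivisionError if empty.
        return self.m2 / self.n

    def var_samp(self) -> float:
        # Sample variance (divide by n - 1); needs at least two values.
        return self.m2 / (self.n - 1)


def moments(xs: Iterable[float]) -> Moments:
    """Fold an iterable into a Moments aggregate in a single pass."""
    m = Moments()
    for x in xs:
        m.add(x)
    return m


if __name__ == "__main__":
    whole = moments([2, 4, 4, 4, 5, 5, 7, 9])
    split = moments([2, 4, 4, 4]).merge(moments([5, 5, 7, 9]))
    # Both give n = 8, mean = 5 and a population variance of 4 (up to rounding).
    print(whole.n, whole.mean, whole.var_pop())
    print(split.n, split.mean, split.var_pop())
```

Because count and mean fall out of the same state that produces the variance, one aggregate can serve avg(), count(), and var() at once, which is the point made in the description.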


            People

              David Phillips (electrum)
              Adam Kramer (akramer)
              Votes: 0
              Watchers: 2
