Hive / HIVE-165

Add standard statistical functions

    Details

    • Type: Wish
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Query Processor
    • Labels:
      None

      Description

      The last step in the unholy triumvirate of statistical built-ins is the variance. We already have the n (count) and the mean (avg). I currently have a job or two that filters all of the data into a single reducer which just computes mean/n/variance and writes it to a table...so my guess is that this would be a pretty big speed increase. Not a huge deal though, as computing the variance myself is trivial.

      (Average, variance, and n can be co-computed in one pass, so if you're doing var() you can basically have avg() and count() for free.)
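The one-pass claim above can be illustrated with a minimal sketch. It assumes Welford's running update plus a pairwise merge of partial results, which is one common way to do this and not necessarily what Hive ended up implementing. The names `Moments`, `add`, `merge`, and `summarize` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Moments:
    """Running count, mean, and sum of squared deviations from the mean (m2)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def add(self, x: float) -> None:
        # Welford's update: each value is seen exactly once.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "Moments") -> "Moments":
        # Combine two partial results, e.g. from separate map tasks,
        # so no single reducer has to see every row.
        if self.n == 0:
            return Moments(other.n, other.mean, other.m2)
        if other.n == 0:
            return Moments(self.n, self.mean, self.m2)
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return Moments(n, mean, m2)

    def variance(self, sample: bool = False) -> float:
        # Population variance by default; sample variance divides by n - 1.
        denom = self.n - 1 if sample else self.n
        if denom <= 0:
            raise ValueError("not enough values to compute a variance")
        return self.m2 / denom


def summarize(values) -> Moments:
    """Single pass over values; count, mean, and variance all come from the result."""
    acc = Moments()
    for v in values:
        acc.add(v)
    return acc
```

For the values 2, 4, 4, 4, 5, 5, 7, 9, `summarize` yields a count of 8, a mean of 5, and a population variance of 4 (up to floating-point rounding). Splitting the same values into two chunks and combining them with `merge` gives the same three numbers, which is why the count and the mean come for free once the variance is computed this way.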

        Issue Links

          Activity

          Gavin made changes -
          Link This issue depends upon HIVE-194 [ HIVE-194 ]
          Gavin made changes -
          Link This issue depends on HIVE-194 [ HIVE-194 ]
          Carl Steinbach made changes -
          Resolution Duplicate [ 3 ]
          Status Reopened [ 4 ] Resolved [ 5 ]
          Carl Steinbach made changes -
          Resolution Fixed [ 1 ]
          Status Resolved [ 5 ] Reopened [ 4 ]
          Carl Steinbach made changes -
          Link This issue duplicates HIVE-607 [ HIVE-607 ]
          Adam Kramer made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          David Phillips made changes -
          Link This issue depends on HIVE-194 [ HIVE-194 ]
          David Phillips made changes -
          Summary
            Original Value: var(col) built-in to go with avg(col) and count(col)
            New Value: Add standard statistical functions
          Jeff Hammerbacher made changes -
          Component/s Query Processor [ 12312586 ]
          David Phillips made changes -
          Assignee David Phillips [ electrum ]
          Adam Kramer made changes -
          Field Original Value New Value
          Description
            Original Value: The last step in the unholy triumvirate of statistical built-ins is the variance...we already have the n (count) and the mean (avg). I currently have one reduce step that just computes mean/n/variance and writes it to a table, so my guess is that this would be a pretty big speed increase. Not a huge deal though, as computing the variance myself is trivial. (Average, variance, and n can be co-computed in one pass)
            New Value: The last step in the unholy triumvirate of statistical built-ins is the variance. We already have the n (count) and the mean (avg). I currently have a job or two that filters all of the data into a single reducer which just computes mean/n/variance and writes it to a table...so my guess is that this would be a pretty big speed increase. Not a huge deal though, as computing the variance myself is trivial.

            (Average, variance, and n can be co-computed in one pass, so if you're doing var() you can basically have avg() and count() for free.)
          Adam Kramer created issue -

            People

            • Assignee:
              David Phillips
              Reporter:
              Adam Kramer
            • Votes:
              0
              Watchers:
              2

              Dates

              • Created:
                Updated:
                Resolved:
