  1. Apache Lens (Retired)
  2. LENS-444

cube.fact.is.aggregated not properly documented

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: api, cube
    • Labels:
      None

      Description

      Consider a measure in a cube:

          <measure name="revenue" type="DOUBLE" default_aggr="SUM"/>
      

      Suppose a fact table F supplies data to this cube and has the column "revenue".

      We run a query:

      lens-shell>query execute cube select userid, count(revenue) from user_activity where time_range_in(dt, '2014-06-25-00', '2014-06-26-00')
      Launching query failed cause:No driver accepted the query, because No candidate fact table available to answer the query, because {"brief":"Columns: [[hive_fact_user_curation_good_traffic]] are missing default aggregate","details":{"user_attributestore_er_fact_adgroup_view,user_attributestore_er_fact_supply_site_burn,user_attributestore_er_fact_demandcategory_click,user_attributestore_er_fact_supplycategory_visits,user_attributestore_er_fact_supply_site_impressions_rendered,user_attributestore_er_fact_adgroup_click,user_attributestore_er_fact_adgroup_impression_time_install,user_attributestore_er_fact_app_impression_time_install,user_attributestore_er_fact_supply_site_impressions_served,user_attributestore_er_fact_adgroup_burn,user_attributestore_er_fact_app_visits,user_attributestore_er_fact_app_click,user_attributestore_er_fact_supply_site_click,user_attributestore_er_fact_adgroup_impressions_rendered":[{"cause":"COLUMN_NOT_FOUND","missingColumns":["totalburn"]}],"hive_fact_user_curation_good_traffic":[{"cause":"MISSING_DEFAULT_AGGREGATE","columnsMissingDefaultAggregate":["hive_fact_user_curation_good_traffic"]}]}}
      

      Lens reports "columnsMissingDefaultAggregate". This happens because the query uses count(revenue) while the default_aggr defined for the measure in the cube is SUM. The query runs fine with sum(revenue).

      The error goes away after setting the property "cube.fact.is.aggregated" to false on the fact table F.
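
      For illustration, a minimal sketch of how that property might look in the fact table's XML definition. The element names (x_fact_table, columns, properties, property) are assumed from the example fact definitions shipped with Lens and may differ between versions; the fact name "F", the column types and the elided storage section are hypothetical, so check this against the schema of your Lens version:

          <x_fact_table cube_name="user_activity" name="F" weight="100.0" xmlns="uri:lens:cube:0.1">
            <columns>
              <!-- columns are illustrative; "revenue" backs the cube measure -->
              <column comment="" name="userid" type="STRING"/>
              <column comment="" name="revenue" type="DOUBLE"/>
            </columns>
            <properties>
              <!-- marks F as holding raw (not pre-aggregated) rows, which is
                   what allowed count(revenue) in the report above -->
              <property name="cube.fact.is.aggregated" value="false"/>
            </properties>
            <!-- storage table definitions omitted in this sketch -->
          </x_fact_table>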

      IMO this behaviour of "is aggregated fact" is not documented properly and will leave many other users confused. Let's make it more obvious, either by making it part of the fact schema spec or by documenting it well.

            People

            • Assignee: Unassigned
            • Reporter: Angad Singh (angadsingh)