Griffin / GRIFFIN-266

[Service] Measure's rules are not always properly sorted


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.5.0
    • Fix Version/s: 0.6.0
    • Component/s: None
    • Labels: None

    Description

      If a measure has more than one rule, which is common practice for dsl.type spark-sql, it can happen that the measure's rules are not sorted correctly, which results in the job failing.

      Example:

      GET measure by ID returns the rules in this order: 3005, 3006 and then 3004 (it should be 3004, 3005, 3006):

      {
          "id": 3005,
          "rule": "SELECT count(*) as incomplete FROM source WHERE (node_metrics_pk IS NULL) OR (node_master_fk IS NULL) OR (location_id IS NULL) OR (freq_band IS NULL) OR (ts IS NULL) ",
          "dsl.type": "spark-sql",
          "dq.type": null,
          "out.dataframe.name": "incomplete_count",
          "out": [
              {
                  "type": "record",
                  "name": "incomplete_count"
              },
              {
                  "type": "metric",
                  "name": "incomplete_count"
              }
          ]
      },
      {
          "id": 3006,
          "rule": "SELECT (total - incomplete) AS complete FROM total_count LEFT JOIN incomplete_count",
          "dsl.type": "spark-sql",
          "dq.type": null,
          "out.dataframe.name": "complete_count",
          "out": [
              {
                  "type": "metric",
                  "name": "complete_count"
              }
          ]
      },
      {
          "id": 3004,
          "rule": "SELECT COUNT(*) AS total FROM source",
          "dsl.type": "spark-sql",
          "dq.type": null,
          "out.dataframe.name": "total_count",
          "out": [
              {
                  "type": "record",
                  "name": "total_count"
              },
              {
                  "type": "metric",
                  "name": "total_count"
              }
          ]
      }

      The Griffin job fails with the following error:

      19/07/11 11:00:31 ERROR transform.SparkSqlTransformStep: run spark sql [ SELECT (total - incomplete) AS complete FROM total_count LEFT JOIN incomplete_count ] error: Table or view not found: total_count; line 1 pos 45
      
      org.apache.spark.sql.AnalysisException: Table or view not found: total_count
      

      As we can see, execution of rule 3006 fails because rule 3004 has not been executed yet (due to the incorrect ordering).

      The measure entity EvaluateRule.java does not define any ordering for its rules:
      https://github.com/apache/griffin/blob/master/service/src/main/java/org/apache/griffin/core/measure/entity/EvaluateRule.java#L32-L38

      According to the PostgreSQL documentation (https://www.postgresql.org/docs/9.3/sql-select.html):
      "If the ORDER BY clause is specified, the returned rows are sorted in the specified order.
      If ORDER BY is not given, the rows are returned in whatever order the system finds fastest to produce."

      The proposed solution is to define an explicit ordering of the rules in EvaluateRule.java.
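      A minimal sketch of that fix, assuming the rules are mapped as a JPA @OneToMany collection inside EvaluateRule.java (the field, mapping and column names below are illustrative, not copied from the Griffin source): adding javax.persistence.OrderBy makes the persistence provider append an ORDER BY clause to the query generated for the collection, so the rules always come back sorted by id instead of in whatever order the database happens to return rows.

          // Sketch only: mapping details are illustrative, not the exact Griffin code.
          import java.util.List;
          import javax.persistence.Entity;
          import javax.persistence.FetchType;
          import javax.persistence.GeneratedValue;
          import javax.persistence.GenerationType;
          import javax.persistence.Id;
          import javax.persistence.JoinColumn;
          import javax.persistence.OneToMany;
          import javax.persistence.OrderBy;

          @Entity
          public class EvaluateRule {

              // Id shown inline here only to keep the sketch self-contained.
              @Id
              @GeneratedValue(strategy = GenerationType.AUTO)
              private Long id;

              // Force a deterministic order: sort the collection by the rule id
              // instead of relying on the database's default row order.
              @OneToMany(fetch = FetchType.EAGER)
              @JoinColumn(name = "evaluate_rule_id")
              @OrderBy("id ASC")
              private List<Rule> rules;

              public List<Rule> getRules() {
                  return rules;
              }

              public void setRules(List<Rule> rules) {
                  this.rules = rules;
              }
          }

      An alternative would be to sort the list in Java after it is loaded (for example by rule id in the getter), but @OrderBy keeps the ordering at the persistence layer, where the non-deterministic behaviour originates.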
       

            People

              Assignee: Kevin Yao
              Reporter: Nevena Veljkovic (neveljkovic)
