Apache Airflow / AIRFLOW-7026

Improve SparkSqlHook's error message

    Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.10.9
    • Fix Version/s: None
    • Component/s: hooks
    • Labels: None

      Description

      If SparkSqlHook.run_query() fails, it raises the following exception:

              if returncode:
                  raise AirflowException(
                      "Cannot execute {} on {}. Process exit code: {}.".format(
                          cmd, self._conn.host, returncode
                      )
                  )
      

      However, this message is not very useful. For example:

      In [1]: from airflow.providers.apache.spark.operators.spark_sql import SparkSqlOperator                                                                                      
      
      In [2]: SparkSqlOperator(sql="SELECT * FROM NON_EXISTENT_TABLE", master="local[*]", conn_id="spark_default", task_id="_").execute(None)                                      
      
      (snip)
      
      ---------------------------------------------------------------------------
      AirflowException                          Traceback (most recent call last)
      <ipython-input-2-d69c4454e999> in <module>
      ----> 1 SparkSqlOperator(sql="SELECT * FROM NON_EXISTENT_TABLE", master="local[*]", conn_id="spark_default", task_id="_").execute(None)
      
      ~/repos/incubator-airflow/airflow/providers/apache/spark/operators/spark_sql.py in execute(self, context)
          105                                   yarn_queue=self._yarn_queue
          106                                   )
      --> 107         self._hook.run_query()
          108 
          109     def on_kill(self):
      
      ~/repos/incubator-airflow/airflow/providers/apache/spark/hooks/spark_sql.py in run_query(self, cmd, **kwargs)
          154             raise AirflowException(
          155                 "Cannot execute {} on {}. Process exit code: {}.".format(
      --> 156                     cmd, self._conn.host, returncode
          157                 )
          158             )
      
      AirflowException: Cannot execute  on yarn. Process exit code: 1.
      

      Most users would expect the first placeholder in the message to show the executed query and the second to show the "master" value (here, "local[*]"). Instead, the message shows uninformative values: an empty string and "yarn".
      The reasons are as follows:

      • The executed query is specified by the "sql" parameter for the SparkSqlHook.__init__ method, not by cmd.
      • The "master" value is specified by the "master" parameter for the SparkSqlHook.__init__ method, not by self._conn.host. Actually, self._conn is not used at all in SparkSqlHook.
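
The two points above can be illustrated with a minimal, self-contained sketch. It does not import Airflow: the AirflowException class is a local stand-in, and old_message, improved_message, and check_returncode are hypothetical helpers written only for this illustration. The improved wording is one possible phrasing, not necessarily the one that will be adopted.

```python
class AirflowException(Exception):
    """Local stand-in for airflow.exceptions.AirflowException (assumption:
    defined here only so that the sketch runs without Airflow installed)."""


def old_message(cmd, host, returncode):
    # Mirrors the format call quoted in the description above.
    return "Cannot execute {} on {}. Process exit code: {}.".format(
        cmd, host, returncode
    )


def improved_message(sql, master, returncode, extra_cmd=""):
    # Hypothetical wording: report the query and master that were passed to
    # the hook's constructor, and mention extra arguments only if present.
    message = "Cannot execute {!r} on {}".format(sql, master)
    if extra_cmd:
        message += " (additional parameters: {!r})".format(extra_cmd)
    return message + ". Process exit code: {}.".format(returncode)


def check_returncode(sql, master, returncode, extra_cmd=""):
    # Hypothetical helper: raise only when the process exited with non-zero.
    if returncode:
        raise AirflowException(
            improved_message(sql, master, returncode, extra_cmd)
        )


if __name__ == "__main__":
    # Reproduces the unhelpful message from the traceback above.
    print(old_message("", "yarn", 1))
    # Shows what the same failure could look like with the query and master.
    print(improved_message("SELECT * FROM NON_EXISTENT_TABLE", "local[*]", 1))
```

In the hook itself, the equivalent change would presumably format the message from the values stored by SparkSqlHook.__init__ for the "sql" and "master" parameters instead of cmd and self._conn.host.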

              People

              • Assignee:
                sekikn Kengo Seki
                Reporter:
                sekikn Kengo Seki