Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-6724

Dump operator context to logs when error occurs during query execution

Attach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 1.14.0
    • 1.15.0
    • None

    Description

      • Sometimes when an error occurs it may be helpful to have operator stack with context data along with original stacktrace to help developers to track down a failure. To achieve this, each operator should provide a context which can be dumped to logs. In the end of FragmentExecutor's execution if the state is FAILED the operator stack will be dumped to logs before exception stacktrace.

       

      Dump example for case DRILL-6670 (Operator dump started and Operator dump completed designate start and end of dump respectively):

      
      ...
      
      2018-09-17 08:26:34,392 [24609ac4-80f6-b318-7296-ee53c406a9f6:frag:0:0] INFO  o.a.d.e.w.fragment.FragmentExecutor - 24609ac4-80f6-b318-7296-ee53c406a9f6:0:0: State change requested RUNNING --> FAILED
      2018-09-17 08:26:34,392 [24609ac4-80f6-b318-7296-ee53c406a9f6:frag:0:0] ERROR  o.a.d.e.physical.impl.BaseRootExec - Operator dump started.
      2018-09-17 08:26:34,392 [24609ac4-80f6-b318-7296-ee53c406a9f6:frag:0:0] ERROR  o.a.d.e.p.i.p.ProjectRecordBatch - ProjectRecordBatch[projector=null, hasRemainder=false, remainderIndex=0, recordCount=0]
      2018-09-17 08:26:34,392 [24609ac4-80f6-b318-7296-ee53c406a9f6:frag:0:0] ERROR  o.a.d.exec.physical.impl.ScanBatch - ScanBatch[container=org.apache.drill.exec.record.VectorContainer@383d6f45[recordCount = 0, schemaChanged = true, schema = null, wrappers = [], ...], currentReader=ParquetRecordReader[File=/home/mapr/example.parquet, Row group index=0, Records in row group=1, Total records read=-1, MetadataParquetMetaData{FileMetaData{schema: message schema {
        optional binary name (UTF8);
        optional binary creation_parameters (UTF8);
        optional int64 creation_date (TIMESTAMP_MICROS);
        optional int32 data_version;
        optional int32 schema_version;
      }
      , metadata: \{pandas={"index_columns": [], "column_indexes": [], "columns": [{"name": "name", "field_name": "name", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, \{"name": "creation_parameters", "field_name": "creation_parameters", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, \{"name": "creation_date", "field_name": "creation_date", "pandas_type": "datetime", "numpy_type": "datetime64[ns]", "metadata": null}, \{"name": "data_version", "field_name": "data_version", "pandas_type": "int32", "numpy_type": "int32", "metadata": null}, \{"name": "schema_version", "field_name": "schema_version", "pandas_type": "int32", "numpy_type": "int32", "metadata": null}], "pandas_version": "0.21.0"}}}, blocks: [BlockMetaData\{1, 331 [ColumnMetaData{SNAPPY [name] optional binary name (UTF8)  [PLAIN, PLAIN_DICTIONARY, RLE], 27}, ColumnMetaData\{SNAPPY [creation_parameters] optional binary creation_parameters (UTF8)  [PLAIN, PLAIN_DICTIONARY, RLE], 140}, ColumnMetaData\{SNAPPY [creation_date] optional int64 creation_date (TIMESTAMP_MICROS)  [PLAIN, PLAIN_DICTIONARY, RLE], 267}, ColumnMetaData\{SNAPPY [data_version] optional int32 data_version  [PLAIN, PLAIN_DICTIONARY, RLE], 430}, ColumnMetaData\{SNAPPY [schema_version] optional int32 schema_version  [PLAIN, PLAIN_DICTIONARY, RLE], 567}]}]}], schema=null]
      2018-09-17 08:26:34,392 [24609ac4-80f6-b318-7296-ee53c406a9f6:frag:0:0] ERROR  o.a.d.e.physical.impl.BaseRootExec - Operator dump completed.
      2018-09-17 08:26:34,392 [24609ac4-80f6-b318-7296-ee53c406a9f6:frag:0:0] INFO  o.a.d.e.w.fragment.FragmentExecutor - 24609ac4-80f6-b318-7296-ee53c406a9f6:0:0: State change requested FAILED --> FINISHED
      
      ...
      
      

      Another example:

      13:57:27.603 [245b3168-3d98-1046-5f2b-6b4935b9012d:frag:0:0] ERROR o.a.d.e.physical.impl.BaseRootExec - Operator dump started: dumping last 2 failed operators
      13:57:27.604 [245b3168-3d98-1046-5f2b-6b4935b9012d:frag:0:0] ERROR o.a.d.e.p.i.v.IteratorValidatorBatchIterator - IteratorValidatorBatchIterator[container=org.apache.drill.exec.record.VectorContainer@1ab00dc8[recordCount = 0, schemaChanged = false, schema = BatchSchema [fields=[[`dir0` (VARCHAR:OPTIONAL)], [`dir1` (VARCHAR:OPTIONAL)], [`o_orderkey` (INT:REQUIRED)], [`o_custkey` (INT:REQUIRED)], [`o_orderstatus` (VARCHAR:REQUIRED)], [`o_totalprice` (FLOAT8:REQUIRED)], [`o_orderdate` (DATE:REQUIRED)], [`o_orderpriority` (VARCHAR:REQUIRED)], [`o_clerk` (VARCHAR:REQUIRED)], [`o_shippriority` (INT:REQUIRED)], [`o_comment` (VARCHAR:REQUIRED)]], selectionVector=TWO_BYTE], wrappers = [org.apache.drill.exec.vector.NullableVarCharVector@6e1bb101[field = [`dir0` (VARCHAR:OPTIONAL)], ...], org.apache.drill.exec.vector.NullableVarCharVector@1edc5831[field = [`dir1` (VARCHAR:OPTIONAL)], ...], org.apache.drill.exec.vector.IntVector@2c63d90c[field = [`o_orderkey` (INT:REQUIRED)], ...], org.apache.drill.exec.vector.IntVector@36275f05[field = [`o_custkey` (INT:REQUIRED)], ...], org.apache.drill.exec.vector.VarCharVector@5077379[field = [`o_orderstatus` (VARCHAR:REQUIRED)], ...], org.apache.drill.exec.vector.Float8Vector@79e6ae38[field = [`o_totalprice` (FLOAT8:REQUIRED)], ...], org.apache.drill.exec.vector.DateVector@4bffb3b1[field = [`o_orderdate` (DATE:REQUIRED)], ...], org.apache.drill.exec.vector.VarCharVector@472c02bc[field = [`o_orderpriority` (VARCHAR:REQUIRED)], ...], org.apache.drill.exec.vector.VarCharVector@790e3d9b[field = [`o_clerk` (VARCHAR:REQUIRED)], ...], org.apache.drill.exec.vector.IntVector@35d5cbec[field = [`o_shippriority` (INT:REQUIRED)], ...], org.apache.drill.exec.vector.VarCharVector@38194dc[field = [`o_comment` (VARCHAR:REQUIRED)], ...]], ...], instNum=17, batchTypeName=LimitRecordBatch, lastSchema=null, lastNewSchema=null]
      13:57:27.604 [245b3168-3d98-1046-5f2b-6b4935b9012d:frag:0:0] ERROR o.a.d.e.p.i.limit.LimitRecordBatch - LimitRecordBatch[container=org.apache.drill.exec.record.VectorContainer@1ab00dc8[recordCount = 0, schemaChanged = false, schema = BatchSchema [fields=[[`dir0` (VARCHAR:OPTIONAL)], [`dir1` (VARCHAR:OPTIONAL)], [`o_orderkey` (INT:REQUIRED)], [`o_custkey` (INT:REQUIRED)], [`o_orderstatus` (VARCHAR:REQUIRED)], [`o_totalprice` (FLOAT8:REQUIRED)], [`o_orderdate` (DATE:REQUIRED)], [`o_orderpriority` (VARCHAR:REQUIRED)], [`o_clerk` (VARCHAR:REQUIRED)], [`o_shippriority` (INT:REQUIRED)], [`o_comment` (VARCHAR:REQUIRED)]], selectionVector=TWO_BYTE], wrappers = [org.apache.drill.exec.vector.NullableVarCharVector@6e1bb101[field = [`dir0` (VARCHAR:OPTIONAL)], ...], org.apache.drill.exec.vector.NullableVarCharVector@1edc5831[field = [`dir1` (VARCHAR:OPTIONAL)], ...], org.apache.drill.exec.vector.IntVector@2c63d90c[field = [`o_orderkey` (INT:REQUIRED)], ...], org.apache.drill.exec.vector.IntVector@36275f05[field = [`o_custkey` (INT:REQUIRED)], ...], org.apache.drill.exec.vector.VarCharVector@5077379[field = [`o_orderstatus` (VARCHAR:REQUIRED)], ...], org.apache.drill.exec.vector.Float8Vector@79e6ae38[field = [`o_totalprice` (FLOAT8:REQUIRED)], ...], org.apache.drill.exec.vector.DateVector@4bffb3b1[field = [`o_orderdate` (DATE:REQUIRED)], ...], org.apache.drill.exec.vector.VarCharVector@472c02bc[field = [`o_orderpriority` (VARCHAR:REQUIRED)], ...], org.apache.drill.exec.vector.VarCharVector@790e3d9b[field = [`o_clerk` (VARCHAR:REQUIRED)], ...], org.apache.drill.exec.vector.IntVector@35d5cbec[field = [`o_shippriority` (INT:REQUIRED)], ...], org.apache.drill.exec.vector.VarCharVector@38194dc[field = [`o_comment` (VARCHAR:REQUIRED)], ...]], ...], offset=0, numberOfRecords=5, incomingSV=null, outgoingSV=[SV2: recs=0 - ]]
      13:57:27.604 [245b3168-3d98-1046-5f2b-6b4935b9012d:frag:0:0] ERROR o.a.d.e.physical.impl.BaseRootExec - Operator dump completed.
      

      For UserException with ErrorType SYSTEM or INTERNAL_ERROR some exceptions contain not useful message so added a line suggesting refering to logs for more information for these error types:

      org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: java.lang.IndexOutOfBoundsException: DrillBuf[7], udle: [1 0..0], index: 0, length: 4 (expected: range(0, 0)) DrillBuf[7], udle: [1 0..0]
      
      Fragment 0:0
      
      Please, refer to logs for more information.
      
      [Error Id: 995bd741-eb39-4eaf-86d5-d3089f1041f0 on node4.cluster.com:31010]
      

      Attachments

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            bohdan Bohdan Kazydub
            bohdan Bohdan Kazydub
            Vitalii Diravka Vitalii Diravka
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment