  IMPALA / IMPALA-5090

Improve the logging of causes for "unknown disk id" including possible workarounds


    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: Impala 2.5.0, Impala 2.4.0, Impala 2.6.0, Impala 2.7.0, Impala 2.8.0
    • Fix Version/s: None
    • Component/s: Catalog

      Description

      A frequent cause of "unknown disk id" warnings during query execution is that at the time of table loading one of the DNs holding relevant data was overloaded and could not give a timely response to dfs.getFileBlockStorageLocations() calls from the CatalogServer.
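      The failure mode can be illustrated with a small simulation. This is a minimal Python sketch under a simplified model, not HDFS or Impala code: the catalog queries all DataNodes in parallel under one overall timeout, and every DN that has not answered when the timeout expires is recorded with an "unknown" disk id. The names `query_datanode` and `load_disk_ids` and the `-1` sentinel are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

UNKNOWN_DISK_ID = -1  # hypothetical sentinel meaning "disk id unknown"


def query_datanode(dn, delay_s, disk_id):
    """Hypothetical stand-in for a per-DataNode block location RPC.

    delay_s simulates how long a (possibly overloaded) DN takes to answer.
    """
    time.sleep(delay_s)
    return disk_id


def load_disk_ids(datanodes, timeout_s):
    """Query all DNs in parallel and give up on the slow ones after timeout_s.

    datanodes maps a DN address to (simulated delay in seconds, disk id it would report).
    """
    with ThreadPoolExecutor(max_workers=max(1, len(datanodes))) as pool:
        futures = {dn: pool.submit(query_datanode, dn, delay, disk)
                   for dn, (delay, disk) in datanodes.items()}
        # One overall deadline for all outstanding requests.
        done, _ = wait(futures.values(), timeout=timeout_s)
        result = {}
        for dn, fut in futures.items():
            if fut in done:
                result[dn] = fut.result()
            else:
                # Cancellation only succeeds for tasks that have not started yet;
                # a running task finishes later, but its answer is ignored.
                fut.cancel()
                result[dn] = UNKNOWN_DISK_ID
        # Leaving the with-block waits for still-running worker threads.
        return result
```

      For example, `load_disk_ids({"10.17.184.31:50020": (0.0, 3), "10.17.184.32:50020": (0.6, 1)}, timeout_s=0.2)` reports disk id 3 for the responsive DN and the unknown sentinel for the slow one, even though the slow DN would eventually have answered.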

      You will find messages similar to this in the catalogd logs at the time of table loading:

      I0315 07:30:49.752166 33660 BlockStorageLocationUtil.java:167] Cancelled while waiting for datanode 10.17.184.31:50020: java.util.concurrent.CancellationException
      I0315 07:30:49.752351 33660 BlockStorageLocationUtil.java:167] Cancelled while waiting for datanode 10.17.184.32:50020: java.util.concurrent.CancellationException
      I0315 07:30:49.752465 33660 BlockStorageLocationUtil.java:167] Cancelled while waiting for datanode 10.17.182.22:50020: java.util.concurrent.CancellationException
      

      Also look for "Unknown disk id count for filesystem" in the catalogd logs to see how many missing disk ids were found in total.
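      Both kinds of messages can be tallied with a short script to see which DNs were slow during table loading. This is a minimal sketch that assumes the "Cancelled while waiting for datanode" format shown above. `scan_catalogd_log` is a hypothetical helper, and since the rest of the "Unknown disk id count for filesystem" line is not shown here, matching lines are returned verbatim rather than parsed.

```python
import re
from collections import Counter

# Matches e.g. "Cancelled while waiting for datanode 10.17.184.31:50020: java.util..."
CANCELLED_RE = re.compile(r"Cancelled while waiting for datanode (\S+?):\s")
UNKNOWN_COUNT_MARKER = "Unknown disk id count for filesystem"


def scan_catalogd_log(lines):
    """Return (cancellations per DN address, raw unknown-disk-id-count lines)."""
    cancelled = Counter()
    unknown_count_lines = []
    for line in lines:
        match = CANCELLED_RE.search(line)
        if match:
            cancelled[match.group(1)] += 1
        elif UNKNOWN_COUNT_MARKER in line:
            unknown_count_lines.append(line.strip())
    return cancelled, unknown_count_lines
```

      An open log file can be passed directly as `lines`; `cancelled.most_common()` then lists the DNs that most often failed to answer in time.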

      This JIRA is for improving the error reporting written to the catalogd log when disk ids fail to load due to DN issues. In particular, the following HDFS configuration options are often set too aggressively for a busy cluster (the first is read by the DataNodes, the second by the HDFS client inside catalogd):

      • dfs.datanode.handler.count
      • dfs.client.file-block-storage-locations.timeout.millis

      The logging should include the current values of these configs and mention that increasing them might mitigate the disk id issues on a busy cluster.
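      One possible shape for the improved catalogd message is sketched below. It is a minimal Python illustration, not the actual catalog code: `format_unknown_disk_id_warning` is hypothetical, a plain dict stands in for the Hadoop `Configuration` that catalogd would read the values from, and the wording after "Unknown disk id count for filesystem" is invented for the example.

```python
# The two settings named in this JIRA.
HINT_KEYS = (
    "dfs.datanode.handler.count",
    "dfs.client.file-block-storage-locations.timeout.millis",
)


def format_unknown_disk_id_warning(fs_name, unknown_count, conf):
    """Build a hypothetical catalogd warning that includes the relevant settings.

    conf is a dict standing in for catalogd's Hadoop configuration.
    """
    settings = ", ".join(f"{key}={conf.get(key, '<unset>')}" for key in HINT_KEYS)
    return (
        f"Unknown disk id count for filesystem {fs_name}: {unknown_count}. "
        "Some DataNodes did not return block storage locations in time, "
        "possibly because they were overloaded. "
        f"Current settings (as seen by catalogd; DataNodes may differ): {settings}. "
        "Increasing these values might mitigate the problem on a busy cluster."
    )
```

      Because dfs.datanode.handler.count takes effect on the DataNodes, the value catalogd sees in its own configuration may not match what the DNs actually run with. For that reason the sketch labels the values as seen by catalogd.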

      In addition, we should consider enhancing the BE "unknown disk id" warning to include possible causes (heavy load on HDFS) and to recommend examining the catalogd logs for more information.
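      The BE change amounts to appending a likely cause and a next step to the existing warning. This is a minimal Python sketch of the text only; the real warning is produced by the C++ backend, and `enhance_unknown_disk_id_warning` and the hint wording are hypothetical.

```python
CAUSE_HINT = (
    "A possible cause is heavy load on HDFS when the table metadata was loaded: "
    "DataNodes that did not answer the catalog's block location requests in time "
    "leave their disk ids unknown."
)
NEXT_STEP_HINT = (
    "Check the catalogd log from the time the table was loaded for "
    "'Cancelled while waiting for datanode' and "
    "'Unknown disk id count for filesystem' messages."
)


def enhance_unknown_disk_id_warning(base_warning):
    """Append cause and next-step hints to an existing warning text."""
    return " ".join((base_warning.rstrip(), CAUSE_HINT, NEXT_STEP_HINT))
```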

      Note that this improvement is only relevant to Impala versions prior to IMPALA-4172 because after that change we no longer contact the DNs for disk ids.


            People

            • Assignee:
              Unassigned
              Reporter:
              alex.behm Alexander Behm
            • Votes:
              0
              Watchers:
              3
