IMPALA-9549

Impalad startup fails to wait for catalogd to start up when using local catalog

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 4.0.0
    • Fix Version/s: Impala 4.0.0
    • Component/s: Backend
    • Labels: None

    Description

      Since Impala coordinators and executors may be starting up at the same time as the catalogd, they should be tolerant of delays in the catalogd starting up. When using local catalog (use_local_catalog=true), the Impalads fail with the following error if the catalogd startup is delayed:

      I0323 14:22:03.151849 29565 jni-util.cc:288] org.apache.impala.catalog.local.LocalCatalogException: Unable to load database names
       at org.apache.impala.catalog.local.LocalCatalog.loadDbs(LocalCatalog.java:94)
       at org.apache.impala.catalog.local.LocalCatalog.getDbs(LocalCatalog.java:83)
       at org.apache.impala.service.Frontend.getCatalogMetrics(Frontend.java:753)
       at org.apache.impala.service.JniFrontend.getCatalogMetrics(JniFrontend.java:220)
      Caused by: org.apache.thrift.TException: org.apache.impala.common.InternalException: Couldn't open transport for localhost:26000 (connect() failed: Connection refused)
      
       at org.apache.impala.catalog.local.CatalogdMetaProvider.sendRequest(CatalogdMetaProvider.java:382)
       at org.apache.impala.catalog.local.CatalogdMetaProvider.access$100(CatalogdMetaProvider.java:174)
       at org.apache.impala.catalog.local.CatalogdMetaProvider$1.call(CatalogdMetaProvider.java:583)
       at org.apache.impala.catalog.local.CatalogdMetaProvider$1.call(CatalogdMetaProvider.java:578)
       at org.apache.impala.catalog.local.CatalogdMetaProvider.loadWithCaching(CatalogdMetaProvider.java:509)
       at org.apache.impala.catalog.local.CatalogdMetaProvider.loadDbList(CatalogdMetaProvider.java:577)
       at org.apache.impala.catalog.local.LocalCatalog.loadDbs(LocalCatalog.java:92)
       ... 3 more
      Caused by: org.apache.impala.common.InternalException: Couldn't open transport for localhost:26000 (connect() failed: Connection refused)
       at org.apache.impala.service.FeSupport.NativeGetPartialCatalogObject(Native Method)
       at org.apache.impala.service.FeSupport.GetPartialCatalogObject(FeSupport.java:440)
       at org.apache.impala.catalog.local.CatalogdMetaProvider.sendRequest(CatalogdMetaProvider.java:380)
       ... 9 more
      I0323 14:22:03.217051 29565 status.cc:126] LocalCatalogException: Unable to load database names
      CAUSED BY: TException: org.apache.impala.common.InternalException: Couldn't open transport for localhost:26000 (connect() failed: Connection refused)

      The ImpalaServer constructor calls ImpalaServer::UpdateCatalogMetrics() (https://github.com/apache/impala/blob/3b833902519fb8f0ef9b5fd20919c5fd85d22fcf/be/src/service/impala-server.cc#L452). UpdateCatalogMetrics() maintains two metrics: the number of databases and the number of tables. With local catalog, it ends up calling org.apache.impala.catalog.local.LocalCatalog.getDbs(), which calls loadDbs() (https://github.com/apache/impala/blob/ca0785ec206f27f06d8d6fd1b710779e548bbd8e/fe/src/main/java/org/apache/impala/catalog/local/LocalCatalog.java#L83). loadDbs() requires a connection to catalogd and fails if it cannot connect.

      Importantly, all of this happens in the constructor, before the regular ImpalaServer::Start() waits for the catalogd to start up:

      if (FLAGS_is_coordinator) exec_env_->frontend()->WaitForCatalog();
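
The ordering requirement here (wait for catalogd first, query it afterwards) can be illustrated with a bounded retry loop. This is only a sketch under stated assumptions, not Impala's code: WaitUntilReady and FakeCatalogd are hypothetical names invented for the example, and the real WaitForCatalog() may work quite differently.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper (not part of Impala): calls `try_connect` until it
// succeeds or `max_attempts` is exhausted, sleeping `delay` between attempts.
// Returns the number of attempts used on success, or -1 if every attempt failed.
int WaitUntilReady(const std::function<bool()>& try_connect, int max_attempts,
                   std::chrono::milliseconds delay) {
  assert(max_attempts > 0);
  for (int attempt = 1; attempt <= max_attempts; ++attempt) {
    if (try_connect()) return attempt;
    // Do not sleep after the final failed attempt.
    if (attempt < max_attempts) std::this_thread::sleep_for(delay);
  }
  return -1;
}

// Hypothetical stand-in for a catalogd that is still starting up: it refuses
// the first `refusals_left` connection attempts and accepts the rest.
struct FakeCatalogd {
  int refusals_left;
  bool Connect() {
    if (refusals_left > 0) {
      --refusals_left;
      return false;  // models "connect() failed: Connection refused"
    }
    return true;
  }
};
```

A caller would run the wait first and only touch catalog-backed state (such as the database and table counts) once it returns a positive value.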
      

       

      In the old catalog implementation (use_local_catalog=false), the getDbs() call on the catalog returns whatever values it has, and it does not try to contact the catalogd. This is why the regular case does not see this problem.
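
The difference between the two behaviors can be sketched as a metrics refresh that keeps its last known values when the catalogd cannot be reached, instead of propagating the failure. This is a minimal illustration, assuming hypothetical CatalogCounts and CatalogMetrics types; it is not Impala's implementation and not necessarily the fix that was committed for this issue.

```cpp
#include <cassert>
#include <functional>

// Hypothetical value type holding the two metrics described above.
struct CatalogCounts {
  int num_dbs;
  int num_tables;
};

// Hypothetical metrics holder. `fetch` fills in fresh counts and returns
// true, or returns false when the catalogd is unreachable.
class CatalogMetrics {
 public:
  CatalogMetrics() : counts_{0, 0} {}

  // Returns true if the metrics were refreshed. On failure the previous
  // values are kept, so a delayed catalogd does not abort the caller.
  bool Update(const std::function<bool(CatalogCounts*)>& fetch) {
    CatalogCounts fresh = {0, 0};
    if (!fetch(&fresh)) return false;
    counts_ = fresh;
    return true;
  }

  const CatalogCounts& counts() const { return counts_; }

 private:
  CatalogCounts counts_;
};
```

With this shape, a refresh attempted before the catalogd is up simply reports failure and can be repeated later, once the wait for the catalog has completed.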

          People

            joemcdonnell Joe McDonnell
            joemcdonnell Joe McDonnell