
Weird things happen when impalad restarts with different hostname but same IP

Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: Impala 3.4.0
    • Fix Version/s: None
    • Component/s: Backend
    • Labels: None

    Description

      I was messing around with running Impala in a single-node dockerized configuration and ran into a bunch of weirdness stemming from restarting the impalad. It got into a state where there was a new and an old statestore registration with the same IP/port but different hostnames (since Docker generates a new hostname for each incarnation of the container).

      I saw a crash in Coordinator::GetRootSink(). The cause is that the coordinator treats the same impalad as two distinct backends and sends two execute RPCs to it (this is a single-node cluster).
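
      A minimal sketch of how one process can end up counted as two backends, assuming backends are told apart by hostname and port. The Registration struct and both counting functions are hypothetical and are not Impala code. The sample values used with it are the hostname a16ac03fc53b and port 22000 from the log, plus an invented second hostname and an invented IP address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for one statestore registration.
struct Registration {
  std::string hostname;  // changes each time the container is recreated
  std::string ip;        // stays the same across restarts
  uint16_t port;         // stays the same across restarts
};

// Counts backends when every distinct (hostname, port) is its own backend.
std::size_t CountByHostname(const std::vector<Registration>& regs) {
  std::set<std::pair<std::string, uint16_t>> ids;
  for (const Registration& r : regs) ids.insert({r.hostname, r.port});
  return ids.size();
}

// Counts backends when identity is the network endpoint (IP, port).
std::size_t CountByEndpoint(const std::vector<Registration>& regs) {
  std::set<std::pair<std::string, uint16_t>> ids;
  for (const Registration& r : regs) {
    assert(!r.ip.empty());
    ids.insert({r.ip, r.port});
  }
  return ids.size();
}
```

      With a stale and a fresh registration for the same endpoint, CountByHostname() reports two backends while CountByEndpoint() reports one, which matches the "starting execution on 2 backends" line on a single-node cluster.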

      I0528 17:32:41.760128   573 coordinator.cc:143] f84b158b036445ad:3a9defdf00000000] Exec() query_id=f84b158b036445ad:3a9defdf00000000 stmt=SELECT COUNT(*) FROM tpcds_kudu.call_center
      I0528 17:32:41.760670   573 coordinator.cc:463] f84b158b036445ad:3a9defdf00000000] starting execution on 2 backends for query_id=f84b158b036445ad:3a9defdf00000000
      ..
      I0528 17:32:41.762449    78 control-service.cc:153] f84b158b036445ad:3a9defdf00000000] ExecQueryFInstances(): query_id=f84b158b036445ad:3a9defdf00000000 coord=a16ac03fc53b:22000 #instances=1
      I0528 17:32:41.761706    79 control-service.cc:153] f84b158b036445ad:3a9defdf00000000] ExecQueryFInstances(): query_id=f84b158b036445ad:3a9defdf00000000 coord=a16ac03fc53b:22000 #instances=4
      ..
      Wrote minidump to /opt/impala/logs/minidumps/impalad/15727084-c931-49e1-62d37e86-75cfe0f6.dmp
      #
      # A fatal error has been detected by the Java Runtime Environment:
      #
      #  SIGSEGV (0xb) at pc=0x00000000011a0d50, pid=1, tid=0x00007f92b5e8c700
      #
      # JRE version: OpenJDK Runtime Environment (8.0_242-b08) (build 1.8.0_242-8u242-b08-0ubuntu3~18.04-b08)
      # Java VM: OpenJDK 64-Bit Server VM (25.242-b08 mixed mode linux-amd64 compressed oops)
      # Problematic frame:
      Wrote minidump to /opt/impala/logs/minidumps/impalad/15727084-c931-49e1-62d37e86-75cfe0f6.dmp
      # C  [impalad+0xda0d50]  impala::FragmentInstanceState::GetRootSink() const+0x0
      #
      # Core dump written. Default location: /opt/impala/core or core.1
      #
      # An error report file with more information is saved as:
      # /opt/impala/hs_err_pid1.log
      #
      # If you would like to submit a bug report, please visit:
      #   http://bugreport.java.com/bugreport/crash.jsp
      #
      
      

      CC twm378

      At a separate time I saw it trip the "Tried to add existing backend to executor group" case in ExecutorGroup::AddExecutor().

      void ExecutorGroup::AddExecutor(const BackendDescriptorPB& be_desc) {
        // be_desc.is_executor can be false for the local backend when scheduling queries to run
        // on the coordinator host.
        DCHECK(!be_desc.ip_address().empty());
        Executors& be_descs = executor_map_[be_desc.ip_address()];
        auto eq = [&be_desc](const BackendDescriptorPB& existing) {
          // The IP addresses must already match, so it is sufficient to check the port.
          DCHECK_EQ(existing.ip_address(), be_desc.ip_address());
          return existing.address().port() == be_desc.address().port();
        };
        if (find_if(be_descs.begin(), be_descs.end(), eq) != be_descs.end()) {
          LOG(DFATAL) << "Tried to add existing backend to executor group: "
                      << be_desc.krpc_address();
          return;
        }
        if (!CheckConsistencyOrWarn(be_desc)) {
          LOG(WARNING) << "Ignoring inconsistent backend for executor group: "
                       << be_desc.krpc_address();
          return;
        }
        if (be_descs.empty()) {
          executor_ip_hash_ring_.AddNode(be_desc.ip_address());
        }
        be_descs.push_back(be_desc);
        executor_ip_map_[be_desc.address().hostname()] = be_desc.ip_address();
      }
      

      I'm not sure that using the hostname to identify impalads is even useful at this point. We could probably simplify this by using the IP address only.
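
      A rough sketch of what endpoint-only identification could look like, assuming a membership table keyed by IP address and port in which a newer registration replaces a stale one. The Backend struct and EndpointRegistry class are hypothetical illustrations, not existing Impala types, and this is only one possible shape for such a change.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical backend record; not Impala's BackendDescriptorPB.
struct Backend {
  std::string hostname;
  std::string ip;
  uint16_t port;
};

// Hypothetical membership table keyed only by (IP, port).
class EndpointRegistry {
 public:
  // Registers a backend. Returns true if it replaced an older registration
  // for the same endpoint (for example, a restarted container).
  bool Register(const Backend& be) {
    assert(!be.ip.empty());
    const auto key = std::make_pair(be.ip, be.port);
    auto it = backends_.find(key);
    if (it != backends_.end()) {
      it->second = be;  // keep the newest hostname for this endpoint
      return true;
    }
    backends_.emplace(key, be);
    return false;
  }

  // Number of distinct endpoints currently registered.
  std::size_t size() const { return backends_.size(); }

  // Returns the hostname recorded for an endpoint, or "" if it is unknown.
  std::string HostnameFor(const std::string& ip, uint16_t port) const {
    auto it = backends_.find(std::make_pair(ip, port));
    if (it == backends_.end()) return std::string();
    return it->second.hostname;
  }

 private:
  std::map<std::pair<std::string, uint16_t>, Backend> backends_;
};
```

      In this model a restarted impalad that comes back with a new hostname overwrites its old entry instead of appearing as a second backend, so the hostname becomes a display attribute rather than part of the identity.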

      Attachments

        1. get-root-sink-resolved.txt
          709 kB
          Tim Armstrong
        2. Screenshot from 2020-05-28 10-53-16.png
          207 kB
          Tim Armstrong
        3. statestore.log
          1.48 MB
          Tim Armstrong

            People

              Assignee: Sahil Takiar (stakiar)
              Reporter: Tim Armstrong (tarmstrong)