IMPALA-9788

Weird things happen when impalad restarts with different hostname but same IP


Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: Impala 3.4.0
    • Fix Version/s: None
    • Component/s: Backend
    • Labels: None

    Description

      I was messing around with running Impala in a single-node dockerized configuration and ran into a bunch of weirdness stemming from restarting the impalad. It got into a state where there was a new and an old statestore registration with the same IP/port and different hostnames (since Docker generates a new hostname for each incarnation of the container).

      I saw a crash in Coordinator::GetRootSink(). The cause is that the coordinator treats the same impalad as two distinct backends and sends two execute RPCs to it (this is a single-node cluster).

      I0528 17:32:41.760128   573 coordinator.cc:143] f84b158b036445ad:3a9defdf00000000] Exec() query_id=f84b158b036445ad:3a9defdf00000000 stmt=SELECT COUNT(*) FROM tpcds_kudu.call_center
      I0528 17:32:41.760670   573 coordinator.cc:463] f84b158b036445ad:3a9defdf00000000] starting execution on 2 backends for query_id=f84b158b036445ad:3a9defdf00000000
      ..
      I0528 17:32:41.762449    78 control-service.cc:153] f84b158b036445ad:3a9defdf00000000] ExecQueryFInstances(): query_id=f84b158b036445ad:3a9defdf00000000 coord=a16ac03fc53b:22000 #instances=1
      I0528 17:32:41.761706    79 control-service.cc:153] f84b158b036445ad:3a9defdf00000000] ExecQueryFInstances(): query_id=f84b158b036445ad:3a9defdf00000000 coord=a16ac03fc53b:22000 #instances=4
      ..
      Wrote minidump to /opt/impala/logs/minidumps/impalad/15727084-c931-49e1-62d37e86-75cfe0f6.dmp
      #
      # A fatal error has been detected by the Java Runtime Environment:
      #
      #  SIGSEGV (0xb) at pc=0x00000000011a0d50, pid=1, tid=0x00007f92b5e8c700
      #
      # JRE version: OpenJDK Runtime Environment (8.0_242-b08) (build 1.8.0_242-8u242-b08-0ubuntu3~18.04-b08)
      # Java VM: OpenJDK 64-Bit Server VM (25.242-b08 mixed mode linux-amd64 compressed oops)
      # Problematic frame:
      Wrote minidump to /opt/impala/logs/minidumps/impalad/15727084-c931-49e1-62d37e86-75cfe0f6.dmp
      # C  [impalad+0xda0d50]  impala::FragmentInstanceState::GetRootSink() const+0x0
      #
      # Core dump written. Default location: /opt/impala/core or core.1
      #
      # An error report file with more information is saved as:
      # /opt/impala/hs_err_pid1.log
      #
      # If you would like to submit a bug report, please visit:
      #   http://bugreport.java.com/bugreport/crash.jsp
      #
      
      

      CC Thomas Marshall
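
A minimal sketch of how the "2 backends" state above could arise, assuming (this is not confirmed by the logs) that cluster membership is keyed by a hostname-derived ID. `Registration` and `MembershipByHostname` are hypothetical names for illustration, not Impala classes, and any IP address or second hostname used with them is made up.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for one statestore registration.
struct Registration {
  std::string hostname;
  std::string ip;
  int port;
};

// Toy membership table keyed by "hostname:port". The key choice is an
// assumption made for illustration, not Impala's actual implementation.
class MembershipByHostname {
 public:
  void Register(const Registration& r) {
    assert(!r.ip.empty());
    // A restarted container with a new hostname gets a new key, so the
    // stale entry for the same IP/port is not overwritten.
    members_[r.hostname + ":" + std::to_string(r.port)] = r;
  }

  // Number of entries a scheduler would treat as distinct backends.
  std::size_t NumBackends() const { return members_.size(); }

  // Number of distinct (IP, port) endpoints behind those entries.
  std::size_t NumDistinctEndpoints() const {
    std::set<std::pair<std::string, int>> endpoints;
    for (const auto& kv : members_) {
      endpoints.insert({kv.second.ip, kv.second.port});
    }
    return endpoints.size();
  }

 private:
  std::map<std::string, Registration> members_;
};
```

Registering two entries that share an IP and port but differ in hostname leaves `NumBackends()` at 2 while `NumDistinctEndpoints()` is 1, which matches the symptom of one impalad receiving two ExecQueryFInstances() RPCs for the same query.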

      At a separate time I saw it trip the "Tried to add existing backend to executor group" case in ExecutorGroup::AddExecutor().

      void ExecutorGroup::AddExecutor(const BackendDescriptorPB& be_desc) {
        // be_desc.is_executor can be false for the local backend when scheduling queries to run
        // on the coordinator host.
        DCHECK(!be_desc.ip_address().empty());
        Executors& be_descs = executor_map_[be_desc.ip_address()];
        auto eq = [&be_desc](const BackendDescriptorPB& existing) {
          // The IP addresses must already match, so it is sufficient to check the port.
          DCHECK_EQ(existing.ip_address(), be_desc.ip_address());
          return existing.address().port() == be_desc.address().port();
        };
        if (find_if(be_descs.begin(), be_descs.end(), eq) != be_descs.end()) {
          LOG(DFATAL) << "Tried to add existing backend to executor group: "
                      << be_desc.krpc_address();
          return;
        }
        if (!CheckConsistencyOrWarn(be_desc)) {
          LOG(WARNING) << "Ignoring inconsistent backend for executor group: "
                       << be_desc.krpc_address();
          return;
        }
        if (be_descs.empty()) {
          executor_ip_hash_ring_.AddNode(be_desc.ip_address());
        }
        be_descs.push_back(be_desc);
        executor_ip_map_[be_desc.address().hostname()] = be_desc.ip_address();
      }
      

      I'm not sure if using the hostname to identify impalads is even useful at this point; we could probably simplify this by using the IP address only.
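
One way the IP-only idea could look, as a rough sketch rather than a proposed patch: identify each backend by its (IP, port) pair and let a newer registration replace a stale one. `BackendDesc` and `BackendsByEndpoint` are hypothetical names, and the replace-on-conflict policy is an assumption; the right policy for Impala may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical, simplified backend descriptor (not BackendDescriptorPB).
struct BackendDesc {
  std::string hostname;
  std::string ip;
  int port;
};

// Toy backend table keyed by (IP, port) only; the hostname is kept as
// informational metadata and plays no part in identity.
class BackendsByEndpoint {
 public:
  // Adds the backend, or replaces an existing entry with the same
  // (IP, port). Returns true if an existing entry was replaced.
  bool AddOrReplace(const BackendDesc& be) {
    assert(!be.ip.empty());
    const auto key = std::make_pair(be.ip, be.port);
    auto it = backends_.find(key);
    if (it != backends_.end()) {
      it->second = be;  // Newer incarnation wins (assumed policy).
      return true;
    }
    backends_.emplace(key, be);
    return false;
  }

  // Returns the entry for (ip, port), or nullptr if there is none.
  const BackendDesc* Find(const std::string& ip, int port) const {
    auto it = backends_.find(std::make_pair(ip, port));
    return it == backends_.end() ? nullptr : &it->second;
  }

  std::size_t NumBackends() const { return backends_.size(); }

 private:
  std::map<std::pair<std::string, int>, BackendDesc> backends_;
};
```

With this keying, a restarted container that comes back with a new hostname but the same IP and port overwrites its old entry, so only one backend is ever visible for that endpoint.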

      Attachments

        1. get-root-sink-resolved.txt
          709 kB
          Tim Armstrong
        2. Screenshot from 2020-05-28 10-53-16.png
          207 kB
          Tim Armstrong
        3. statestore.log
          1.48 MB
          Tim Armstrong


          People

            stakiar Sahil Takiar
            tarmstrong Tim Armstrong
