KUDU-2848: HMS DB UUID fetching requires HMS to be running when Kudu Master starts


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.10.0
    • Fix Version/s: 1.10.0
    • Component/s: hms, master
    • Labels: None

      Description

      The fix for KUDU-2841 has a nasty unintended side effect: the HMS must be running when a Kudu Master is started. Otherwise the master fails with an error:

      E0611 21:14:53.749974 14595 master.cc:205] Unable to init master catalog manager: Network error: Unable to initialize catalog manager: failed to start Hive Metastore catalog: failed to open Hive Metastore connection: socket open() error: Connection refused
      F0611 21:14:53.750124 14341 master_main.cc:107] Check failed: _s.ok() Bad status: Network error: Unable to initialize catalog manager: failed to start Hive Metastore catalog: failed to open Hive Metastore connection: socket open() error: Connection refused
      *** Check failure stack trace: ***
      Wrote minidump to /var/log/kudu/minidumps/kudu-master/ca91ad8c-723d-42ed-47a35b98-6c028db3.dmp
      *** Aborted at 1560312893 (unix time) try "date -d @1560312893" if you are using GNU date ***
      PC: @       0x33a2e324f5 __GI_raise
      *** SIGABRT (@0x1e200003805) received by PID 14341 (TID 0x7ff6454c00c0) from PID 14341; stack trace: ***
          @       0x33a320f7e0 (unknown)
          @       0x33a2e324f5 __GI_raise
          @       0x33a2e33cd5 __GI_abort
          @          0x255d8a9 kudu::AbortFailureFunction()
          @           0xb5721d google::LogMessage::Fail()
          @           0xb590dd google::LogMessage::SendToLog()
          @           0xb56d59 google::LogMessage::Flush()
          @           0xb59b7f google::LogMessageFatal::~LogMessageFatal()
          @           0xaafbc4 kudu::master::MasterMain()
          @       0x33a2e1ed20 __libc_start_main
          @           0xaaf6a1 (unknown)
      

      We need to find a workaround; otherwise, overall cluster resilience is degraded.
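
      For illustration only, here is a minimal standalone sketch of one possible workaround shape (this is not Kudu's actual code, and OpenHmsConnection() / OpenHmsConnectionWithRetries() are hypothetical stand-ins for the real Thrift calls): retry the HMS connection with exponential backoff and, if it still fails, log and defer HMS catalog startup rather than CHECK-failing in master_main.cc on the first "Connection refused".

      // Minimal standalone sketch, not Kudu code: tolerate an HMS that is
      // down when the master starts by retrying with backoff and deferring
      // HMS catalog startup instead of aborting the process.
      #include <chrono>
      #include <iostream>
      #include <thread>

      // Hypothetical stand-in for opening a Thrift connection to the HMS.
      // Here it always fails, simulating "socket open() error: Connection refused".
      bool OpenHmsConnection() {
        return false;
      }

      // Retry with exponential backoff. Returns true once the connection
      // succeeds, false after the retry budget is exhausted.
      bool OpenHmsConnectionWithRetries(int max_attempts) {
        auto backoff = std::chrono::milliseconds(100);
        for (int attempt = 1; attempt <= max_attempts; ++attempt) {
          if (OpenHmsConnection()) {
            return true;
          }
          std::cerr << "HMS connection attempt " << attempt
                    << " failed; retrying in " << backoff.count() << " ms\n";
          std::this_thread::sleep_for(backoff);
          backoff *= 2;  // double the wait between attempts
        }
        return false;
      }

      int main() {
        if (!OpenHmsConnectionWithRetries(5)) {
          // Rather than CHECK-failing like the current startup path, the
          // master could log and fetch the HMS DB UUID lazily later on.
          std::cerr << "HMS still unreachable; deferring HMS catalog startup\n";
        }
        return 0;
      }

      An equivalent direction would be fetching the HMS DB UUID lazily (or in the background) on first use rather than unconditionally during master startup; either way the master would come up even when the HMS is temporarily down.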

              People

              • Assignee: Adar Dembo
              • Reporter: Adar Dembo
              • Votes: 0
              • Watchers: 1
