  1. Apache HAWQ
  2. HAWQ-979

Resource Broker Should Reconnect to Hadoop YARN When It Fails to Get the Cluster Report


Details

    Description

      While HAWQ is running in YARN mode, the heartbeat thread of libyarn may sometimes fail (e.g. when the YARN RM restarts) and quit, producing a warning such as:

      2016-08-03 18:45:27.913838 PDT,,,p34645,th-1290610400,,,,0,con4,,seg-10000,,,,,"WARNING","01000","YARN mode resource broker failed to get YARN queue report of queue default. LibYarnClient::getQueueInfo, Catch the Exception:LibYarnClient::libyarn AM heartbeat thread has stopped.",,,,,,,0,,"resourcebroker_LIBYARN_proc.c",1840,

      In this case the resource broker process should re-register HAWQ with YARN, but it does not.

      The reason is:
      In function handleRM2RB_GetClusterReport(), when RB2YARN_getQueueReport() fails, function sendRBGetClusterReportErrorData() is called, but sendRBGetClusterReportErrorData() returns OK (it should return RESBROK_ERROR_GRM).
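
      The intended control flow can be sketched as below. This is a minimal illustration, not the actual HAWQ code: the `_stub` functions, their signatures, the numeric values of the two return codes, and the `should_reregister()` helper are all hypothetical stand-ins, and the sketch assumes that the caller triggers re-registration when it sees RESBROK_ERROR_GRM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical values; the real constants are defined in the HAWQ source. */
#define FUNC_RETURN_OK     0
#define RESBROK_ERROR_GRM  1

/* Stand-in for RB2YARN_getQueueReport(): fails when the libyarn
 * AM heartbeat thread has stopped (e.g. after a YARN RM restart). */
static int get_queue_report_stub(bool heartbeat_alive)
{
    return heartbeat_alive ? FUNC_RETURN_OK : RESBROK_ERROR_GRM;
}

/* Stand-in for sendRBGetClusterReportErrorData(): reports the failure
 * and, as the issue proposes, returns RESBROK_ERROR_GRM instead of OK
 * so that the failure stays visible to the caller. */
static int send_error_data_stub(int errorcode)
{
    assert(errorcode != FUNC_RETURN_OK);
    fprintf(stderr, "cluster report failed, error code %d\n", errorcode);
    return RESBROK_ERROR_GRM;
}

/* Stand-in for handleRM2RB_GetClusterReport(). */
static int handle_get_cluster_report_stub(bool heartbeat_alive)
{
    int res = get_queue_report_stub(heartbeat_alive);
    if (res != FUNC_RETURN_OK)
    {
        /* Propagate the error code rather than swallowing it. */
        return send_error_data_stub(res);
    }
    return FUNC_RETURN_OK;
}

/* Hypothetical caller-side check: re-register only on a GRM error. */
static bool should_reregister(int handler_result)
{
    return handler_result == RESBROK_ERROR_GRM;
}
```

      With the buggy behavior described above, the error path would return FUNC_RETURN_OK, so a check like `should_reregister()` would never fire and the broker would keep using the dead libyarn connection.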


          People

            wlin Wen Lin