  1. Mesos
  2. MESOS-1804

The "store" component causes the on-top framework (Chronos) to crash


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • mesos-0.19.0

    Description

      Chronos running with mesos-0.19.0 may crash as shown below.

      [2014-09-05 15:21:36,095] INFO State J_chronos_job_34 does not exist yet. Adding to state (com.airbnb.scheduler.state.MesosStatePersistenceStore:146)
      F0905 15:21:36.175230 27727 org_apache_mesos_state_AbstractState.cpp:319] Check failed: future->isReady()
      *** Check failure stack trace: ***
      @ 0x7f4f1ecb199d google::LogMessage::Fail()
      @ 0x7f4f1ecb59b7 google::LogMessage::SendToLog()
      @ 0x7f4f1ecb3839 google::LogMessage::Flush()
      @ 0x7f4f1ecb3b3d google::LogMessageFatal::~LogMessageFatal()
      @ 0x7f4f1ec2ef90 Java_org_apache_mesos_state_AbstractState__1_1store_1get
      @ 0x7f4f18293d45 (unknown)
      Aborted (core dumped)
      

      The related code snippet is shown below:

      $ sed -ne '311,334p' src/java/jni/org_apache_mesos_state_AbstractState.cpp
      JNIEXPORT jobject JNICALL Java_org_apache_mesos_state_AbstractState__1_1store_1get
        (JNIEnv* env, jobject thiz, jlong jfuture)
      {
        Future<Option<Variable> >* future = (Future<Option<Variable> >*) jfuture;
      
        future->await();
      
        if (future->isFailed()) {
          jclass clazz = env->FindClass("java/util/concurrent/ExecutionException");
          env->ThrowNew(clazz, future->failure().c_str());
          return NULL;
        } else if (future->isDiscarded()) {
          // TODO(benh): Consider throwing an ExecutionException since we
          // never return true for 'isCancelled'.
          jclass clazz = env->FindClass("java/util/concurrent/CancellationException");
          env->ThrowNew(clazz, "Future was discarded");
          return NULL;
        }
      
        CHECK_READY(*future);
      
        if (future->get().isSome()) {
          Variable* variable = new Variable(future->get().get());
      

      The root cause seems to be that CHECK_READY(*future) failed, which aborted the process and crashed Chronos.
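
      The snippet handles the failed and discarded states explicitly, so the abort suggests the future was in neither of those states, nor ready, when the check ran (presumably still pending). As a rough illustration only, the toy model below mirrors that branch order but reports a non-ready future as an exception instead of aborting. ToyFuture, State, getOrThrow and outcome are hypothetical names made up for this sketch; none of this is the libprocess Future API or the actual Mesos fix.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the states a future can be in.
// This is NOT the libprocess Future type, only a toy model.
enum class State { Pending, Ready, Failed, Discarded };

struct ToyFuture {
  State state;
  std::string value;    // meaningful only when state == State::Ready
  std::string failure;  // meaningful only when state == State::Failed

  // Like the real get(), this is only valid on a ready future.
  const std::string& get() const {
    assert(state == State::Ready);
    return value;
  }
};

// Mirrors the branch order of the JNI snippet (failed, then discarded),
// but turns "not ready" into an exception rather than a fatal check.
std::string getOrThrow(const ToyFuture& future) {
  if (future.state == State::Failed) {
    throw std::runtime_error(future.failure);
  }
  if (future.state == State::Discarded) {
    throw std::runtime_error("Future was discarded");
  }
  if (future.state != State::Ready) {
    // The case that a fatal check would turn into a process abort.
    throw std::logic_error("Future is still pending");
  }
  return future.get();
}

// Classifies the result so a caller can react without crashing.
std::string outcome(const ToyFuture& future) {
  try {
    getOrThrow(future);
    return "ok";
  } catch (const std::logic_error&) {
    return "pending";
  } catch (const std::runtime_error&) {
    return "error";
  }
}
```

      In this model a pending future surfaces as a catchable error in the caller, whereas a fatal check terminates the whole process, which in the report above means the JVM hosting Chronos.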

      See the Chronos issue: https://github.com/airbnb/chronos/issues/253

            People

              chengwei-yang Chengwei Yang