Mesos / MESOS-1892

Using mesos-0.20.1.jar with libmesos-0.21.0 reliably segfaults


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Cannot Reproduce
    • Affects Version/s: 0.21.0
    • Fix Version/s: 0.21.0
    • Component/s: java api
    • Labels: None
    • Environment: Observed on Ubuntu 14.04.1 LTS (trusty) using both OpenJDK 6 and OpenJDK 7.

    Description

      The segfault appears to occur on every call to driver.launchTasks().

      Here is a minimal(ish) Java framework for testing purposes:
      https://github.com/ConnorDoyle/mesos-compat

      Here is some output from running the framework against a Mesos 0.20.1 cluster with libmesos-0.20.1:

      mesosMaster: [127.0.1.1:5050]
      I1009 18:22:28.044186 20570 sched.cpp:139] Version: 0.20.1
      I1009 18:22:28.045871 20564 sched.cpp:235] New master detected at master@127.0.1.1:5050
      I1009 18:22:28.048756 20564 sched.cpp:243] No credentials provided. Attempting to register without authentication
      I1009 18:22:28.050904 20564 sched.cpp:409] Framework registered with 20141009-161859-16842879-5050-13592-0006
      Registered with the Mesos master:
      id: "20141009-161859-16842879-5050-13592"
      ip: 16842879
      port: 5050
      pid: "master@127.0.1.1:5050"
      hostname: "mesos.vm"
      
      Assigned framework ID:
      value: "20141009-161859-16842879-5050-13592-0006"
      
      Received resource offers:
      id {
        value: "20141009-161859-16842879-5050-13592-9"
      }
      framework_id {
        value: "20141009-161859-16842879-5050-13592-0006"
      }
      slave_id {
        value: "20141006-070645-16842879-5050-1540-0"
      }
      hostname: "10.141.141.10"
      resources {
        name: "cpus"
        type: SCALAR
        scalar {
          value: 1.0
        }
        role: "*"
      }
      resources {
        name: "mem"
        type: SCALAR
        scalar {
          value: 1001.0
        }
        role: "*"
      }
      resources {
        name: "disk"
        type: SCALAR
        scalar {
          value: 34068.0
        }
        role: "*"
      }
      resources {
        name: "ports"
        type: RANGES
        ranges {
          range {
            begin: 31000
            end: 32000
          }
        }
        role: "*"
      }
      
      Building task list...
      Calling SchedulerDriver.launchTasks...
      Received a status update:
      task_id {
        value: "mesos-compat-0"
      }
      state: TASK_RUNNING
      slave_id {
        value: "20141006-070645-16842879-5050-1540-0"
      }
      timestamp: 1.412878948485535E9
      
      Received a status update:
      task_id {
        value: "mesos-compat-0"
      }
      state: TASK_FINISHED
      message: "Command exited with status 0"
      slave_id {
        value: "20141006-070645-16842879-5050-1540-0"
      }
      timestamp: 1.412878949486471E9
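
      In both runs, the numeric ip field of the master info (which also appears as the third dash-separated field of the master ID) looks like the master's IPv4 address stored in network byte order and printed as a native integer on a little-endian (x86-64) host: 16842879 is 0x0101007F, i.e. 127.0.1.1, and 3676708780 in the 0.21.0 run is 0xDB261FAC, i.e. 172.31.38.219, matching the master pids shown in each log. The following minimal sketch decodes such values under that assumption; MasterIpDecoder and its methods are hypothetical helpers for reading these logs, not part of the Mesos API.

```java
import java.util.Locale;

// Hypothetical helper for reading the scheduler logs in this report; not part of the Mesos API.
final class MasterIpDecoder {

    private MasterIpDecoder() {}

    // Assumes the value holds the IPv4 octets in network byte order, read as
    // a little-endian integer, so the first octet sits in the lowest byte.
    static String decodeIp(long ip) {
        return String.format(Locale.ROOT, "%d.%d.%d.%d",
                ip & 0xFF,
                (ip >>> 8) & 0xFF,
                (ip >>> 16) & 0xFF,
                (ip >>> 24) & 0xFF);
    }

    // Assumes the dash-separated layout seen in the logs, e.g.
    // "20141009-161859-16842879-5050-13592", where the third field is the IP.
    static String ipFromMasterId(String masterId) {
        String[] fields = masterId.split("-");
        if (fields.length < 3) {
            throw new IllegalArgumentException("unexpected master ID: " + masterId);
        }
        return decodeIp(Long.parseLong(fields[2]));
    }

    public static void main(String[] args) {
        System.out.println(decodeIp(16842879L));                                     // 127.0.1.1
        System.out.println(ipFromMasterId("20141008-182754-3676708780-5050-27608")); // 172.31.38.219
    }
}
```

      This is only a log-reading aid: it confirms that both runs talked to the masters their pids claim, which rules out a simple addressing mix-up as the cause of the crash.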
      

      Here is some output from running the framework against a Mesos 0.21.0-SNAPSHOT cluster with libmesos-0.21.0-SNAPSHOT:

      mesosMaster: [172.31.38.219:5050]
      I1009 18:34:01.391448 21087 sched.cpp:137] Version: 0.21.0
      I1009 18:34:01.394507 21081 sched.cpp:233] New master detected at master@172.31.38.219:5050
      I1009 18:34:01.394839 21081 sched.cpp:241] No credentials provided. Attempting to register without authentication
      I1009 18:34:01.403170 21084 sched.cpp:407] Framework registered with 20141008-182754-3676708780-5050-27608-0002
      Registered with the Mesos master:
      id: "20141008-182754-3676708780-5050-27608"
      ip: 3676708780
      port: 5050
      pid: "master@172.31.38.219:5050"
      hostname: "ip-172-31-38-219.eu-west-1.compute.internal"
      
      Assigned framework ID:
      value: "20141008-182754-3676708780-5050-27608-0002"
      
      Received resource offers:
      id {
        value: "20141008-182754-3676708780-5050-27608-49558"
      }
      framework_id {
        value: "20141008-182754-3676708780-5050-27608-0002"
      }
      slave_id {
        value: "20141008-054505-3676708780-5050-21219-3"
      }
      hostname: "ip-172-31-45-108.eu-west-1.compute.internal"
      resources {
        name: "cpus"
        type: SCALAR
        scalar {
          value: 2.0
        }
        role: "*"
      }
      resources {
        name: "mem"
        type: SCALAR
        scalar {
          value: 6489.0
        }
        role: "*"
      }
      resources {
        name: "disk"
        type: SCALAR
        scalar {
          value: 5023.0
        }
        role: "*"
      }
      resources {
        name: "ports"
        type: RANGES
        ranges {
          range {
            begin: 31000
            end: 32000
          }
        }
        role: "*"
      }
      
      Building task list...
      Calling SchedulerDriver.launchTasks...
      #
      # A fatal error has been detected by the Java Runtime Environment:
      #
      #  SIGSEGV (0xb) at pc=0x00007fa34b14923d, pid=20973, tid=140339180459776
      #
      # JRE version: OpenJDK Runtime Environment (7.0_65-b32) (build 1.7.0_65-b32)
      # Java VM: OpenJDK 64-Bit Server VM (24.65-b04 mixed mode linux-amd64 compressed oops)
      # Derivative: IcedTea 2.5.2
      # Distribution: Ubuntu 14.04 LTS, package 7u65-2.5.2-3~14.04
      # Problematic frame:
      # C  [libmesos-0.21.0.so+0xbac23d]  JNIEnv_::CallBooleanMethod(_jobject*, _jmethodID*, ...)+0x7d
      #
      # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
      #
      # An error report file with more information is saved as:
      # /home/ubuntu/mesos-compat/hs_err_pid20973.log
      #
      # If you would like to submit a bug report, please include
      # instructions on how to reproduce the bug and visit:
      #   http://icedtea.classpath.org/bugzilla
      # The crash happened outside the Java Virtual Machine in native code.
      # See problematic frame for where to report the bug.
      #
      ^CAborted (core dumped)
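
      In this report the crash appears only when the 0.20.1 Java bindings are paired with the 0.21.0 native library, so one defensive option is for a framework to refuse a mismatched pair before it creates a driver. The following is a minimal sketch under two stated assumptions: the two version strings are obtained elsewhere (this report does not say how to query them), and matching major.minor numbers is taken as the compatibility rule. MesosVersionGuard and its methods are hypothetical names, not Mesos API.

```java
// Hypothetical guard, not part of the Mesos API. Assumption: Java bindings and
// native library are treated as compatible only when major.minor match.
final class MesosVersionGuard {

    private MesosVersionGuard() {}

    // Returns {major, minor} from strings like "0.21.0" or "0.21.0-SNAPSHOT".
    static int[] majorMinor(String version) {
        String core = version.split("-", 2)[0]; // drop a "-SNAPSHOT"-style suffix
        String[] parts = core.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("unparseable version: " + version);
        }
        return new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1])};
    }

    // True when both versions share the same major and minor number.
    static boolean compatible(String jarVersion, String nativeVersion) {
        int[] a = majorMinor(jarVersion);
        int[] b = majorMinor(nativeVersion);
        return a[0] == b[0] && a[1] == b[1];
    }

    // Throws before any driver is created if the pair looks mismatched.
    static void check(String jarVersion, String nativeVersion) {
        if (!compatible(jarVersion, nativeVersion)) {
            throw new IllegalStateException("Mesos Java bindings " + jarVersion
                    + " do not match native library " + nativeVersion);
        }
    }

    public static void main(String[] args) {
        check("0.21.0", "0.21.0-SNAPSHOT"); // passes under the major.minor rule
        try {
            check("0.20.1", "0.21.0");      // the combination from this report
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

      Requiring an exact version match would be the stricter choice; the major.minor rule above tolerates patch releases and -SNAPSHOT suffixes, which may or may not be safe for a given pairing.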
      

      Attachments

        1. hs_err_pid29964.log (33 kB, Connor Doyle)
        2. gdb-session.txt (20 kB, Connor Doyle)


            People

              Timothy Chen (tnachen)
              Connor Doyle (cdoyle)
              Votes: 0
              Watchers: 3
