  1. Kylin
  2. KYLIN-4847

Cuboid to HFile step fails in a multi-job-server environment because it tries to read the metrics jar file from the inactive job server's location.

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: v3.1.0
    • Fix Version/s: v3.1.2
    • Component/s: Job Engine
    • Labels: None

    Description

      My Cluster Setting

      1. Version: 3.1.0
      2. 2 job servers (job & query mode) and 2 query-only servers, each running on a separate host machine.
      3. The Spark engine is used to build jobs.

      Problem Circumstance

      Root cause

      The active job server submits a Spark job to execute `Convert Cuboid Data to HFile`. The submission fails because one of the resources passed to the Spark job has a wrong path, which the active job server cannot read.

      • Wrong resource: ${KYLIN_HOME}/tomcat/webapps/kylin/WEB-INF/lib/metrics-core-2.2.0.jar
      • For this jar file only, ${KYLIN_HOME} resolves to the inactive job server's location.
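
      To confirm the symptom on the active job server, a small check can report which entries of a jar list are not readable on the local host. This is a minimal sketch, not Kylin code: the class `JarPathCheck`, the method `unreadableJars`, and the example path under `/opt/kylin-inactive` are hypothetical, and the comma-separated list format is assumed from how Spark's `--jars` option takes its value.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Kylin: lists jar paths that the local host cannot read.
public class JarPathCheck {

    /** Returns the entries of a comma-separated jar list that are not readable regular files. */
    public static List<String> unreadableJars(String commaSeparatedJars) {
        List<String> unreadable = new ArrayList<>();
        for (String entry : commaSeparatedJars.split(",")) {
            String jar = entry.trim();
            if (jar.isEmpty()) {
                continue; // ignore empty segments such as a trailing comma
            }
            Path path = Paths.get(jar);
            if (!Files.isRegularFile(path) || !Files.isReadable(path)) {
                unreadable.add(jar);
            }
        }
        return unreadable;
    }

    public static void main(String[] args) {
        // Assumed example path: the other job server's KYLIN_HOME, which does not exist locally.
        String jars = args.length > 0
                ? args[0]
                : "/opt/kylin-inactive/tomcat/webapps/kylin/WEB-INF/lib/metrics-core-2.2.0.jar";
        System.out.println("Unreadable jars: " + unreadableJars(jars));
    }
}
```

      Running this on the active job server with the jar list taken from the failed step's parameters would show whether the metrics-core entry is the only one pointing at the other host.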

      This situation occurs in the following two circumstances.

      On build cube

      1. Send a build request to the inactive job server (specifically, /kylin/api/cubes/${cube_name}/rebuild).
      2. The inactive job server stores the build task in the metadata store.
      3. The active job server picks up the build task and processes it.
      4. The active job server fails at the `Convert Cuboid Data to HFile` step.

      *This doesn't occur when I send the build request to the active job server.*
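
      The reproduction steps can be sketched as a request builder that targets either job server. This is a sketch under assumptions: the host names `job-server-a` and `job-server-b`, the port, the credentials, and the cube name are placeholders, and the JSON fields (`startTime`, `endTime`, `buildType`) follow my reading of the Kylin REST documentation rather than anything stated in this report. The code only constructs the request; it does not send it.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical reproduction helper: builds (but does not send) a cube rebuild request.
public class RebuildRequestSketch {

    public static HttpRequest rebuildRequest(String baseUrl, String cubeName,
                                             String user, String password,
                                             long startTime, long endTime) {
        // HTTP Basic authentication header value
        String auth = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        // Assumed request body shape; verify against the Kylin REST API documentation.
        String body = String.format(
                "{\"startTime\":%d,\"endTime\":%d,\"buildType\":\"BUILD\"}", startTime, endTime);
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/kylin/api/cubes/" + cubeName + "/rebuild"))
                .header("Authorization", "Basic " + auth)
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        // Placeholder hosts: per the report, only the request sent to the inactive server fails later.
        for (String host : new String[] {"http://job-server-a:7070", "http://job-server-b:7070"}) {
            HttpRequest request = rebuildRequest(host, "my_cube", "ADMIN", "changeme", 0L, 86_400_000L);
            System.out.println(request.method() + " " + request.uri());
        }
    }
}
```

      Sending the same request to each job server in turn, and comparing the jar paths recorded for the resulting jobs, would isolate whether the receiving server's location ends up in the job parameters.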

      On merge

      1. A merge cube job is triggered periodically.
      2. The active job server picks up the merge task and processes it.
      3. The active job server fails at the `Convert Cuboid Data to HFile` step.

      *This doesn't occur when there is only one job server in the cluster.*

      Progress so far

      I'm trying to find which code sets the metrics-core-2.2.0.jar path incorrectly.
      So far, my guess is that this code sets the metrics-core-2.2.0.jar path for the `Cuboid to HFile` Spark job.
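
      One way a host-specific jar path can arise is when code resolves a jar from the classpath of the JVM it runs in. The sketch below illustrates that general mechanism only; it is not confirmed by this report to be what Kylin does, and the class `ContainingJarSketch` and method `locationOf` are hypothetical names. A path obtained this way is valid only on the host that computed it, so persisting it for another server to use would produce the symptom described above.

```java
import java.net.URL;
import java.security.CodeSource;

// Illustration only: resolves where a class was loaded from in the current JVM.
public class ContainingJarSketch {

    /** Returns the jar or directory a class was loaded from, or null if it is unknown. */
    public static String locationOf(Class<?> clazz) {
        CodeSource source = clazz.getProtectionDomain().getCodeSource();
        if (source == null) {
            return null; // typical for classes loaded by the bootstrap class loader
        }
        URL location = source.getLocation();
        return location == null ? null : location.getPath();
    }

    public static void main(String[] args) {
        // The printed path depends on the local installation directory of this JVM's classpath.
        System.out.println("Loaded from: " + locationOf(ContainingJarSketch.class));
    }
}
```

      If the suspected code works along these lines, logging the resolved path on both job servers when a job is created should show each server reporting its own ${KYLIN_HOME}.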

      Questions

      1. I'm trying to remote-debug with an IDE to confirm my guess, but the breakpoint on that line is never hit at runtime. It seems the code is called only during the boot phase. Is that right?

      2. Are there any hints or ideas for solving this issue, independent of my progress above?

          People

            Assignee: yoonsung.lee
            Reporter: yoonsung.lee