Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.1.0
    • Component/s: debian, rpm
    • Labels:
      None

      Description

      Let's upgrade packaging separately

      1. BIGTOP-2104.patch
        16 kB
        Konstantin Boudnik

        Issue Links

          Activity

          cos Konstantin Boudnik added a comment -

          Here's the patch, based on this commit from Jonathan Kelly's original PR.

          I have rebased this on master and added the following code into do-component-build via dynamic patching (thanks, Olaf Flebbe!). Without it, we'd be building Spark against an arbitrary version of the underlying Hadoop. Clearly, we need to offer this fix upstream.

          sed -i -e 's#<hadoop.version>2.6.0</hadoop.version>#<hadoop.version>${hadoop.version}</hadoop.version>#' pom.xml
          

          to avoid building Spark against the wrong version of Hadoop.
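          To illustrate what the dynamic patching above does, here is a minimal, self-contained sketch. The pom.xml fragment is hypothetical; the point is that the sed substitution swaps the hardcoded Hadoop version for the Maven property placeholder ${hadoop.version}, which the build can then set via -Dhadoop.version.

```shell
# Hypothetical minimal pom.xml fragment with a hardcoded Hadoop version
cat > pom.xml <<'EOF'
<project>
  <properties>
    <hadoop.version>2.6.0</hadoop.version>
  </properties>
</project>
EOF

# The substitution from do-component-build: replace the hardcoded version
# with the Maven property placeholder. Single quotes keep ${hadoop.version}
# literal so Maven, not the shell, resolves it at build time.
sed -i -e 's#<hadoop.version>2.6.0</hadoop.version>#<hadoop.version>${hadoop.version}</hadoop.version>#' pom.xml

# Show the patched property line
grep 'hadoop.version' pom.xml
```

          A build invoked afterwards with, e.g., `mvn -Dhadoop.version=2.7.1 ...` would then compile against the requested Hadoop rather than the hardcoded 2.6.0.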

          cos Konstantin Boudnik added a comment -

          Patch is ready

          cos Konstantin Boudnik added a comment -

          Now, after some testing with the latest Puppet code for Spark, I see this when a worker is attempting to connect to the master:

          15/10/31 00:58:29 INFO Worker: Retrying connection to master (attempt # 1)
          15/10/31 00:58:29 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[sparkWorker-akka.actor.default-dispatcher-3,5,main]
          java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@70012519 rejected from java.util.concurrent.ThreadPoolExecutor@5b2ac5c1[Running, pool size = 1, active threads = 0, queued tasks = 0, completed tasks = 1]
                  at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
                  at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
          

          Does anyone see the same issue?

          cos Konstantin Boudnik added a comment -

          Jonathan Kelly, I see that the spark-default.xml configuration file says

          spark.master yarn

          , although I am running standalone Spark and, what's more, don't even have YARN installed on the cluster. Not sure if this matters, but perhaps you can shed some light on it? Thanks!
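          For reference, a standalone deployment would presumably want the default master pointed at the standalone master URL instead. A sketch of such a spark-defaults entry (host name is a placeholder; 7077 is the standalone master's default port):

```
# Hypothetical spark-defaults entry for a standalone deployment
spark.master  spark://spark-master.example.com:7077
```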

          cos Konstantin Boudnik added a comment - - edited

          OK, apparently spark.master yarn only affects submitted applications, and it needs to be fixed for non-YARN deployments such as standalone. Perhaps that can be done in a separate JIRA, or as an improvement to the Puppet recipes.

          Meanwhile, I have figured out what was wrong with the startup (although I don't know why). For some reason, the master is binding to the container's IP address (e.g. 172.17.0.10) even though SPARK_MASTER_IP is set to the container's hostname. The worker, however, is trying to connect to the master via the hostname, and the whole shenanigan fails ;(

          It doesn't seem like a packaging issue, but more like a screwup on the Spark side. Still, we need to figure out why this is happening. If anyone has ideas, please chime in!

          I intend to commit this patch tomorrow afternoon unless I hear otherwise from the author or someone else.

          cos Konstantin Boudnik added a comment -

          Reading some more of the Spark docs, this caught my attention:

          SPARK_MASTER_IP 	Bind the master to a specific IP address, for example a public one.
          

          Perhaps there's a reason it says "IP address" and not "host name"?
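          If the docs do mean a literal IP, the spark-env.sh fragment might look like the following. This is only a guess at the intended usage; the address is the container IP mentioned earlier and is purely illustrative.

```shell
# Hypothetical spark-env.sh fragment: bind the master to an explicit IP
# address rather than a hostname, per the doc wording quoted above.
# 172.17.0.10 is the container address from the earlier comment; adjust as needed.
SPARK_MASTER_IP=172.17.0.10
export SPARK_MASTER_IP
```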

          cos Konstantin Boudnik added a comment -

          While it still looks like a Spark bug to me, I can confirm that adding --host <hostname> to the master startup arguments solves the problem. It looks like the master is somehow ignoring the contents of spark-env.sh when it is sourced, or something along those lines. I tend to blame the load-spark-env.sh script, but I might be wrong. At any rate, I am committing this and will open a ticket to fix the startup issue.
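          The workaround would look roughly like this in the master startup invocation. This is a sketch only: the command line is built and printed rather than executed, and the spark-class launcher and Master class are the stock Spark ones, which may be wrapped differently in the packaged init scripts.

```shell
# Sketch of the workaround: pass --host explicitly so the master binds to
# (and advertises) the hostname the workers use, instead of whatever address
# it picks up on its own. We only assemble and print the command here;
# spark-class itself is not invoked.
MASTER_HOST=$(hostname -f 2>/dev/null || echo localhost)
CMD="spark-class org.apache.spark.deploy.master.Master --host $MASTER_HOST"
echo "$CMD"
```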

          cos Konstantin Boudnik added a comment -

          The patch has been pushed to the master. Thanks Jonathan!


            People

            • Assignee:
              jonathak Jonathan Kelly
              Reporter:
              cos Konstantin Boudnik
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development